Join our newsletter

#noSpamWePromise
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
cross-icon
Subscribe

Run your data operations on a single, unified platform.

  • Easy setup, no data storage required
  • Free forever for core features
  • Simple expansion with additional credits
cross-icon
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Download the file

#getsmarter
Oops! Something went wrong while submitting the form.
cross-icon
How To
September 24, 2021
The 7 critical differences between a star schema and a snowflake schema
Learn the major star schema vs snowflake schema database differences and choose the right for your use case.

Star schemas and snowflake schemas are the two predominant types of data warehouse schemas. 

A data warehouse schema refers to the shape your data takes - how you structure your tables and their mutual relationships within a database or data warehouse.

Since the primary purpose of a data warehouse (and other Online Analytical Processing (OLAP) databases) is to provide a centralized view of all the enterprise data for analytics, data warehouse schemas help us achieve superior analytic results. 

How do schemas help analytics? And what are the differences and trade-offs between star and snowflake schemas?

In this article, we compare the two dominant variants of data warehouse schemas and contrast their advantages and disadvantages.

Run a 100% data-driven business without any extra hassle.
Pay as you go, starting with our free tier.

What is a star schema?

To understand the data modeling behind a star schema, let us look at a retail example. Imagine you are running an international shopping brand and you want to analyze purchases across your physical locations.

You pull out data from your database as an Excel file:

database example


But you soon realize there are too many rows, and the data needs to be cleaned before you can analyze it.

You decide to turn the data into a star schema.

A star schema is a data model that stores information in multiple table types: a single fact table and multiple dimensional tables. 

In contrast to the classical database design of normalizing tables, star schemas connect dimensional data with fact data in a shape resembling a star (hence the name), as can be seen from the following diagram:

star schema example


Run a 100% data-driven business without any extra hassle.
Pay as you go, starting with our free tier.

In the diagram, we see a central fact table (holding all the facts of the sales) and four dimension tables - a separate table describing the customer, date (of purchase), store where the purchase happened, and product purchased. 

The fact table is linked via a foreign key relationship to the primary key of each dimension (aka, the id in each dimension table, for example, the customer_id links the customer from the dim_customer table to the fact_sales table). 

This type of data modeling allows us to query data faster and with simpler queries than the normalized database design.

What is a snowflake schema?

A snowflake schema is very similar to the simple star schema above. The main difference is that snowflake schemas split dimensional tables into further dimensional tables (also called lookup tables). 

For example, the above diagram would show the customer_country field being split into further dimensional tables:

snowflake schema


Each dimension is split until it is normalized - aka, there is no redundancy in the dimensional table, no repetition of values (except for identifier values, such as id’s).

The 7 critical differences between a star schema and a snowflake schema

1. Normalization of dimension tables

The snowflake schema is a fully normalized data structure. Dimensional hierarchies (such as city > country > region) are stored in separate dimensional tables. 

On the other hand, star schema dimensions are denormalized. Denormalization refers to the repeating of the same values within a table. 

2. Data redundancy

Star schema stores redundant data in dimension tables, while snowflake schema fully normalizes dimension tables and avoids data redundancy.

For example, a star schema would repeat the values in field customer_address_country for each order from the same country. 

The redundancy, or duplicated entries, occurs because of the denormalization vs normalization schema design.

3. Query complexity

A simple star schema leads to simple query writing. Because the fact table is joined to only one level of dimensional tables, analysts do not need to write multiple joins.

On the other hand, snowflake schemas require a more complex query design. Because of complex relationships between the fact table and its dimensional tables, more joins are needed to link the additional tables. This causes an additional overhead when writing analytical queries.

4. Query performance

The query execution time is faster in star schemas. Because they require a single join between a fact and its set of attributes in dimensional tables, a star schema acts almost as a single table for query lookups.

In contrast, snowflake schemas require complex joins of dimensional tables with their own sub-dimensional or supra-dimensional tables. This slows down query processing and can affect other OLAP products such as cube processing. 

5. Disk space

Star schemas might run queries faster, but they require more storage space than snowflake schemas because of their data redundancy. 

6. Data integrity

Data integrity is more at risk in star schemas than snowflake schemas. Because data is stored redundantly, multiple copies of the same data exist in the star schema’s dimensional tables. This means new inserts, updates, or deletes can compromise the integrity of data.

In contrast, the snowflake schema is less prone to data integrity issues, because it fully normalizes dimensional tables, storing dimension data only once in the appropriate table.

7. Set up and maintenance 

Star schemas are easier to design and set up. Because they are represented by simple relationships, it is easy for a data engineer or data architect to set up an appropriate star schema.

On the other hand, star schemas are harder to maintain than snowflake schemas. As new data is ingested into the data warehouse, star schemas become harder to maintain and check against data integrity violations. 

Which one should I pick?

Out of the two types of data warehouse schema, which one should you choose?

On one hand, star schemas are simpler, run queries faster, and are easier to set up.

On the other hand, snowflake schemas are less prone to data integrity issues, are easier to maintain, and utilize less space.

Based on the tradeoffs above, it depends on which advantage (or disadvantage) best suits your business use cases.

Keboola can help you model your data faster

Keboola is an end-to-end data operation tool that helps you automate your end-to-end data pipelines.

Designing the right data warehouse schema is hard enough.

Lessen the burden on your engineers by automating all the data extraction, transformation, and loading into databases and data warehouses of your choice that follow schema design.

Try it out.

Keboola offers a no-questions-asked, always-free tier, so you can play around and build your pipelines with a couple of clicks. 

Recomended Articles