Snowflake Schema
A snowflake schema is a data warehouse design pattern extending star schemas by normalizing dimension tables into multiple related tables, reducing redundancy at the cost of additional joins.
Star schemas denormalize dimensions to simplify querying, storing attributes in single-level tables. Snowflake schemas apply normalization, breaking dimensions into related tables. A product dimension in a star schema contains product_id, name, category, and category_description. In a snowflake schema, product and category are separate tables; products reference categories through foreign keys.
This normalization reduces storage by eliminating redundant category descriptions. When a category description changes, snowflake updates one row; star schema updates thousands of product rows. However, snowflake adds complexity: queries require additional joins (product to category), and the schema becomes less intuitive for business users unfamiliar with normalization.
Snowflake schemas are less common than star schemas in modern data warehousing. Cloud storage is inexpensive, making denormalization cost-effective. Query engines efficiently handle additional joins. Business users and BI tools expect the simplicity of star schemas. Snowflake schemas remain useful in storage-constrained environments or when data volumes are enormous, making redundancy elimination significant.
Key Characteristics
- ▶Normalize dimension tables into multiple related tables
- ▶Reduce data redundancy through normalization
- ▶Require additional joins to reconstruct dimension attributes
- ▶More complex schema structure than star schemas
- ▶Save storage at cost of query complexity
- ▶Support hierarchical dimension relationships
Why It Matters
- ▶Reduce storage requirements for large dimensional hierarchies
- ▶Simplify dimension maintenance by updating one table per change
- ▶Support hierarchical dimensional relationships explicitly
- ▶Enable flexible drill-down analysis through normalized hierarchies
- ▶Reduce data inconsistency through single-source-of-truth dimensions
- ▶Optimize for environments with strict storage constraints
Example
` -- Star schema (denormalized) vs Snowflake (normalized) -- STAR SCHEMA (single-level dimensions) CREATE TABLE dim_product_star ( product_id INT PRIMARY KEY, product_name VARCHAR(255), category VARCHAR(100), category_description TEXT, subcategory VARCHAR(100), subcategory_description TEXT, department VARCHAR(50), supplier_id INT ); -- SNOWFLAKE SCHEMA (normalized dimensions) CREATE TABLE dim_category ( category_id INT PRIMARY KEY, category_name VARCHAR(100), category_description TEXT ); CREATE TABLE dim_subcategory ( subcategory_id INT PRIMARY KEY, subcategory_name VARCHAR(100), category_id INT REFERENCES dim_category, subcategory_description TEXT ); CREATE TABLE dim_department ( department_id INT PRIMARY KEY, department_name VARCHAR(50) ); CREATE TABLE dim_product_snowflake ( product_id INT PRIMARY KEY, product_name VARCHAR(255), subcategory_id INT REFERENCES dim_subcategory, department_id INT REFERENCES dim_department, supplier_id INT ); -- Query comparison: -- Star schema query (fewer joins) SELECT p.category, p.department, SUM(f.revenue) as total_revenue FROM fct_sales f JOIN dim_product_star p ON f.product_id = p.product_id GROUP BY p.category, p.department; -- Snowflake schema query (more joins) SELECT c.category_name, d.department_name, SUM(f.revenue) as total_revenue FROM fct_sales f JOIN dim_product_snowflake p ON f.product_id = p.product_id JOIN dim_subcategory s ON p.subcategory_id = s.subcategory_id JOIN dim_category c ON s.category_id = c.category_id JOIN dim_department d ON p.department_id = d.department_id GROUP BY c.category_name, d.department_name; `
Coginiti Perspective
Coginiti's SMDL can model snowflake schema structures by defining entities for each normalized dimension table and declaring relationships (one_to_one, one_to_many, many_to_one) between them. Semantic SQL resolves these relationship chains into correct multi-table joins automatically, so analysts query across normalized dimensions without manually constructing the join paths. CoginitiScript publication can materialize both the fact and normalized dimension tables as part of a single pipeline with dependency ordering handled through block references.
Related Concepts
More in OLTP, OLAP & Workload Types
Analytical Workload
An analytical workload is a class of database queries that examine, aggregate, and analyze large volumes of historical data to extract business insights and support decision-making.
Conformed Dimensions
Conformed dimensions are dimensions that mean the same thing—and carry the same keys, attributes, and values—across multiple fact tables or data marts, letting you combine and compare metrics from different business processes with confidence.
Dimension Table
A dimension table is a database table in a star or snowflake schema that stores descriptive attributes used to filter, group, and drill-down in analytical queries.
Fact Table
A fact table is a database table in a star or snowflake schema that stores measures (quantitative data) and foreign keys to dimensions, representing events or transactions in a business process.
Fan and Chasm Traps
Fan traps and chasm traps are two classic join hazards in dimensional and relational models where the structure of the joins causes measures to be over-counted or under-counted, producing numbers that look valid but are wrong.
Grain
Grain is the level of detail represented by a single row in a table—the precise answer to the question "what does one record mean?"—and it is the foundational decision in dimensional modeling because every dimension and measure in a table must be consistent with it.
See Semantic Intelligence in Action
Coginiti operationalizes business meaning across your entire data estate.