Data Warehouse Design: A Practical Guide to Getting the Architecture Right

Data warehouse design is the process of structuring a centralised database that consolidates data from multiple systems for consistent reporting. Three decisions largely determine success: the schema model (such as Star or Snowflake), the architectural layers, and the design methodology (Kimball or Inmon).

Get those three right and everything else – ETL pipelines, dashboards, query performance – sits on a solid foundation. Get them wrong and you spend the next two years rebuilding.

Schema Design: Star, Snowflake, or Galaxy?

The schema is how your tables relate to each other. Most warehouses use one of three patterns:

Schema Type	Structure	Query Speed	Complexity	Best Use Case
Star Schema	One central fact table, denormalised dimension tables directly connected	Fast – fewer joins	Low – easy to understand and maintain	Most BI and reporting workloads; recommended starting point
Snowflake Schema	Fact table + normalised dimension tables (dimensions split into sub-dimensions)	Slower – more joins required	Medium – more tables, cleaner data model	Storage-sensitive environments; complex dimension hierarchies
Galaxy / Constellation	Multiple fact tables sharing dimension tables	Variable – depends on query	High – complex to maintain	Enterprise warehouses with multiple business processes

For most teams building their first warehouse, start with Star Schema. It’s easier to query, easier to explain to stakeholders, and easier to refactor later. Snowflake Schema adds storage efficiency at the cost of query complexity – a trade-off that rarely makes sense until you’re managing hundreds of millions of rows.
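To make the star pattern concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a warehouse engine. The table names (fact_sales, dim_customer, dim_product), the columns, and the sample rows are all hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database standing in for a warehouse (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalised dimensions: one row per customer / product.
    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY,
        customer_name TEXT,
        region        TEXT
    );
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT,
        category     TEXT
    );
    -- Central fact table: a key into each dimension plus the measures.
    CREATE TABLE fact_sales (
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        product_key  INTEGER REFERENCES dim_product(product_key),
        quantity     INTEGER,
        amount       REAL
    );
    INSERT INTO dim_customer VALUES (1, 'Acme Ltd', 'EMEA'), (2, 'Globex', 'APAC');
    INSERT INTO dim_product VALUES (10, 'Widget', 'Hardware'), (11, 'Support plan', 'Services');
    INSERT INTO fact_sales VALUES (1, 10, 5, 50.0), (1, 11, 1, 200.0), (2, 10, 2, 20.0);
""")


def revenue_by_region_and_category(connection):
    """Every dimension is exactly one join away from the fact table."""
    return connection.execute("""
        SELECT c.region, p.category, SUM(f.amount) AS revenue
        FROM fact_sales AS f
        JOIN dim_customer AS c ON c.customer_key = f.customer_key
        JOIN dim_product  AS p ON p.product_key  = f.product_key
        GROUP BY c.region, p.category
        ORDER BY c.region, p.category
    """).fetchall()


rows = revenue_by_region_and_category(conn)
```

In a Snowflake Schema variant of this sketch, category would move out of dim_product into its own table, and the same report would need one more join.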

The Three-Layer Architecture

A well-designed warehouse separates concerns across three layers; collapsing them into one is among the most common design mistakes. The layers are:

Staging Layer: Raw, unmodified copies of source data – the archive. Nothing is transformed here. If something goes wrong downstream, you re-process from this layer.
Integration / Core Layer: Cleaned, transformed, and joined data. Business rules are applied here. This is where Kimball’s dimensional model or Inmon’s normalised model lives.
Access / Presentation Layer: Aggregated, pre-joined views optimised for reporting tools. This is what your BI dashboards actually query. Rebuilding this layer doesn’t require touching the core.

The separation matters because it gives you a clean re-run path when source systems change, and it keeps your reporting layer fast without polluting the integration logic.
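The re-run path described above can be sketched as follows, again with sqlite3 standing in for a real warehouse. The stg_, core_, and rpt_ prefixes, the orders data, and the cleaning rules (trim the amount, lower-case the status) are assumptions made for this example, not a prescribed convention.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Staging: raw copies of source rows, stored as text exactly as received.
    CREATE TABLE stg_orders (order_id TEXT, order_date TEXT, amount TEXT, status TEXT);
    INSERT INTO stg_orders VALUES
        ('1001', '2024-03-01', '120.50', 'complete'),
        ('1002', '2024-03-01', '80.00',  'CANCELLED'),
        ('1003', '2024-03-02', ' 45.25', 'Complete');

    -- Core: typed, cleaned data with business rules applied.
    CREATE TABLE core_orders (
        order_id   INTEGER PRIMARY KEY,
        order_date TEXT,
        amount     REAL,
        status     TEXT
    );

    -- Presentation: an aggregate shaped for reporting tools.
    CREATE VIEW rpt_daily_revenue AS
        SELECT order_date, SUM(amount) AS revenue
        FROM core_orders
        WHERE status = 'complete'
        GROUP BY order_date;
""")


def rebuild_core(connection):
    """Re-derive the core layer from staging; staging itself is never modified."""
    with connection:  # commit on success, roll back on error
        connection.execute("DELETE FROM core_orders")
        connection.execute("""
            INSERT INTO core_orders
            SELECT CAST(order_id AS INTEGER),
                   order_date,
                   CAST(TRIM(amount) AS REAL),
                   LOWER(status)
            FROM stg_orders
        """)


rebuild_core(conn)
daily_revenue = conn.execute(
    "SELECT order_date, revenue FROM rpt_daily_revenue ORDER BY order_date"
).fetchall()
```

Because rebuild_core reads only from staging, a change to a business rule means editing the transform and running it again; the raw rows and the reporting view stay as they are.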

Kimball vs. Inmon: The Two Design Philosophies

Factor	Kimball (Bottom-Up)	Inmon (Top-Down)
Philosophy	Build dimensional data marts first; enterprise view emerges from marts	Build enterprise data model first; data marts are subsets
Design Direction	Business process → dimensional model → data mart	Enterprise model → ETL → data mart
Time to First Value	Faster – first mart can deliver in weeks	Slower – requires enterprise model upfront
Best For	Agile teams, departmental projects, faster ROI needed	Large enterprises, regulated industries, long-term consistency
Key Strength	Pragmatic, business-aligned, fast delivery	Single source of truth, highly consistent
Key Weakness	Integration across marts can be complex later	Upfront investment is significant; slow to first delivery

Most modern data teams lean Kimball – the faster time-to-value and business-process alignment fit the way analytics teams actually operate. Inmon is more common in financial services and healthcare, where data governance and consistency across the enterprise justify the upfront cost.

Slowly Changing Dimensions (SCD): Types 1, 2, and 3

SCDs handle the problem of dimension data that changes over time – a customer changes their address, an employee changes department, a product changes category. How you record that change matters:

Type 1 – Overwrite: Simply update the record. No history kept. Use when history genuinely doesn’t matter (e.g. correcting a typo)
Type 2 – Add New Row: Keep the old record, insert a new one with a validity date range. Preserves full history. The most common choice for anything where historical accuracy matters – customer addresses, product pricing
Type 3 – Add Column: Add a “previous value” column. Tracks one level of change only. Rarely used – too limited for most real scenarios

In practice: default to Type 2 for any dimension where you’ll want to ask “what was the value at the time of this transaction?” That covers most business-critical dimensions.
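A Type 2 dimension and the "value at the time of this transaction" lookup might look like the sketch below. It assumes a hypothetical dim_customer table tracking a single attribute (city), ISO-formatted date strings, and a far-future sentinel date to mark the current row; production implementations often add a current-row flag and handle many attributes at once.

```python
import sqlite3

# Sentinel end date marking the current version (a common convention, assumed here).
OPEN_END = "9999-12-31"

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, one per version
        customer_id  TEXT,                               -- business key from the source system
        city         TEXT,
        valid_from   TEXT,
        valid_to     TEXT
    )
""")


def apply_scd2(connection, customer_id, city, change_date):
    """Close the current row if the tracked attribute changed, then insert a new version."""
    current = connection.execute(
        "SELECT customer_key, city FROM dim_customer "
        "WHERE customer_id = ? AND valid_to = ?",
        (customer_id, OPEN_END),
    ).fetchone()
    if current is not None and current[1] == city:
        return  # No change, so no new version is recorded.
    with connection:  # both statements commit together or not at all
        if current is not None:
            connection.execute(
                "UPDATE dim_customer SET valid_to = ? WHERE customer_key = ?",
                (change_date, current[0]),
            )
        connection.execute(
            "INSERT INTO dim_customer (customer_id, city, valid_from, valid_to) "
            "VALUES (?, ?, ?, ?)",
            (customer_id, city, change_date, OPEN_END),
        )


def city_as_of(connection, customer_id, as_of_date):
    """Point-in-time lookup: valid_from is inclusive, valid_to is exclusive."""
    row = connection.execute(
        "SELECT city FROM dim_customer "
        "WHERE customer_id = ? AND valid_from <= ? AND ? < valid_to",
        (customer_id, as_of_date, as_of_date),
    ).fetchone()
    return row[0] if row else None


apply_scd2(conn, "C-42", "Leeds", "2023-01-01")
apply_scd2(conn, "C-42", "Bristol", "2024-06-15")
```

With these two calls applied, city_as_of(conn, "C-42", "2023-12-31") returns "Leeds" and city_as_of(conn, "C-42", "2024-06-15") returns "Bristol". A Type 1 change would instead be a single UPDATE of the city column, with no second row.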

Cloud Data Warehouses: How Design Adapts

Modern cloud warehouses – Snowflake, BigQuery, Amazon Redshift, Azure Synapse – change some traditional design constraints:

Separation of compute and storage means you don’t need to pre-optimise for storage the way on-premises design required
Columnar storage makes wide denormalised tables (Star Schema) even faster – less reason to normalise for performance
Serverless options (BigQuery, or a query service such as Amazon Athena) remove the cluster sizing problem entirely for many teams
ELT instead of ETL: load raw data first, transform inside the warehouse using dbt or similar – the staging layer becomes even more important
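The columnar storage point above can be illustrated with a toy model. This is a deliberately simplified sketch in plain Python: the sample rows are hypothetical, and real engines add compression, vectorised execution, and much more. It only shows why an aggregate over one column of a wide table needs to read less data in a column-oriented layout.

```python
# Toy model of row-oriented vs column-oriented layouts (illustrative only).

rows = [
    {"order_id": 1, "region": "EMEA", "category": "Hardware", "amount": 50.0},
    {"order_id": 2, "region": "EMEA", "category": "Services", "amount": 200.0},
    {"order_id": 3, "region": "APAC", "category": "Hardware", "amount": 20.0},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per column."""
    return {name: [record[name] for record in records] for name in records[0]}


columns = to_columns(rows)


def total_row_store(records):
    """Models a row store: each record is read whole to reach one field."""
    return sum(record["amount"] for record in records)


def total_column_store(cols):
    """Models a column store: only the column the query needs is read."""
    return sum(cols["amount"])


# Values the storage layer would have to read in each model.
values_touched_row_store = sum(len(record) for record in rows)  # every field of every row
values_touched_column_store = len(columns["amount"])            # one column only
```

Both functions return the same total, but in this model the row layout reads 12 values and the column layout reads 3. The gap widens with every column added, which is why wide denormalised tables cost little at query time in a columnar engine.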

5 Common Design Mistakes to Avoid

Building the semantic layer before the data model is stable – dashboards built on shifting foundations need constant rework
No staging layer – loading transformed data directly with no raw copy makes re-processing from source impossible
Treating every data source as its own mart – no shared dimensions means no consistent metrics across the business
Over-normalising in a cloud warehouse – the cost of extra joins usually outweighs the storage savings in modern columnar systems
Designing for today’s questions only – a warehouse that can’t accommodate new dimensions without structural rework will become the bottleneck within 18 months
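The shared-dimensions mistake in the list above is easier to see with an example. The sketch below assumes two hypothetical fact tables from different business processes (fact_sales and fact_tickets) that both reference one conformed dim_customer table, so both metrics roll up by the same definition of region. Names and data are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One shared (conformed) customer dimension...
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, region TEXT);
    -- ...referenced by two fact tables from different business processes.
    CREATE TABLE fact_sales   (customer_key INTEGER, amount REAL);
    CREATE TABLE fact_tickets (customer_key INTEGER, ticket_id INTEGER);

    INSERT INTO dim_customer VALUES (1, 'EMEA'), (2, 'APAC');
    INSERT INTO fact_sales   VALUES (1, 50.0), (1, 200.0), (2, 20.0);
    INSERT INTO fact_tickets VALUES (1, 900), (2, 901), (2, 902);
""")


def revenue_and_tickets_by_region(connection):
    """Aggregate each fact table separately, then join on the shared dimension.

    Joining the two fact tables to each other directly would multiply rows
    and inflate both totals.
    """
    return connection.execute("""
        WITH sales AS (
            SELECT customer_key, SUM(amount) AS revenue
            FROM fact_sales
            GROUP BY customer_key
        ),
        tickets AS (
            SELECT customer_key, COUNT(*) AS ticket_count
            FROM fact_tickets
            GROUP BY customer_key
        )
        SELECT c.region,
               COALESCE(SUM(s.revenue), 0)      AS revenue,
               COALESCE(SUM(t.ticket_count), 0) AS ticket_count
        FROM dim_customer AS c
        LEFT JOIN sales   AS s ON s.customer_key = c.customer_key
        LEFT JOIN tickets AS t ON t.customer_key = c.customer_key
        GROUP BY c.region
        ORDER BY c.region
    """).fetchall()


summary = revenue_and_tickets_by_region(conn)
```

If sales and support each kept their own customer table with their own idea of region, there would be no reliable key to line these two metrics up on, which is how conflicting numbers across departments arise.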

Final Thought

Good data warehouse design is invisible. When it’s working, analysts just notice that their reports run fast, the numbers are consistent across departments, and adding a new data source doesn’t require a two-week project. That invisibility is the goal. Most of the visible problems in analytics – conflicting metrics, slow dashboards, failed pipeline runs – trace back to architectural decisions made early that nobody questioned at the time.
