In data warehousing, a slowly changing dimension (SCD) is a dimension table where the attributes change over time, but not very frequently. The term “slowly changing” refers to the fact that these changes happen occasionally (perhaps days, weeks, or months apart) rather than constantly like transactional data.
The challenge is figuring out how to handle these changes while maintaining accurate historical analysis. When a customer moves to a new state or a product gets recategorized, you need a strategy that preserves the integrity of your historical data.
Why This Matters
Imagine you’re analyzing last year’s sales data. One of your top customers moved from California to Texas six months ago. Should their historical orders show up under California (where they actually lived at the time) or Texas (where they live now)?
Or consider a product that was in the “Electronics” category but got moved to “Smart Home Devices” last quarter. When you run a report on Electronics sales for the past two years, should it include this product’s historical sales or not?
These aren’t just academic questions. They affect revenue reporting, trend analysis, forecasting, and business decisions. Different situations call for different approaches.
Introducing SCD Types
Over time, the data warehousing community has identified several standard approaches for handling dimensional changes. These are known as SCD types, and each represents a different tradeoff between historical accuracy, query complexity, and storage requirements.
The type you choose depends on the specific attribute and how it’s used in analysis. Some attributes need full historical tracking, others just need current values, and some fall somewhere in between. Understanding these types helps you make informed decisions about how to handle each attribute in your dimensions.
Type 0: Retain Original
Type 0 is the simplest approach – never change the attribute once it’s set. This works for attributes that genuinely shouldn’t change, like a customer’s original registration date or a product’s initial launch date.
If something is truly fixed at the time of creation and any “change” would represent a data error rather than a legitimate update, Type 0 makes sense. But this isn’t really handling change at all. Rather, it’s defining certain attributes as immutable.
Type 1: Overwrite
Type 1 simply overwrites the old value with the new one. When a customer updates their email address, you replace the old email with the new one and move on. The historical value is lost.
This is appropriate for attributes where you don’t care about history, or where the change is a correction rather than an actual change. If a customer’s name was misspelled, you want to fix it everywhere. If a product description had a typo, you just want to correct it.
Type 1 is also common for attributes that are useful for current operations but irrelevant for historical analysis. Current mailing addresses, phone numbers, or preference settings often fall into this category. In these cases, you just need to know the current value.
The downside with this is that all historical records reflect the new value. If you overwrite a customer’s state from California to Texas, every report you run will show all their past orders as if they came from Texas, even orders from years ago when they actually lived in California.
Type 2: Add New Row
Type 2 creates a new row in the dimension table whenever an attribute changes, preserving the full history. This is the most common approach for attributes where historical accuracy matters.
The way it works is that each row in the dimension table gets additional columns to track its validity period. Common approaches include:
- Effective dates:
start_dateandend_datecolumns that define when this version of the record was valid. The current version might have anend_datein the far future (like9999-12-31) orNULLto indicate it’s still active. - Current flag: a boolean column like
is_currentthat marks whether this is the active version. - Version numbers: a sequential counter showing which version of the record this is.
When a customer moves from California to Texas, you don’t update their existing row. Instead, you close out the old row (set its end_date to today, set is_current to false) and insert a new row with the updated address and a new surrogate key.
The fact table uses surrogate keys rather than natural business keys precisely to support this pattern. Old transactions keep their original customer_key pointing to the California version of the record. New transactions get the new customer_key pointing to the Texas version.
This means when you analyze historical data, you see it exactly as it was. Last year’s California sales include this customer’s orders from when they lived in California. This year’s Texas sales include their recent orders.
Type 3: Add New Column
Type 3 adds columns to track both the current value and one or more previous values. You might have current_state and previous_state columns, or current_category and original_category.
This gives you limited history without the complexity of multiple rows. You can answer questions like “show me customers who changed states this year” or “compare sales of products in their current category versus their original category”.
The limitation with this approach is that you only track a fixed number of historical values. If a customer moves three times, you only remember the most recent previous address. And you can only do this for a few attributes before the table gets unwieldy. Having current and previous versions of 20 different columns gets messy fast.
Type 3 is useful when you need simple before-and-after analysis but don’t need complete historical tracking.
Type 4: Add History Table
Type 4 splits current and historical data into separate tables. The main dimension table contains only the current version of each record, while a separate history table tracks all the changes.
When using this approach, the dimension table stays small and simple (just one row per customer or product with their current attributes). The history table stores previous versions with effective dates.
Type 4 is less common in practice. It adds complexity to queries since you need to union the current and historical tables to see the full picture. Most systems that need historical tracking just use Type 2.
Type 5: Mini-Dimension
Type 5 involves creating a separate “mini-dimension” table for a subset of frequently changing attributes. Instead of tracking these changes in the main dimension, you split them off into their own small dimension table that the fact table references directly.
For example, if customer credit scores change frequently but other customer attributes don’t, you might create a separate credit_score dimension that gets updated more often while keeping the main customer dimension stable.
This approach is rarely used in practice. It adds complexity with multiple dimension table joins and is typically only considered when you have a few specific attributes that change much more frequently than others. Most organizations find that Type 2 on the main dimension or separating the attributes into their own full dimension works better than the Type 5 pattern.
Type 6: Hybrid Approach
Type 6 (sometimes called Type 1+2+3) combines elements of Types 1, 2, and 3. You create new rows like Type 2 to preserve history, but you also store current values in every row like Type 3, and you use Type 1 overwrites for those current value columns.
Each row has columns for both the historical attribute values (what it was during this row’s effective period) and the current values (what it is now, overwritten across all historical rows).
For example, a customer dimension might have:
state(the state during this row’s effective period – Type 2)current_state(overwritten to always show the current state – Type 1)start_dateandend_date(Type 2)is_currentflag (Type 2)
This lets you analyze data both ways – as it was historically and with current values. You can report “sales by customer’s historical state” or “sales by customer’s current state” depending on what makes sense for the analysis.
Type 6 is powerful but adds complexity. You’re duplicating data and need to maintain those current value columns across potentially millions of historical rows whenever something changes.
Choosing the Right Type
Your choice of SCD will depend on your specific needs for each attribute. Here’s a rough guide:
- Use Type 1 when you don’t need history, when changes are corrections rather than true changes, or when the attribute only matters for current operations.
- Use Type 2 when historical accuracy is important for reporting and analysis. This is the default choice for most business-critical attributes in data warehouses.
- Use Type 3 when you need simple before-and-after comparisons but don’t need complete historical tracking.
- Use Type 6 when you need both historical and current perspectives on the same data, and the added complexity is worth it.
Remember, different attributes in the same dimension can use different types. In a customer dimension, you might use Type 1 for email addresses, Type 2 for customer segments, and Type 3 for loyalty tier changes.
The following table outlines examples of use cases for each of the types:
| Type | Action | History Preserved? | Best Use Case |
| Type 0 | Retain Original | No | Fixed values like “Date of Birth” or “Original Start Date.” |
| Type 1 | Overwrite | No | Correcting typos or tracking things where history doesn’t matter (e.g., a phone number). |
| Type 2 | Add New Row | Yes (Full) | Tracking a customer’s address or a product’s price history. This is widely considered the gold standard. |
| Type 3 | Add New Column | Yes (Partial) | When you only care about the “current” vs. the “previous” value (e.g., sales territory). |
| Type 4 | History Table | Yes (Separate) | High-volume changes where you want to keep the main table small and fast. |
| Type 5 | Hybrid (4+1) | Yes (Separate) | Large dimensions where you need the current profile value linked directly in the base record without joining the history table. Rarely used in practice. |
| Type 6 | Hybrid (1+2+3) | Yes (Full) | Complex reporting where you need to group by both historical and current values. |
Implementation Considerations for Type 2
While Type 2 dimensions are the most commonly used, they will affect your ETL processes significantly. You need logic to detect changes, close out old records, and insert new ones while maintaining surrogate key relationships. Your ETL needs to handle cases where multiple attributes change simultaneously. Do you create one new row or track each change separately?
Query complexity also increases with Type 2. Users need to understand that joining to a dimension might pull in historical versions of records. Your BI tools need to handle effective date filtering properly.
Storage requirements will also grow over time with Type 2, though usually not dramatically. If a customer dimension has 1 million customers and addresses change every few years on average, you might accumulate 2-3 million rows over a decade. For most systems, this is manageable, but it’s certainly something to be mindful of.
Slowly Changing Dimensions in Practice
As alluded to, most data warehouses use Type 2 as their default for important dimensional attributes. It provides the historical accuracy that makes data warehouses valuable in the first place. You’re building a system to understand what happened in your business over time, and preserving dimensional context is essential to that mission.
The main thing is to be intentional about which attributes get which treatment. Not everything needs Type 2 tracking. Focus on attributes that affect how you analyze and understand your metrics, and use simpler approaches for everything else.