Wednesday, April 7, 2010

SCDs

When the data warehouse receives notification that an existing row in a
dimension has in some way changed, there are three basic responses.
We call these three basic responses Type 1, Type 2, and Type 3 slowly changing
dimensions (SCDs).
 
Type 1 SCD
 
In Type 1 SCD, we simply overwrite the existing row. It means we don’t maintain history here.
 
e.g.
 
Primary Key        Natural Key        Prod Name         Category              Package Type   
 
4321                       AT04                      Sprite                    Soft Drink            Glass
4321                       AT04                      Sprite                    Soft Drink            Plastic
               
 
Type 2 SCD
 
In Type 2 SCD, history is maintained accurately in the dimensions, and the changes are properly associated with the related facts.
Here when the database is notified the existing dimension row needs to be changed, instead of overwriting the changes, the
database inserts new row at the moment of change.
This new record is assigned a new surrogate primary key and this primary key is used in all the fact tables for which this dimension is
a foreign key.
Type 2 SCD perfectly partitions history because each detailed version of a dimension entity is correctly connected to the span of fact table records.
 
Kimball recommends dimension tables to provide optional useful information about Type 2 dimension changes.
 
He recommends adding the following five fields to dimension tables processed
with Type 2 logic:
 
1.       Calendar Date                   foreign key (date of change)
2.       Row Effective                    DateTime (exact date-time of change)
3.       Row End                              DateTime (exact date-time of next change)
4.       Reason for Change          (text field)
5.       Current Flag                       (current/expired)
 
Primary Key        Natural Key(Emp ID)       Designation        Department       Calendar Date   Start Date            End Date                Reason For Change         Current Flag
 
4321                       E00234                                  Grade I                 Sales                      31-Mar-2008      04-Apr-2007       31-Mar-2008                Joining                                  0
4322                       E00234                                  Grade II                Sales                      31-Mar-2009      01-Apr-2008       31-Mar-2009                Promotion                          0
4323                       E00234                                  Grade II                Marketing           01-Apr-2009       01-Apr-2009       31-Mar-2099                Dept Changed                   1
 

Kimball also recommends not to set the Row End DateTime value to NULL as it may give erroneous results when used in a BETWEEN logic.
This value can be set to any arbitrary far future value.
 
Type 3
 
Type 3 SCD is used when a change happens to a dimension record but the old value remains valid as a second choice.
In Type 3 SCD, instead of issuing a new row when the change takes place a new column is added to the table and
the old value is placed in this column before it is overwritten.
 
 
e.g.
 
Primary Key        Natural Key(Prod ID)      Prod Name         Size        Category              Colour  
 
1127648                A 107B                                  Denim Pants      30           men’s Wear       Blue
 
 
Primary Key        Natural Key(Prod ID)      Prod Name         Size        Category              Old Category      Colour  
 
1127648                A 107B                                  Denim Pants      30           Leisure Wear     Men’s Wear       Blue

No comments:

Post a Comment