Surrogate Key Generator Implementation in Datastage 8.1,
8.5 and 9.1
The Surrogate Key Generator stage is a processing stage that
generates surrogate key columns and maintains the key source.
A surrogate key is a unique primary key that is not derived from
the data that it represents, therefore changes to the data will not change the
primary key. In a star schema database, surrogate keys are used to join a fact
table to a dimension table.
Surrogate key generator
stage uses:
1. Create
or delete the key source before other jobs run
2. Update a state file
with a range of key values
3. Generate surrogate key
columns and pass them to the next stage in the job
4. View the contents of
the state file
Generated keys are 64 bit integers and the key source can be
stat file or database sequence.
Surrogate keys are used to join a dimension table to a fact table in a star schema database.
When the SCD stage performs a dimension lookup :
A) If a matching record is found, it retrieves the value of the existing surrogate key.
B) If a match is not found, the stage obtains a new surrogate key value by using the derivation of the Surrogate Key column on the Dim Update tab.
• If you want the SCD stage to generate new surrogate keys by using a
key source that you created with a Surrogate Key Generator stage as
described in “Surrogate Key Generator”.
• If you want to use your own method to handle surrogate keys, you
should derive the Surrogate Key column from a source column.
You can replace the dimension information in the source data stream with the
surrogate key value by mapping the Surrogate Key column to the output link.
Creating the key Source :
Drag the surrogate key stage from palette to parallel job canvas with no input and output links.
Properties:
Key Source Action = create
Source Type : FlatFile or Database sequence(in this case we are
using FlatFile)
When you run the job it will create an empty file.
If you want to the check the content change the View Stat File =
YES and check the job log for details.
skey_genstage,0: State file /tmp/skeycutomerdim.stat is empty.
if you try to create the same file again job will abort with the
following error.
skey_genstage,0: Unable to create state file
/tmp/skeycutomerdim.stat: File exists.
Deleting the key source:
Updating the stat File:
To update the stat file add surrogate key stage to the job with
single input link from other stage.
We use this process to update the stat file if it is corrupted
or deleted.
1 1. Open
the surrogate key stage editor and go to the properties tab.
If the stat file exists we can update otherwise we can create
and update it.
We are using SkeyValue parameter to update the stat file
using transformer stage.
Generating Surrogate Keys:
Now we have created stat file and will generate keys using the
stat key file.
Click on the surrogate keys stage and go to properties add add
type a name for the surrogate key column in the Generated Output Column
Name property.
Go to ouput and define the mapping like below.
Rowgen we are using 10 rows and hence when we run the job we see
10 skey values in the output.I have updated the stat file with 100 and below is the output.
If you want to generate the key value from begining you can use
following property in the surrogate key stage.
A. If the key source is a flat file, specify how keys are
generated:
1. To generate keys in sequence from the highest value that was
last used, set the Generate Key from Last Highest Value property
to Yes. Any gaps in the key range are ignored.
2. To specify a value to initialize the key source, add
the File Initial Value property to the Options group, and specify the
start value for key generation.3. To control the block size for key ranges, add the File Block Size property to the Options group, set this property to User specified, and specify a value for the block size.
B. If there is no input link, add the Number of Records property to the Options group, and specify how many records to generate.
No comments:
Post a Comment