Looping concept in Datastage 8.5 or 8.7 Transformer Stage...!
You may like this link : Click here to know Transformer Looping Functions For Pivoting
Loop example: Aggregating Data in Transformer
You can use the Transformer stage to add aggregated information
to output rows.
Aggregation operations make use of a cache that stores input rows.
You can monitor the number of entries in the cache by setting a threshold
level in the Loop Variable tab of the Stage Properties window. If
the threshold is reached when the job runs, a warning is issued into
the log, and the job continues to run.
Input row group aggregation included with input row data
You can save input rows to a cache area, so that you can process this data in a loop.
For example, you have
input data that has a column holding a price value. You want to add
a column to the output rows. The new column indicates what percentage
the price value is of the total value for prices in all rows in that
group. The value for the new Percentage column is calculated by the
following expression.
(price * 100)/sum of all prices in group
In the example, the data is sorted and is grouped on the value in Col1.
Col1 | Col2 | Price |
---|---|---|
1000 | abc | 100.00 |
1000 | def | 20.00 |
1000 | ghi | 60.00 |
1000 | jkl | 20.00 |
2000 | zyx | 120.00 |
2000 | wvu | 110.00 |
2000 | tsr | 170.00 |
The percentage for each row in the group where Col1 = 1000 is calculated by the following expression.
(price * 100)/200
The percentage for each row in the group where Col1 = 2000 is calculated by the following expression.
(price * 100)/400
The output is shown in the following table.
Col1 | Col2 | Price | Percentage |
---|---|---|---|
1000 | abc | 100.00 | 50.00 |
1000 | def | 20.00 | 10.00 |
1000 | ghi | 60.00 | 30.00 |
1000 | jkl | 20.00 | 10.00 |
2000 | zyx | 120.00 | 30.00 |
2000 | wvu | 110.00 | 27.50 |
2000 | tsr | 170.00 | 42.50 |
This scenario uses key break facilities that are available on the Transformer stage. You can use these facilities to detect when the value of an input column changes, and so group rows as you process them.
This scenario is implemented by storing the grouped rows in an input row cache and processing them when the value in a key column changes. In the example, the grouped rows are processed when the value in the column named Col1 changes from 1000 to 2000. Two functions, SaveInputRecord() and GetSavedInputRecord(), are used to add input rows to the cache and retrieve them. SaveInputRecord() is called when a stage variable is evaluated, and returns the count of rows in the cache (starting at 1 when the first row is added). GetSavedInputRecord() is called when a loop variable is evaluated.
To implement this
scenario in the Transformer stage, make the following settings:
- Stage variable
- Define the following stage variables:
- NumSavedRows
- SaveInputRecord()
- IsBreak
- LastRowInGroup(inlink.Col1)
- TotalPrice
- IF IsBreak THEN SummingPrice + inlink.Price ELSE 0
- SummingPrice
- IF IsBreak THEN 0 ELSE SummingPrice + inlink.Price
- NumRows
- IF IsBreak THEN NumSavedRows ELSE 0
- Loop condition
- Enter the following expression as the loop condition:
The loop continues to iterate for the count specified in the NumRows variable.@ITERATION <= NumRows
- Loop variables
- Define the following loop variable:
- SavedRowIndex
- GetSavedInputRecord()
- Output link metadata and derivations
- Define the output link columns and their derivations:
- Col1 - inlink.Col1
- Col2 - inlink.Col2
- Price - inlink.Price
- Percentage - (inlink.Price * 100)/TotalPrice
SaveInputRecord() is called in the first Stage Variable
(NumSavedRows). SaveInputRecord() saves the current input row in the
cache, and returns the count of records currently in the cache. Each
input row in a group is saved until the break value is reached. At
the last value in the group, NumRows is set to the number of rows
stored in the input cache. The Loop Condition then loops round the
number of times specified by NumRows, calling GetSavedInputRecord()
each time to make the next saved input row current before re-processing
each input row to create each output row. The usage of the inlink
columns in the output link refers to their values in the currently
retrieved input row, so will change on each output loop.
Caching selected input rows
You can call
the SaveInputRecord() within an expression, so that input rows are
only saved in the cache when the expression evaluates as true.
For example, you can implement the scenario described, but save
only input rows where the price column is not 0. The settings are
as follows:
- Stage variable
- Define the following stage variables:
- IgnoreRow
- IF (inlink.Price = 0) THEN 1 ELSE 0
- NumSavedRows
- IF IgnoreRecord THEN SavedRowSum ELSE SaveInputRecord()
- IsBreak
- LastRowInGroup(inlink.Col1)
- SavedRowSum
- IF IsBreak THEN 0 ELSE NumSavedRows
- TotalPrice
- IF IsBreak THEN SummingPrice + inlink.Price ELSE 0
- SummingPrice
- IF IsBreak THEN 0 ELSE SummingPrice + inlink.Price
- NumRows
- IF IsBreak THEN NumSavedRows ELSE 0
- Loop condition
- Enter the following expression as the loop condition:
@ITERATION <= NumRows
- Loop variables
- Define the following loop variable:
- SavedRowIndex
- GetSavedInputRecord()
- Output link metadata and derivations
- Define the output link columns and their derivations:
- Col1 - inlink.Col1
- Col2 - inlink.Col2
- Price - inlink.Price
- Percentage - (inlink.Price * 100)/TotalPrice
This example produces output similar to the previous
example, but the aggregation does not include Price values of 0, and
no output rows with a Price value of 0 are produced.
Outputting additional generated rows
This
example is based on the first example, but, in this case, you want
to identify any input row where the Price is greater than or equal
to 100. If an input row has a Price greater than or equal to 100,
then a 25% discount is applied to the Price and a new additional output
row is generated. The Col1 value in the new row has 1 added to it
to indicate an extra discount entry. The original input row is still
output as normal. Therefore any input row with a Price of greater
than or equal to 100 will produce two output rows, one with the discounted
price and one without.
The input data is as shown in the following
table:
Col1 | Col2 | Price |
---|---|---|
1000 | abc | 100.00 |
1000 | def | 20.00 |
1000 | ghi | 60.00 |
1000 | jkl | 20.00 |
2000 | zyx | 120.00 |
2000 | wvu | 110.00 |
2000 | tsr | 170.00 |
The required table is shown in the following table:
Col1 | Col2 | Price | Percentage |
---|---|---|---|
1000 | abc | 100.00 | 50.00 |
1001 | abc | 75.00 | 50.00 |
1000 | def | 20.00 | 10.00 |
1000 | ghi | 60.00 | 30.00 |
1000 | jkl | 20.00 | 10.00 |
2000 | zyx | 120.00 | 30.00 |
2001 | zyx | 90.00 | 30.00 |
2000 | wvu | 110.00 | 27.50 |
2001 | wvu | 82.50 | 27.50 |
2000 | tsr | 170.00 | 42.50 |
2001 | tsr | 127.50 | 42.50 |
To implement this scenario in the Transformer
stage, make the following settings:
- Stage variable
- Define the following stage variables:
- NumSavedRowInt
- SaveInputRecord()
- AddRow
- IF (inlink.Price >= 100) THEN 1 ELSE 0
- NumSavedRows
- IF AddRow THEN SaveInputRecord() ELSE NumSavedRowInt
- IsBreak
- LastRowInGroup(inlink.Col1)
- TotalPrice
- IF IsBreak THEN SummingPrice + inlink.Price ELSE 0
- SummingPrice
- IF IsBreak THEN 0 ELSE SummingPrice + inlink.Price
- NumRows
- IF IsBreak THEN NumSavedRows ELSE 0
- Loop condition
- Enter the following expression as the loop condition:
The loop continues to iterate for the count specified in the NumRows variable.@ITERATION <= NumRows
- Loop variables
- Define the following loop variables:
- SavedRowIndex
- GetSavedInputRecord()
- AddedRow
- LastAddedRow
- LastAddedRow
- IF (inlink.Price < 100) THEN 0 ELSE IF (AddedRow = 0) THEN 1 ELSE 0
- Output link metadata and derivations
- Define the output link columns and their derivations:
- Col1 - IF (inlink.Price < 100) THEN inlink.Col1 ELSE IF (AddedRow = 0) THEN inlink.Col1 ELSE inlink.Col1 + 1
- Col2 - inlink.Col2
- Price - IF (inlink.Price < 100) THEN inlink.Price ELSE IF (AddedRow = 0) THEN inlink.Price ELSE inlink.Price * 0.75
- Percentage - (inlink.Price * 100)/TotalPrice
SaveInputRecord is called either once or twice depending
on the value of Price. When SaveInputRecord is called twice, in addition
to the normal aggregation, it produces the extra output record with
the recalculated Price value. The Loop variable AddedRow is used to
evaluate the output column values differently for each of the duplicate
input rows.
Runtime errors
The number of calls to SaveInputRecord()
and GetSavedInputRecord() must match for each loop. You can call SaveInputRecord()
multiple times to add to the cache, but once you call GetSavedInputRecord(),
then you must call it enough times to empty the input cache before
you can call SaveInputRecord() again. The examples described can generate
runtime errors in the following circumstances by not observing this
rule:
- If your Transformer stage calls GetSavedInputRecord before SaveInputRecord,
then a fatal error similar to the following example is reported in
the job log:
APT_CombinedOperatorController,0: Fatal Error: get_record() called on record 1 but only 0 records saved by save_record()
- If your Transformer stage calls GetSavedInputRecord more times
than SaveInputRecord is called, then a fatal error similar to the
following example is reported in the job log:
APT_CombinedOperatorController,0: Fatal Error: get_record() called on record 3 but only 2 records saved by save_record()
- If your Transformer stage calls SaveInputRecord but does not call
GetSavedInputRecord, then a fatal error similar to the following example
is reported in the job log:
APT_CombinedOperatorController,0: Fatal Error: save_record() called on record 3, but only 0 records retrieved by get_record()
- If your Transformer stage does not call GetSavedInputRecord as
many times as SaveInputRecord, then a fatal error similar to the following
example is reported in the job log:
APT_CombinedOperatorController,0: Fatal Error: save_record() called on record 3, but only 2 records retrieved by get_record()
No comments:
Post a Comment