Looping in the DataStage 8.5 and 8.7 Transformer Stage

Loop example: Aggregating Data in Transformer

You can use the Transformer stage to add aggregated information to output rows.

Aggregation operations use a cache that stores input rows. You can monitor the number of entries in the cache by setting a threshold level in the Loop Variable tab of the Stage Properties window. If the threshold is reached when the job runs, a warning is written to the job log and the job continues to run.

Input row group aggregation included with input row data

You can save input rows to a cache area, so that you can process this data in a loop.

For example, suppose that your input data has a column that holds a price value, and you want to add a column to the output rows. The new column shows what percentage each price is of the total of the prices in all rows of its group. The value for the new Percentage column is calculated by the following expression.

(price * 100)/sum of all prices in group

In the example, the data is sorted and is grouped on the value in Col1.

Col1	Col2	Price
1000	abc	100.00
1000	def	20.00
1000	ghi	60.00
1000	jkl	20.00
2000	zyx	120.00
2000	wvu	110.00
2000	tsr	170.00

The percentage for each row in the group where Col1 = 1000 is calculated by the following expression.

(price * 100)/200

The percentage for each row in the group where Col1 = 2000 is calculated by the following expression.

(price * 100)/400

The output is shown in the following table.

Col1	Col2	Price	Percentage
1000	abc	100.00	50.00
1000	def	20.00	10.00
1000	ghi	60.00	30.00
1000	jkl	20.00	10.00
2000	zyx	120.00	30.00
2000	wvu	110.00	27.50
2000	tsr	170.00	42.50
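
As a cross-check of the table above, the following minimal Python sketch computes the same percentages outside DataStage. It is not Transformer code; the function name add_percentage and the tuple layout are assumptions made for illustration, and the input must already be sorted on Col1, as in the example.

```python
from itertools import groupby

# Sample rows from the example: (Col1, Col2, Price), sorted on Col1.
rows = [
    (1000, "abc", 100.00),
    (1000, "def", 20.00),
    (1000, "ghi", 60.00),
    (1000, "jkl", 20.00),
    (2000, "zyx", 120.00),
    (2000, "wvu", 110.00),
    (2000, "tsr", 170.00),
]


def add_percentage(sorted_rows):
    """Append (price * 100) / group total to every row of each Col1 group."""
    out = []
    for _, group in groupby(sorted_rows, key=lambda r: r[0]):
        group = list(group)
        total = sum(price for _, _, price in group)
        for col1, col2, price in group:
            out.append((col1, col2, price, price * 100 / total))
    return out


result = add_percentage(rows)
for row in result:
    print(row)
```

This version needs two passes over each group (one to total, one to emit), which is the same reason the Transformer has to cache the rows of a group before it can output them.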

This scenario uses key break facilities that are available on the Transformer stage. You can use these facilities to detect when the value of an input column changes, and so group rows as you process them.
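
To make the idea of a key break concrete, here is a small Python sketch that flags the last row of each group by looking one row ahead. It only imitates what a function such as LastRowInGroup() reports for sorted input; the helper name last_row_in_group_flags is hypothetical and is not part of DataStage.

```python
def last_row_in_group_flags(keys):
    """Return True for each position whose key differs from the next key.

    The final row is always the last row of its group.
    """
    return [
        i + 1 == len(keys) or keys[i + 1] != key
        for i, key in enumerate(keys)
    ]


# Col1 values from the example input, already sorted.
flags = last_row_in_group_flags([1000, 1000, 1000, 1000, 2000, 2000, 2000])
print(flags)
```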

This scenario is implemented by storing the grouped rows in an input row cache and processing them when the value in a key column changes. In the example, the grouped rows are processed when the value in the column named Col1 changes from 1000 to 2000. Two functions, SaveInputRecord() and GetSavedInputRecord(), are used to add input rows to the cache and retrieve them. SaveInputRecord() is called when a stage variable is evaluated, and returns the count of rows in the cache (starting at 1 when the first row is added). GetSavedInputRecord() is called when a loop variable is evaluated.

To implement this scenario in the Transformer stage, make the following settings:

Stage variable

Define the following stage variables:

NumSavedRows: SaveInputRecord()
IsBreak: LastRowInGroup(inlink.Col1)
TotalPrice: IF IsBreak THEN SummingPrice + inlink.Price ELSE 0
SummingPrice: IF IsBreak THEN 0 ELSE SummingPrice + inlink.Price
NumRows: IF IsBreak THEN NumSavedRows ELSE 0

Loop condition

Enter the following expression as the loop condition:

@ITERATION <= NumRows

The loop continues to iterate for the count specified in the NumRows variable.

Loop variables

Define the following loop variable:

SavedRowIndex: GetSavedInputRecord()

Output link metadata and derivations

Define the output link columns and their derivations:

Col1 - inlink.Col1
Col2 - inlink.Col2
Price - inlink.Price
Percentage - (inlink.Price * 100)/TotalPrice

SaveInputRecord() is called in the first stage variable (NumSavedRows). It saves the current input row in the cache and returns the count of records currently in the cache. Each input row in a group is saved until the break value is reached. At the last row in the group, NumRows is set to the number of rows stored in the input cache. The loop condition then iterates the number of times specified by NumRows, calling GetSavedInputRecord() each time to make the next saved input row current before re-processing it to create an output row. The inlink columns used in the output link derivations refer to the values in the currently retrieved input row, so they change on each loop iteration.
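
The evaluation order described above can be traced with a rough Python simulation. This is a sketch under stated assumptions, not how the engine is implemented: the cache is modelled as a simple queue, the break test looks one row ahead, and the function name run_transformer is made up for illustration. The local variables mirror the stage variables in the order they are defined.

```python
from collections import deque

rows = [
    (1000, "abc", 100.00), (1000, "def", 20.00),
    (1000, "ghi", 60.00), (1000, "jkl", 20.00),
    (2000, "zyx", 120.00), (2000, "wvu", 110.00), (2000, "tsr", 170.00),
]


def run_transformer(input_rows):
    cache = deque()        # stands in for the input row cache
    summing_price = 0.0    # stage variables keep their value between rows
    out = []
    for i, row in enumerate(input_rows):
        col1, _, price = row
        # Stage variables, evaluated top to bottom for each input row.
        cache.append(row)                 # NumSavedRows: SaveInputRecord()
        num_saved_rows = len(cache)
        is_break = i + 1 == len(input_rows) or input_rows[i + 1][0] != col1
        total_price = summing_price + price if is_break else 0.0
        summing_price = 0.0 if is_break else summing_price + price
        num_rows = num_saved_rows if is_break else 0
        # Loop condition: @ITERATION <= NumRows
        iteration = 1
        while iteration <= num_rows:
            c1, c2, p = cache.popleft()   # GetSavedInputRecord()
            out.append((c1, c2, p, p * 100 / total_price))
            iteration += 1
    return out


output = run_transformer(rows)
for row in output:
    print(row)
```

For rows that are not the last in their group, num_rows is 0, so the loop body never runs and nothing is output; all rows of the group are emitted together when the break is detected.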

Caching selected input rows

You can call SaveInputRecord() within a conditional expression, so that input rows are saved in the cache only when the condition evaluates as true.

For example, you can implement the scenario described, but save only input rows where the price column is not 0. The settings are as follows:

Stage variable

Define the following stage variables:

IgnoreRow: IF (inlink.Price = 0) THEN 1 ELSE 0
NumSavedRows: IF IgnoreRow THEN SavedRowSum ELSE SaveInputRecord()
IsBreak: LastRowInGroup(inlink.Col1)
SavedRowSum: IF IsBreak THEN 0 ELSE NumSavedRows
TotalPrice: IF IsBreak THEN SummingPrice + inlink.Price ELSE 0
SummingPrice: IF IsBreak THEN 0 ELSE SummingPrice + inlink.Price
NumRows: IF IsBreak THEN NumSavedRows ELSE 0

Loop condition

Enter the following expression as the loop condition:

@ITERATION <= NumRows

Loop variables

Define the following loop variable:

SavedRowIndex: GetSavedInputRecord()

Output link metadata and derivations

Define the output link columns and their derivations:

Col1 - inlink.Col1
Col2 - inlink.Col2
Price - inlink.Price
Percentage - (inlink.Price * 100)/TotalPrice

This example produces output similar to the previous example, but the aggregation does not include Price values of 0, and no output rows with a Price value of 0 are produced.
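
The effect of the conditional save can be checked with a Python simulation along the same lines. It is an illustrative sketch only: the queue-based cache, the look-ahead break test, the function name run_skip_zero, and the two zero-price sample rows ("nil1", "nil2") are all assumptions added here and do not come from the original example.

```python
from collections import deque

# Example input plus two hypothetical zero-price rows.
rows = [
    (1000, "abc", 100.00), (1000, "nil1", 0.00), (1000, "def", 20.00),
    (1000, "ghi", 60.00), (1000, "jkl", 20.00),
    (2000, "zyx", 120.00), (2000, "wvu", 110.00),
    (2000, "tsr", 170.00), (2000, "nil2", 0.00),
]


def run_skip_zero(input_rows):
    cache = deque()
    summing_price = 0.0
    saved_row_sum = 0      # cache count carried over from the previous row
    out = []
    for i, row in enumerate(input_rows):
        col1, _, price = row
        ignore_row = price == 0                      # IgnoreRow
        if ignore_row:
            num_saved_rows = saved_row_sum           # nothing is saved
        else:
            cache.append(row)                        # SaveInputRecord()
            num_saved_rows = len(cache)
        is_break = i + 1 == len(input_rows) or input_rows[i + 1][0] != col1
        saved_row_sum = 0 if is_break else num_saved_rows
        total_price = summing_price + price if is_break else 0.0
        summing_price = 0.0 if is_break else summing_price + price
        num_rows = num_saved_rows if is_break else 0
        for _ in range(num_rows):                    # @ITERATION <= NumRows
            c1, c2, p = cache.popleft()              # GetSavedInputRecord()
            out.append((c1, c2, p, p * 100 / total_price))
    return out


output = run_skip_zero(rows)
for row in output:
    print(row)
```

SavedRowSum is what keeps the count correct when the ignored row is the last one in its group: NumRows still receives the number of rows cached so far, so the loop retrieves exactly what was saved.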

Outputting additional generated rows

This example is based on the first example, but in this case you want to identify any input row where the Price is greater than or equal to 100. For each such row, a 25% discount is applied to the Price and an additional output row is generated. The Col1 value in the new row has 1 added to it to indicate an extra discount entry. The original input row is still output as normal. Therefore, any input row with a Price greater than or equal to 100 produces two output rows: one with the original price and one with the discounted price.

The input data is as shown in the following table:

Col1	Col2	Price
1000	abc	100.00
1000	def	20.00
1000	ghi	60.00
1000	jkl	20.00
2000	zyx	120.00
2000	wvu	110.00
2000	tsr	170.00

The required output is shown in the following table:

Col1	Col2	Price	Percentage
1000	abc	100.00	50.00
1001	abc	75.00	50.00
1000	def	20.00	10.00
1000	ghi	60.00	30.00
1000	jkl	20.00	10.00
2000	zyx	120.00	30.00
2001	zyx	90.00	30.00
2000	wvu	110.00	27.50
2001	wvu	82.50	27.50
2000	tsr	170.00	42.50
2001	tsr	127.50	42.50

To implement this scenario in the Transformer stage, make the following settings:

Stage variable

Define the following stage variables:

NumSavedRowInt: SaveInputRecord()
AddRow: IF (inlink.Price >= 100) THEN 1 ELSE 0
NumSavedRows: IF AddRow THEN SaveInputRecord() ELSE NumSavedRowInt
IsBreak: LastRowInGroup(inlink.Col1)
TotalPrice: IF IsBreak THEN SummingPrice + inlink.Price ELSE 0
SummingPrice: IF IsBreak THEN 0 ELSE SummingPrice + inlink.Price
NumRows: IF IsBreak THEN NumSavedRows ELSE 0

Loop condition

Enter the following expression as the loop condition:

@ITERATION <= NumRows

The loop continues to iterate for the count specified in the NumRows variable.

Loop variables

Define the following loop variables:

SavedRowIndex: GetSavedInputRecord()
AddedRow: LastAddedRow
LastAddedRow: IF (inlink.Price < 100) THEN 0 ELSE IF (AddedRow = 0) THEN 1 ELSE 0

Output link metadata and derivations

Define the output link columns and their derivations:

Col1 - IF (inlink.Price < 100) THEN inlink.Col1 ELSE IF (AddedRow = 0) THEN inlink.Col1 ELSE inlink.Col1 + 1
Col2 - inlink.Col2
Price - IF (inlink.Price < 100) THEN inlink.Price ELSE IF (AddedRow = 0) THEN inlink.Price ELSE inlink.Price * 0.75
Percentage - (inlink.Price * 100)/TotalPrice

SaveInputRecord() is called once or twice for each input row, depending on the value of Price. When it is called twice, the row is cached twice, so in addition to the normal aggregation the loop produces the extra output record with the recalculated Price value. The loop variable AddedRow is used to evaluate the output column values differently for each of the duplicate input rows.
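
The interplay of the double save and the AddedRow and LastAddedRow loop variables is easier to follow in a step-by-step simulation. The Python sketch below is an approximation under assumptions: the cache is a plain queue, loop variables are assumed to start at 0 and to keep their values between iterations, and the function name run_with_discount_rows is invented for this illustration.

```python
from collections import deque

rows = [
    (1000, "abc", 100.00), (1000, "def", 20.00),
    (1000, "ghi", 60.00), (1000, "jkl", 20.00),
    (2000, "zyx", 120.00), (2000, "wvu", 110.00), (2000, "tsr", 170.00),
]


def run_with_discount_rows(input_rows):
    cache = deque()
    summing_price = 0.0
    last_added_row = 0     # loop variable, assumed to start at 0
    out = []
    for i, row in enumerate(input_rows):
        col1, _, price = row
        cache.append(row)                  # NumSavedRowInt: SaveInputRecord()
        num_saved_row_int = len(cache)
        add_row = price >= 100             # AddRow
        if add_row:
            cache.append(row)              # second SaveInputRecord()
            num_saved_rows = len(cache)
        else:
            num_saved_rows = num_saved_row_int
        is_break = i + 1 == len(input_rows) or input_rows[i + 1][0] != col1
        total_price = summing_price + price if is_break else 0.0
        summing_price = 0.0 if is_break else summing_price + price
        num_rows = num_saved_rows if is_break else 0
        for _ in range(num_rows):          # @ITERATION <= NumRows
            c1, c2, p = cache.popleft()    # SavedRowIndex: GetSavedInputRecord()
            added_row = last_added_row     # AddedRow: LastAddedRow
            last_added_row = 0 if p < 100 else (1 if added_row == 0 else 0)
            is_extra = p >= 100 and added_row != 0   # second copy of the row
            out.append((
                c1 + 1 if is_extra else c1,
                c2,
                p * 0.75 if is_extra else p,
                p * 100 / total_price,     # based on the undiscounted price
            ))
    return out


output = run_with_discount_rows(rows)
for row in output:
    print(row)
```

Because the price is summed once per input row in the stage variables, not once per cached copy, the group totals stay at 200 and 400 and both copies of a row carry the same Percentage.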

Runtime errors

The number of calls to SaveInputRecord() and GetSavedInputRecord() must match for each loop. You can call SaveInputRecord() multiple times to add to the cache, but once you call GetSavedInputRecord(), you must call it enough times to empty the input cache before you can call SaveInputRecord() again. If you do not observe this rule, the job can fail with runtime errors in the following circumstances:

If your Transformer stage calls GetSavedInputRecord before SaveInputRecord, then a fatal error similar to the following example is reported in the job log:
```
APT_CombinedOperatorController,0: Fatal Error: get_record() called on 
record 1 but only 0 records saved by save_record()
```
If your Transformer stage calls GetSavedInputRecord more times than SaveInputRecord is called, then a fatal error similar to the following example is reported in the job log:
```
APT_CombinedOperatorController,0: Fatal Error: get_record() called on 
record 3 but only 2 records saved by save_record()
```
If your Transformer stage calls SaveInputRecord but does not call GetSavedInputRecord, then a fatal error similar to the following example is reported in the job log:
```
APT_CombinedOperatorController,0: Fatal Error: save_record() called on 
record 3, but only 0 records retrieved by get_record()
```
If your Transformer stage does not call GetSavedInputRecord as many times as SaveInputRecord, then a fatal error similar to the following example is reported in the job log:
```
APT_CombinedOperatorController,0: Fatal Error: save_record() called on 
record 3, but only 2 records retrieved by get_record()
```
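
The save-then-drain rule can be pictured with a toy Python model of the cache. This is only an illustration of the rule as stated in this section, not the engine's implementation: the class name InputRowCache, its methods, and the wording of its exceptions are all hypothetical, and it does not reproduce the job log messages shown above.

```python
class InputRowCache:
    """Toy cache: any number of saves, then retrievals until it is empty."""

    def __init__(self):
        self._rows = []
        self._retrieved = 0

    def save(self, row):
        # Saving is refused while a retrieval pass is only partly done.
        if 0 < self._retrieved < len(self._rows):
            raise RuntimeError("cache must be emptied before saving again")
        # A finished retrieval pass empties the cache for the next group.
        if self._retrieved > 0:
            self._rows.clear()
            self._retrieved = 0
        self._rows.append(row)
        return len(self._rows)    # count of rows now in the cache

    def get(self):
        # Retrieving more rows than were saved is an error.
        if self._retrieved >= len(self._rows):
            raise RuntimeError("no saved row left to retrieve")
        row = self._rows[self._retrieved]
        self._retrieved += 1
        return row


cache = InputRowCache()
print(cache.save("row 1"))   # first save returns a count of 1
print(cache.save("row 2"))
print(cache.get())
try:
    cache.save("row 3")      # too early: one saved row is still pending
except RuntimeError as err:
    print("rejected:", err)
print(cache.get())
print(cache.save("row 3"))   # allowed again once the cache is drained
```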
