Use of Stages in DataStage 8.5 or 8.x Series:
In this lesson we will see the list of stages and activities available in DataStage.
All stages can be found in the Palette. They are categorized based on the functionality they provide.
Click on the categories below to see the stages under each of them.
General Stages
Annotation: Annotations are like comments for a DataStage job. They are used in jobs to provide information about the job.
Container: As the name suggests, containers are used to hold a group of stages. In cases where the transformation logic involves a lot of stages, a container can be used to hold all of the transformation logic inside it. This makes the job look a little less complex for new developers.
Here is how the job will look when opened:
Here the transformation logic is inside the container stage. It will be opened when you double-click on the container stage.
Note that it has “Input” and “Output”; this means the Container stage can handle only a single input and a single output.
To make the above container shared, right-click on the container and select the option “Convert to Shared”. By doing this, we make the transformation logic inside the container available for use by any job in the project.
Description Annotation: is used to enter a description of the job. Here is an example:
Link: is used as a connector between two stages. It shows the direction of the flow of data in a DataStage job.
Database Stages
These stages are used to connect to various databases. Most of the stages listed below allow native connectivity to the respective databases.
In cases where native connectivity is not available, we use the ODBC stage or the Dynamic RDBMS stage.
ODBC Stage: Extracts data from or loads data into databases that support the industry-standard Open Database Connectivity (ODBC) API.
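For readers who want to see what the ODBC API looks like outside DataStage, here is a minimal Python sketch using the pyodbc library. The DSN, credentials and table name are placeholders, not values from this article.

# Illustration only: reading and writing through an ODBC driver in Python.
# "MY_DSN", the credentials and the "customers" table are hypothetical.
import pyodbc

conn = pyodbc.connect("DSN=MY_DSN;UID=etl_user;PWD=secret")
cursor = conn.cursor()

# Extract: read rows through the ODBC driver
cursor.execute("SELECT customer_id, name FROM customers")
for row in cursor.fetchall():
    print(row.customer_id, row.name)

# Load: write a row back through the same connection
cursor.execute("INSERT INTO customers (customer_id, name) VALUES (?, ?)", 101, "ABC")
conn.commit()
conn.close()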
Dynamic RDBMS Stage: is a database stage that performs read/write operations on the following databases: Informix, UDB/DB2, Microsoft SQL Server, Oracle, and Sybase.
Note: We can also use an ODBC connection string to establish a connection to a database.
Stored Procedure: This stage is dedicated to running stored procedures, as the other database stages do not allow this.
Stages used most frequently will be explained in the next lessons.
Development/Debug Stages
As the name suggests, these stages are used for testing and debugging.
Column Generator: Is used to add new columns populated with generated data to the records passing through it.
Head: is used to get the first n records, starting from the first record. Additionally, by specifying other options, we can also get every n'th record starting from the first record.
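A minimal Python sketch of this behaviour (not DataStage code): take the first n records, or every n'th record starting from the first.

from itertools import islice

records = [{"id": i} for i in range(1, 21)]    # pretend input link

first_n = list(islice(records, 5))             # first 5 records
every_nth = list(islice(records, 0, None, 5))  # records 1, 6, 11, 16 (period = 5)

print(first_n)
print(every_nth)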
Peek: is an output stage used in debugging. In instances where we have transformed the data and would like to check whether the transformation is working properly before loading the data to the target, we use a Peek stage. The records sent to the Peek stage can either be viewed in the job log in Director or be written to a file.
Row Generator: as the name suggests, is used to generate rows of data.
Sample: takes a random subset of the input data it receives and distributes it among its output links.
Tail: is used to get the last n records of the input data.
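Rough Python equivalents of the Sample and Tail behaviour described above (illustration only, not DataStage code):

import random
from collections import deque

records = list(range(1, 101))            # pretend input link

subset = random.sample(records, 10)      # Sample: a random subset of the input
last_n = list(deque(records, maxlen=5))  # Tail: the last 5 records

print(subset)
print(last_n)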
Write Range Map: is used to create a range map that can be used when performing range partitioning.
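To show the idea behind a range map, here is a small Python sketch of range partitioning: boundary keys decide which partition each record goes to. The boundary values are made up for the example.

from bisect import bisect_right

boundaries = [100, 200, 300]          # hypothetical range map boundaries
partitions = [[] for _ in range(len(boundaries) + 1)]

for key in [42, 150, 250, 999]:
    partitions[bisect_right(boundaries, key)].append(key)

print(partitions)  # [[42], [150], [250], [999]]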
File Stages
These stages are used to read data from and write data to various file systems.
Complex Flat File: This stage is used to read from a file or write to a file, but it cannot do both at the same time. It can have multiple output links and a single reject link.
It can be used to read data from one or more complex flat files, such as MVS data sets with QSAM and VSAM files. We can also read data from files that contain multiple record types.
Data Set stage is used to create Data Sets. These are operating-system files that are referred to by a control file with the extension .ds. Data Sets are readable only from within DataStage and are commonly used to improve performance. The stage allows only a single input or a single output, and unlike sequential files, Data Sets are processed in parallel.
External Source stage allows us to interact with or call external programs. This stage is mostly used to interface with sources that are not currently supported by DataStage. It can have a single output link and a single reject link.
External Target stage, like the External Source stage, is used to write data to targets that are not currently supported by DataStage. It can have a single input link and a reject link.
File Set stage is used to create files and save them with the extension .fs. It is useful because most flat files have a 2 GB limit, and using the File Set stage we can store data that exceeds this limit. Unlike the Sequential File stage, this stage runs only in parallel mode. The difference between Data Set and File Set is that File Set stores the formatting information that describes the format of the files to be read or written.
Lookup File Set stage is used to create file sets that are going to be used with the Lookup stage. The extension for files created by this stage is also .fs. Using a Lookup File Set to perform a lookup results in a significant increase in performance compared to other stages. The stage can have a single input link and an output link that must always be a reference link.
Example:
Sequential File is used to read and write data to a single flat file or to multiple flat files. It is one of the most commonly used stages in DataStage. It can have a single input link or a single output link, plus a reject link. This stage executes in parallel mode when extracting data from multiple files but runs sequentially when extracting from a single file.
Processing Stages
These are the stages that perform all the transformations in DataStage. Let us briefly see what each one of them does.
Please note that the most frequently used stages will be explained in detail, with examples, in the next lessons.
Aggregator: stage, as the name suggests, is used to perform aggregations and grouping functions. Here are some functions that can be performed using the Aggregator stage: Sum, Average, Max value, Min value, Group by, and more.
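As a minimal sketch of what the Aggregator stage does (group by plus aggregate), here it is in plain Python; the column names are made up for the example.

from collections import defaultdict

rows = [
    {"dept": "HR", "salary": 1000},
    {"dept": "HR", "salary": 3000},
    {"dept": "IT", "salary": 2000},
]

groups = defaultdict(list)
for row in rows:
    groups[row["dept"]].append(row["salary"])    # Group by dept

for dept, salaries in groups.items():
    print(dept, sum(salaries), sum(salaries) / len(salaries),
          max(salaries), min(salaries))          # Sum, Average, Max, Min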
Change Capture stage is used to identify the differences between two streams of data. One link is assigned as the before link and the other as the after link. Based on the comparison done using the defined key fields, the stage generates one of the following change codes: 0, 1, 2 and 3
0 -> (copy) denotes the record is an unchanged copy
1 -> (insert) denotes it is a new record that has to be inserted
2 -> (delete) denotes the record can be deleted
3 -> (edit) denotes the record exists but the value fields in the after link differ from the ones in the before link.
Change Apply stage reads a record from the change
data set and from the before data set, compares their key column values,
and acts accordingly (Insert, Delete, Edit, Copy).
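The following Python sketch illustrates how these change codes can be derived and then applied; it is an illustration only, not DataStage code, and the key ("id") and value columns are made up.

# Records are keyed on "id"; the value column is the name.
before = {1: "A", 2: "B", 3: "C"}
after  = {1: "A", 2: "X", 4: "D"}

changes = []                                   # (code, key, value)
for key, value in after.items():
    if key not in before:
        changes.append((1, key, value))        # 1 -> insert (new record)
    elif before[key] != value:
        changes.append((3, key, value))        # 3 -> edit (value differs)
    else:
        changes.append((0, key, value))        # 0 -> copy (unchanged)
for key in before:
    if key not in after:
        changes.append((2, key, before[key]))  # 2 -> delete

# Change Apply: replay the codes against the before data set
result = dict(before)
for code, key, value in changes:
    if code in (1, 3):       # insert or edit
        result[key] = value
    elif code == 2:          # delete
        result.pop(key)

print(changes)
print(result)                # matches the after data set: {1: 'A', 2: 'X', 4: 'D'}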
Compare and Difference stages, as the names suggest, are used to compare two records based on key values and identify the differences between them.
Copy stage can create multiple output links for a single input link.
Ex: If our input data has only one record
Name Age Sex
ABC 20 F
We can use this stage to create any number of copies of this record, one on each output link.
Decode and Encode stages are used to encode a data set using an operating-system command (such as gzip) and decode it back using the corresponding command.
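A conceptual illustration of the encode/decode round trip using gzip in Python (DataStage itself drives an OS-level command; this only shows the idea):

import gzip

payload = b"ABC,20,F\nXYZ,30,M\n"
encoded = gzip.compress(payload)     # Encode: compressed byte stream
decoded = gzip.decompress(encoded)   # Decode: original data restored

assert decoded == payload
print(len(payload), len(encoded))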
Expand stage is used to extract zipped content/files.
Filter stage is used to segregate records onto different links based on filter conditions.
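A minimal Python sketch of this routing behaviour (illustration only; the conditions and column names are made up):

rows = [{"age": 15}, {"age": 25}, {"age": 70}]

minors, adults, reject = [], [], []
for row in rows:
    if row["age"] < 18:
        minors.append(row)        # output link 0: age < 18
    elif row["age"] < 65:
        adults.append(row)        # output link 1: 18 <= age < 65
    else:
        reject.append(row)        # records matching no condition

print(minors, adults, reject)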
FTP stage is used to FTP files.
Funnel stage is used to collect records from multiple input links and funnel them to a single output link.
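The funnel idea in a few lines of Python (continuous funnel, order not guaranteed; the link names are made up):

from itertools import chain

link_1 = [{"id": 1}, {"id": 2}]
link_2 = [{"id": 3}]
output = list(chain(link_1, link_2))   # many inputs, one output
print(output)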
Join, Lookup and Merge are very useful stages and are used to perform all kinds of joins on data. They differ in memory utilization and in the number of output & reject links they allow. They will be explained in detail with examples in the next lessons.
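To give a feel for the difference, here is a plain-Python sketch: a lookup keeps the reference data in memory (a dict here), while a join keeps only matching keys. The column names are invented for the example.

orders = [{"cust_id": 1, "amount": 50}, {"cust_id": 2, "amount": 75}]
customers = [{"cust_id": 1, "name": "ABC"}, {"cust_id": 3, "name": "XYZ"}]

# Lookup: build an in-memory reference table keyed on cust_id
ref = {c["cust_id"]: c["name"] for c in customers}
looked_up = [{**o, "name": ref.get(o["cust_id"], "UNKNOWN")} for o in orders]

# Inner join: only keys present on both links survive
joined = [{**o, "name": ref[o["cust_id"]]} for o in orders if o["cust_id"] in ref]

print(looked_up)   # cust_id 2 gets a default value (or would go to a reject link)
print(joined)      # cust_id 2 is dropped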
Modify stage is used to modify the metadata of the columns.
Pivot stage is used to convert rows to columns and columns to rows
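A small sketch of a horizontal pivot in Python: one row with several quarter columns becomes several rows with a single value column (illustration only; column names are made up).

row = {"name": "ABC", "q1": 10, "q2": 20, "q3": 30}

pivoted = [{"name": row["name"], "quarter": q, "value": row[q]}
           for q in ("q1", "q2", "q3")]
print(pivoted)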
Remove Duplicates stage as the name suggests is used to remove duplicate records from the input data.
Slowly Changing Dimension stage is used to implement SCD Type 1 & 2 in DataStage jobs.
Sort stage is used to sort input data and send it to the output. It can also remove duplicates.
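The following Python sketch illustrates both the Sort and the Remove Duplicates behaviour described above: sort on a key column, then keep only the first record per key (illustration only; the data is made up).

rows = [{"id": 2, "val": "b"}, {"id": 1, "val": "a"}, {"id": 2, "val": "c"}]

rows.sort(key=lambda r: r["id"])       # sort on the key column

deduped, seen = [], set()
for row in rows:
    if row["id"] not in seen:          # keep the first row per key
        seen.add(row["id"])
        deduped.append(row)

print(deduped)   # [{'id': 1, 'val': 'a'}, {'id': 2, 'val': 'b'}]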
Switch stage takes a single input and assigns each input row to an output link based on the value of a selector field. The Switch stage is similar to a switch statement in C or a CASE statement in SQL.
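A Python sketch of this routing (illustration only; the selector field "region" and its values are made up):

rows = [{"region": "US"}, {"region": "EU"}, {"region": "APAC"}]

outputs = {"US": [], "EU": []}            # one list per output link
default = []                              # rows whose selector matches no case
for row in rows:
    outputs.get(row["region"], default).append(row)

print(outputs, default)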
Transformer stage is probably the most used stage in DataStage. It has built-in transformation functions, and additional ones can also be created by the user (Routines).
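As a rough idea of what a Transformer derivation does, expressed in Python: each output column is computed from input columns using built-in or user-defined functions. The function and column names below are invented for the example.

def full_name(first, last):               # stand-in for a user-defined routine
    return f"{first.strip()} {last.strip()}".upper()

rows = [{"first": " john ", "last": "doe", "salary": 1000}]

transformed = [{"name": full_name(r["first"], r["last"]),
                "annual_salary": r["salary"] * 12} for r in rows]
print(transformed)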