DEV'S DATASTAGE TUTORIAL,GUIDES,TRAINING AND ONLINE HELP 4 U. UNIX, ETL, DATABASE RELATED SOLUTIONS: IBM DataStage 8.5 New Features

DataStage 8.5 Features

This is a list of the ten best things in DataStage 8.5. Most of these are improvements to DataStage Parallel Jobs only, while a couple of them will help Server Job customers as well.

1. Faster Performance than Older Versions

Faster, faster, faster. Many tasks in DataStage 8.5, such as starting DataStage, opening a job and running a Parallel job, are at least 40% faster than in 8.1, and runtime performance has improved as well.

2. It is now an XML ETL Tool

Previous versions of DataStage were mediocre at processing XML. DataStage 8.5 is a great XML processing tool: it can open, understand and store XML schema files. I did a longer post about just this pack in "New Hierarchical Transformer makes DataStage a great XML Tool", and if you have XML files without schemas you can follow a tip at the DataStage Real Time blog: "The new XMLPack in 8.5….generating xsd’s….".

The new XML read and transform stages are much better at reading large and complex XML files and processing them in parallel.

3. Transformer Looping

The best Transformer yet. The DataStage 8.5 parallel Transformer gains new functions for looping inside a Transformer and for performing transformations across a grouping of records.

With looping inside a Transformer you can output multiple rows for each input row.

Transformer Remembering

DataStage 8.5 Transformer has Remembering and key change detection which is something that ETL experts have been manually coding into DataStage for years using some well known workarounds. A key change in a DataStage job involves a group of records with a shared key where you want to process that group as a type of array inside the overall recordset.

I am going to make a longer post about that later, but there are two new cache functions inside a Transformer, SaveInputRecord() and GetSavedInputRecord(), with which you can save a record and retrieve it later on to compare two or more records inside a Transformer.

There are new system variables and functions for looping and key change detection: @ITERATION is used in the looping mechanism, LastRow() indicates the last row in a job, and LastRowInGroup(InputColumn) indicates that a particular column value will change in the next record.

Here is an aggregation example where rows are looped.

Click here to Know Pivoting through Transformer

4. Easy to Install

Easier to install and more robust. DataStage 8.5 has the best installer of any version of DataStage ever. Mind you, I jumped aboard the DataStage train in version 3.6, so I cannot vouch for earlier installers, but 8.5 has the best wizard, the best pre-requisite checking and the best recovery. It also has the IBM Support Assistant packs for Information Server that make debugging and reporting of PMRs to IBM much easier. There is also a Guide to Migrating to InfoSphere Information Server 8.5 that explains how to migrate from most earlier versions.

5. Check In and Check Out Jobs

Check in and check out version control. DataStage 8.5 Manager comes with direct access to the source control functions of CVS and Rational ClearCase in an Eclipse workspace. You can send artefacts to the source control system and replace a DataStage component with a version retrieved from the source control system.

DataStage 8.5 comes with out of the box menu integration with CVS and Rational ClearCase but for other source control systems you need to use the Eclipse source control plugins.

6. High Availability Easier than ever

High Availability – the version 8.5 installation guide has over thirty pages on Information Server topologies including a bunch of high availability scenarios across all tiers of the product. On top of that there are new chapters for the high availability of the metadata repository, the services layer and the DataStage engine.

- Horizontal and vertical scaling and load balancing.
- Cluster support for WebSphere Application Server.
- Cluster support for XMETA repository: DB2 HADR/Cluster or Oracle RAC.
- Improved failover support on the engine.

7. New Information Architecture Diagramming Tool

InfoSphere Blueprint Direct: DataStage 8.5 comes with a free new product for creating diagrams of an information architecture and linking elements in the diagram directly to DataStage jobs and Metadata Workbench metadata. Solution architects can draw a diagram of a data integration solution including sources, warehouses and repositories.

8. Vertical Pivot

The Vertical Pivot stage is now available. It can pivot multiple input rows with a common key into output rows with multiple columns, and it supports key-based groups, columnar pivot and aggregate functions.

You can also do this type of vertical pivoting in the new Transformer using the column change detection and row cache – but the Vertical pivot stage makes it easier as a specialised stage.
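To make the idea concrete, here is a minimal Python sketch of what a vertical pivot does to the data. It is not DataStage code: the function name `vertical_pivot`, its parameters and the sample rows are all made up for illustration.

```python
from collections import defaultdict


def vertical_pivot(rows, key, value, max_columns):
    """Collapse rows that share `key` into one row with value_1..value_N columns.

    Hypothetical helper for illustration only; it is not a DataStage API.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])

    pivoted = []
    for group_key, values in groups.items():
        out = {key: group_key}
        for i in range(max_columns):
            # Pad with None when a group has fewer rows than output columns.
            out[f"{value}_{i + 1}"] = values[i] if i < len(values) else None
        pivoted.append(out)
    return pivoted


rows = [
    {"Salesman_name": "DEVENDRA", "City": "New York"},
    {"Salesman_name": "DEVENDRA", "City": "Madrid"},
    {"Salesman_name": "DEVENDRA", "City": "New Delhi"},
]
for row in vertical_pivot(rows, "Salesman_name", "City", 3):
    print(row)
```

Three input rows with the same key come out as a single row with City_1, City_2 and City_3 columns.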

9. z/OS File Stage

Makes it easier to process complex flat files by providing native support for mainframe files. Use it for VSAM files (KSDS, ESDS, RRDS), sequential QSAM, BDAM and BSAM files, fixed and variable length records, and single or multiple record type files.

10. Balanced Optimizer Comes Home

In DataStage 8.5 the Balanced Optimizer has been merged into the Designer and it has a number of usability improvements that turn DataStage into a better ETLT or ELT option. Balanced Optimizer looks at a normal DataStage job and comes up with a version that pushes some of the steps down onto a source or target database engine. That is, it balances the load across the ETL engine and the database engines.

Version 8.5 has improved logging, improved impact analysis support and easier management of optimised versions of jobs in terms of creating, deleting, renaming, moving, compiling and deploying them.

IBM DataStage 8.5 Newly Added Features:

DataStage 8.5 is out and IBM has made some significant improvements this time around. Let’s see some of the important enhancements in the new DataStage 8.5 version.

XML data

DataStage has historically been inefficient at handling XML files, but in 8.5 IBM has given us a great XML processing package. DataStage 8.5 can now process large XML files (over 30 GB) with ease. Also, we can now process XML data in parallel.

The new XML transform stage can combine data from multiple sources into a single XML output stream. If you think that is cool, it can also do it the other way around, i.e. take multiple XML inputs and produce a single output stream.

It can also convert data from one XML format to another.

Transformer Stage

It is one of the most used and most important stages in DataStage, and it just got better in 8.5.

a. Transformer Looping:

Over the years DataStage programmers have been using workarounds to implement this concept. Now IBM has included it directly in the transformer stage.

There are two types of looping available:

Output looping: Where we can output multiple output rows for a single input row

Ex:

Input Record:

Salesman_name	City_1	City_2	City_3
DEVENDRA	New York	Madrid	New Delhi

Output Record:

Salesman_name	City
DEVENDRA	New York
DEVENDRA	Madrid
DEVENDRA	New Delhi

This is achieved using a new system variable @ITERATION
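The same row-to-rows expansion can be sketched in plain Python. This is only an illustration of the logic, not Transformer syntax: the function `output_loop` is hypothetical, and its `iteration` counter merely plays the role that @ITERATION plays in the stage (assumed here to count from 1).

```python
def output_loop(row, city_columns):
    """Emit one output row per city column of a single input row.

    Hypothetical helper for illustration; `iteration` stands in for @ITERATION.
    """
    output_rows = []
    for iteration, column in enumerate(city_columns, start=1):
        # On iteration 1 pick City_1, on iteration 2 pick City_2, and so on.
        output_rows.append({
            "Salesman_name": row["Salesman_name"],
            "City": row[column],
        })
    return output_rows


input_row = {
    "Salesman_name": "DEVENDRA",
    "City_1": "New York",
    "City_2": "Madrid",
    "City_3": "New Delhi",
}
for out in output_loop(input_row, ["City_1", "City_2", "City_3"]):
    print(out["Salesman_name"], out["City"], sep="\t")
```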

Input looping: We can now aggregate input records within the transformer and attach the aggregated value to each of the original input rows when sending them to the output.
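As a rough sketch of the input looping idea, the Python below caches the rows of a group, and once the last row of the group is seen it replays the cached rows with the group total attached. It assumes the input is already sorted by the key; the function name, the column names and the sample data are hypothetical, and the list `saved` only imitates the role of the Transformer's record cache.

```python
def attach_group_totals(rows, key, amount):
    """Return every input row with its group's total added as `group_total`.

    Assumes `rows` is sorted by `key`. Illustration only, not DataStage code.
    """
    saved = []    # imitates the cache filled by saving each input record
    total = 0
    output = []
    for index, row in enumerate(rows):
        saved.append(row)
        total += row[amount]
        # Key change detection: the next row is missing or has a different key.
        is_last_in_group = (
            index + 1 == len(rows) or rows[index + 1][key] != row[key]
        )
        if is_last_in_group:
            # Replay the cached rows, now that the aggregate is known.
            for saved_row in saved:
                output.append({**saved_row, "group_total": total})
            saved = []
            total = 0
    return output


sales = [
    {"Salesman_name": "A", "Amount": 10},
    {"Salesman_name": "A", "Amount": 5},
    {"Salesman_name": "B", "Amount": 7},
]
for row in attach_group_totals(sales, "Salesman_name", "Amount"):
    print(row)
```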

b. Transformer change detection:

SaveInputRecord() – Save a record to be used for later transformations within the job

GetSavedInputRecord() – Retrieve the saved record as and when it is required for comparisons

c. System Variables:

i. @ITERATION: Used in the looping mechanism

ii. LastRow(): Indicates the last row in the job

iii. LastRowInGroup(): Returns true when the current row is the last row in its group, based on the key column

d. New NULL Handling features:

In DataStage 8.5 we need not explicitly handle NULL values. Records are no longer dropped if the target column is nullable, and we need not handle NULL values explicitly when using functions over columns that contain NULL values. Stage variables are also nullable by default.

The environment variable APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING is provided to support backward compatibility.

e. New Date functions:

There are a host of new date functions incorporated into DataStage 8.5. I personally found the functions below most useful.

DateFromComponents(years, months, dayofmonth)

Ex: DateFromComponents(2012,07,20) will output 2012-07-20

DateOffsetByComponents(basedate, years offset, months offset, dayofmonth offset)

Ex: DateOffsetByComponents(2012-07-20, 2,1,1) will output 2014-08-21

DateOffsetByComponents(2012-07-20, -4,0,0) will output 2008-07-20
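For readers who want to check the arithmetic, here is a small Python sketch that reproduces the examples above. The two Python functions are stand-ins written for this post, not the DataStage implementations. In particular, clamping the day to the end of a shorter target month is my own assumption; I have not verified how DataStage treats that case.

```python
import calendar
from datetime import date, timedelta


def date_from_components(year, month, day):
    """Build a date from its parts (stand-in for the first function above)."""
    return date(year, month, day)


def date_offset_by_components(base, years, months, days):
    """Shift `base` by whole years and months, then by days.

    Assumption: if the target month is shorter, the day is clamped to its
    last day. DataStage may behave differently in that edge case.
    """
    month_index = base.year * 12 + (base.month - 1) + years * 12 + months
    year, month = divmod(month_index, 12)
    month += 1
    day = min(base.day, calendar.monthrange(year, month)[1])
    return date(year, month, day) + timedelta(days=days)


print(date_from_components(2012, 7, 20))                      # 2012-07-20
print(date_offset_by_components(date(2012, 7, 20), 2, 1, 1))   # 2014-08-21
print(date_offset_by_components(date(2012, 7, 20), -4, 0, 0))  # 2008-07-20
```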

I will write another detailed blog on the new date functions shortly.

Parallel Debugger:

DataStage 8.5 now has a built-in debugger. We can now set breakpoints on the links in our jobs.

When the job is run in debug mode, it will stop when it encounters a breakpoint. From here we can step to the next action on that link or skip to the next row of data.

Refer to the link below:

http://datastageinfoguide.blogspot.in/2013/01/new-debug-feature-in-datastage-85.html

Functionality Enhancements:

- Mask encryption for before and after job subroutines

- Ability to copy permissions from one project to a new project

- Improvements in the multi-client manager

- New audit tracing and enhanced exception dialog

- Enhanced project creation failure details

Vertical Pivoting:

At long last vertical pivoting has been added

Integration with CVS

Now in DataStage 8.5 we have the feature that integrates directly with version control systems like CVS. We can now Check-in and Check-out directly from DataStage

Information Architecture Diagramming Tool:

Now solution architects can draw detailed integration solution plans for data warehouses from within DataStage

Balanced Optimizer:

As you all know DataStage is an ETL tool. But now with Balanced Optimizer directly being integrated we have the ELT (Extract Load and Transform) feature.

With this we can extract the data, load it and perform the transformations inside the database engine.
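The difference between the two styles can be sketched with Python's built-in sqlite3 module. This is only an analogy and not what Balanced Optimizer actually generates: the table names, columns and the `run_demo` function are invented for the example. The first load pulls rows out and transforms them in the client (ETL style), while the second lets the database do the same work in a single statement (ELT style).

```python
import sqlite3


def run_demo():
    """Load the same derived column twice: once ETL style, once ELT style."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_orders (id INTEGER, quantity INTEGER, unit_price INTEGER)")
    conn.executemany(
        "INSERT INTO src_orders VALUES (?, ?, ?)",
        [(1, 2, 10), (2, 1, 25), (3, 4, 5)],
    )
    conn.execute("CREATE TABLE tgt_etl (id INTEGER, total INTEGER)")
    conn.execute("CREATE TABLE tgt_elt (id INTEGER, total INTEGER)")

    # ETL style: extract the rows, transform them outside the database, load.
    extracted = conn.execute("SELECT id, quantity, unit_price FROM src_orders").fetchall()
    transformed = [(row_id, qty * price) for row_id, qty, price in extracted]
    conn.executemany("INSERT INTO tgt_etl VALUES (?, ?)", transformed)

    # ELT style: the transformation is pushed down into the database engine.
    conn.execute("INSERT INTO tgt_elt SELECT id, quantity * unit_price FROM src_orders")

    etl = conn.execute("SELECT id, total FROM tgt_etl ORDER BY id").fetchall()
    elt = conn.execute("SELECT id, total FROM tgt_elt ORDER BY id").fetchall()
    conn.close()
    return etl, elt


etl_rows, elt_rows = run_demo()
print(etl_rows == elt_rows)
```

Both targets end up with the same rows; what changes is where the multiplication runs, which is the kind of trade-off the optimizer is weighing.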

It's Fast!

DataStage 8.5 is considerably faster than its previous version (8.1). Tasks like saving, renaming, compiling are faster by nearly 40%. The run time performance of jobs has also improved.

The parallel engine

The parallel engine in DataStage has been tuned to improve performance, and resource usage has been reduced by 5% compared to DataStage 8.1.
