IBM DataStage 11.3.x Newly Added Features :
This post walks through the features and stages added in IBM DataStage
11.3.x. It covers the DataStage component only. The wider Information Server
suite also changed considerably in this release; I will try to cover those
changes in later posts.
Writing data to Excel files : DK®
DataStage can now write data to Microsoft Excel files by using
the Unstructured Data stage. This is a useful new feature, especially when
clients or business users want data extracts in Excel format rather than as
CSV files.
Hierarchical Data Stage (formerly the XML stage) : DK®
The stage now processes not only XML files but also other kinds of
hierarchical data, such as JSON. It can be used to design jobs that interact
with REST (Representational State Transfer) web services by using HTTP
methods. For example, you can design jobs that perform tasks such as posting
messages to social networking sites, interacting with systems such as
Microsoft SharePoint, or using maps and directions.
Data Rules Stage : DK®
Validate data with data rules
You can validate your data and ensure that its quality conforms to business
expectations for data cleanliness by using the Data Rules stage, which is
provided by IBM InfoSphere Information Analyzer. For example, you can check that
a supplier has a supplier ID in the correct format, a supplier name, and a tax
ID that is nine numeric characters. The data rules that are used in the Data
Rules stage are typically created by a data analyst by using InfoSphere
Information Analyzer.
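
To make the supplier example concrete, here is a minimal Python sketch of the
kind of checks such a rule expresses. It does not use the Data Rules stage or
the Information Analyzer rule syntax. The field names and the supplier ID
format ("SUP-" plus five digits) are assumptions made up for this
illustration; only the nine-digit tax ID rule comes from the example above.

```python
import re

# Hypothetical ID format, for illustration only: "SUP-" followed by five digits.
SUPPLIER_ID_PATTERN = re.compile(r"SUP-[0-9]{5}")
# Tax ID rule from the example: exactly nine numeric characters.
TAX_ID_PATTERN = re.compile(r"[0-9]{9}")


def check_supplier(record):
    """Return the names of the rules the record fails (empty list if it passes)."""
    failures = []
    if not SUPPLIER_ID_PATTERN.fullmatch(record.get("supplier_id") or ""):
        failures.append("supplier_id_format")
    if not (record.get("supplier_name") or "").strip():
        failures.append("supplier_name_present")
    if not TAX_ID_PATTERN.fullmatch(record.get("tax_id") or ""):
        failures.append("tax_id_nine_digits")
    return failures


good = {"supplier_id": "SUP-00042", "supplier_name": "Acme", "tax_id": "123456789"}
bad = {"supplier_id": "42", "supplier_name": " ", "tax_id": "12345"}
print(check_supplier(good))
print(check_supplier(bad))
```

In a real job the rule definitions would come from Information Analyzer rather
than being hard-coded like this.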
Big Data File Stage : DK®
The Big Data File stage is used to read from and write to files on
Hadoop (HDFS). The Big Data File stage is now compatible with Hortonworks 2.1,
Cloudera 4.5, and InfoSphere BigInsights 3.0.
Greenplum Connector Stage : DK®
This stage can be used to create a native connection for
accessing data in a Greenplum database. Table definitions can also be
imported by using the Greenplum Connector framework.
InfoSphere Master Data Management Connector Stage : DK®
This stage can be used to read data from and write data to InfoSphere MDM,
the IBM master data management solution. It can be configured for Member
read and Member write interactions with the MDM server.
Amazon S3 Connector Stage : DK®
Amazon S3 (Simple Storage Service) is a low-cost cloud file storage
system that is accessed through web service interfaces (REST, SOAP, and
BitTorrent). It offers scalability, high availability, and low latency.
The Amazon S3 Connector stage can be used to read and write data residing
in Amazon S3.
Unstructured Data Stage – Microsoft Excel (.xls and .xlsx) : DK®
The Unstructured Data stage was first introduced in DataStage
v9.1 and was used to read Excel files through a native interface. Before
that, Excel data was staged as a .csv file or accessed through ODBC. The
stage can now also be used to write data to Excel files.
Sort Stage Optimization : DK®
The Sort stage now tries to optimize DataStage sort operations by
converting length-bounded columns to variable length before the sort and then
converting them back to length-bounded columns after the sort. When the
actual data in a record is smaller than the defined upper bound, the
optimization results in reduced disk I/O.
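
The saving can be estimated with some quick arithmetic. The Python sketch
below compares the bytes needed for a column stored at its declared upper
bound with the bytes needed when only the actual data is stored. It is a
simplified model, not the engine's real scratch-file format: the assumption
that a bounded column occupies its full upper bound and the 4-byte length
prefix are both illustrative guesses.

```python
def bounded_bytes(values, max_len):
    """Bytes needed if every value occupies the declared upper bound."""
    return len(values) * max_len


def variable_bytes(values, prefix_len=4):
    """Bytes needed if each value is a length prefix plus its actual data.

    The 4-byte prefix is an assumption for illustration.
    """
    return sum(prefix_len + len(v.encode("utf-8")) for v in values)


names = ["Ann", "Bob", "Christopher"]
print(bounded_bytes(names, max_len=100))  # 3 values * 100 bytes = 300
print(variable_bytes(names))              # 3 * 4 prefix bytes + 17 data bytes = 29
```

The wider the gap between the declared bound and the typical value length,
the more sort I/O this kind of conversion can avoid.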
Improved Flexibility in Record Delimiting : DK®
The Sequential File stage now gives developers more flexibility
in how a source flat file is delimited. A new environment variable,
APT_IMPORT_HANDLE_SHORT, can be set to allow the import operator to read
records that do not contain all of the fields defined in the import schema.
Previously, these records were rejected by the stage. The values
assigned to any missing field depend on the data type and nullability.
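
The behaviour can be pictured with a small Python sketch. This is not the
import operator itself. The schema representation, the `handle_short` flag,
and the fill rule used here (null for nullable fields, otherwise 0 for
integers and an empty string for strings) are assumptions for illustration;
check the product documentation for the exact values the engine assigns.

```python
# Assumed per-type defaults for non-nullable fields (illustrative only).
TYPE_DEFAULTS = {"int": 0, "string": ""}


def import_record(line, schema, delimiter=",", handle_short=False):
    """Parse one delimited line against schema: a list of (name, type, nullable).

    Returns a dict, or None when the record is rejected.
    """
    fields = line.split(delimiter)
    if len(fields) > len(schema):
        return None  # simplification: extra fields are simply rejected
    if len(fields) < len(schema) and not handle_short:
        return None  # earlier behaviour: short records are rejected
    record = {}
    for i, (name, ftype, nullable) in enumerate(schema):
        if i < len(fields):
            record[name] = int(fields[i]) if ftype == "int" else fields[i]
        elif nullable:
            record[name] = None  # missing nullable field becomes null
        else:
            record[name] = TYPE_DEFAULTS[ftype]  # missing non-nullable field
    return record


schema = [("id", "int", False), ("name", "string", True), ("qty", "int", False)]
print(import_record("7,widget", schema))
print(import_record("7,widget", schema, handle_short=True))
```

With the flag off, the two-field line is rejected; with it on, the missing
`qty` field is filled in instead.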
Operations Console/Workload Management : DK®
IBM lists the Operations Console and Workload Management as new
features in the 11.3 release documentation, even though these components
were already introduced in previous releases. Both components are now part of
the base Information Server installation, and Workload Management is now
enabled by default.
url option for commands : DK®
The InfoSphere DataStage CLI now includes an option, -url,
for the logon clause of the dsjob and dsadmin commands. The option specifies a
full-format URL for the domain to log on to.
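
As a rough sketch of how a logon clause with -url might be assembled, the
Python helper below builds a dsjob argument list. The -user, -password,
-server, and -run options are the usual dsjob ones; the URL, credentials,
server, project, and job names are placeholders, and the exact option order
is an assumption, so check it against the command reference for your
installation.

```python
def dsjob_run_command(url, user, password, server, project, job):
    """Build an argument list that runs a job using a -url logon clause.

    All values are supplied by the caller; nothing here contacts a server.
    """
    return [
        "dsjob",
        "-url", url,            # full-format URL of the domain to log on to
        "-user", user,
        "-password", password,
        "-server", server,
        "-run", project, job,
    ]


# Placeholder values for illustration only.
cmd = dsjob_run_command(
    "https://services.example.com:9443", "dsadm", "secret",
    "ENGINE1", "MyProject", "MyJob",
)
print(" ".join(cmd))
```

On a machine with the DataStage client tools installed, the list could be
handed to `subprocess.run(cmd, check=True)`.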
Operations Console : DK®
If the capturing of monitoring data is enabled, the AppWatcher
process is automatically started when the engine tier computer is started.
Workload management: DK®
The workload management system is now enabled by default.
Collection Courtesy : Mohit Saini & Devendra