DEV'S DATASTAGE TUTORIAL,GUIDES,TRAINING AND ONLINE HELP 4 U. UNIX, ETL, DATABASE RELATED SOLUTIONS: Datastage 11.5 Newly added features

Datastage 11.5 New features for IIS common components

This post will help you understand the changes and enhancements made in IIS 11.5 to the connectivity services, administration services, and deployment services that support its data integration and data governance capabilities. As part of IIS V11.5, the following common platform enhancements have been made.

Engine tier execution within Apache Hadoop

This feature enables data integration, data cleansing, and data profiling and analysis workloads to run on the data nodes of a Hadoop cluster.

# The Information Server engine can be installed on the edge node of a Hadoop cluster, and the required binaries can be made immediately available to all nodes of the cluster through three different mechanisms: HDFS distribution, dynamic runtime copy, or NFS. Organizations can choose among these models to best fit their architecture and security requirements.

# A new Application Master component submits parallel job run requests to YARN to secure resource allocation on the data nodes of the cluster as appropriate to the job run requirements. Users can leverage features such as Hadoop Node Labels to dedicate processing to specific nodes, and YARN scheduler queues to ensure appropriate workload management control.

# The stages within the job then operate on the data partitions that are available on those nodes (data can be shipped to specific data nodes based on YARN's resource selections).

# The IIS engine automatically uses in-memory features such as data pipelining, data partitioning, and dynamic repartitioning to minimize job runtimes.

# Support for Kerberos-enabled clusters.
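
To make the data partitioning and repartitioning mentioned in the list above more concrete, here is a minimal conceptual sketch in Python. It is not how the IIS parallel engine is implemented; the function names, the sample rows, and the choice of a CRC32-based hash are all assumptions made purely for illustration.

```python
import zlib
from collections import defaultdict


def hash_partition(rows, key, num_partitions):
    """Assign each row to a partition using a stable hash of its key column."""
    partitions = defaultdict(list)
    for row in rows:
        # crc32 is stable across runs, unlike Python's built-in hash() for strings
        index = zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[index].append(row)
    return [partitions[i] for i in range(num_partitions)]


def repartition(partitions, key, num_partitions):
    """Redistribute already partitioned rows on a different key."""
    all_rows = (row for part in partitions for row in part)
    return hash_partition(all_rows, key, num_partitions)


# Hypothetical sample data
rows = [
    {"customer_id": 1, "region": "EU"},
    {"customer_id": 2, "region": "US"},
    {"customer_id": 3, "region": "EU"},
    {"customer_id": 4, "region": "APAC"},
]

by_customer = hash_partition(rows, "customer_id", 3)
by_region = repartition(by_customer, "region", 3)
```

After repartitioning on region, all rows that share a region sit in the same partition, which is what a downstream join or aggregation on that column would need.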

Hadoop version currency

# As with previous releases, the product supports Apache Hadoop distributions that conform to the Open Data Platform (ODP) requirements, such as IBM Open Platform, IBM BigInsights, and Hortonworks, as well as non-ODP distributions, such as Cloudera.

# The following Hadoop distributions are supported for reading and writing data from non-Hadoop IIS deployments as well as for Hadoop engine tier deployments: IBM Open Platform V4.x, IBM BigInsights V4.x, Hortonworks V2.2 and V2.3, and Cloudera V5.3 and V5.4.

Updated Repository and Application Server options

# Repository Tier supports Oracle 12g (including RAC) and SQL Server 2014.

# Services Tier supports IBM WebSphere Application Server 8.5.5.

Data integration

Information Server now includes the following additional features for data integration:

Data integration for Hadoop

All connectivity, transformation, and data delivery features are now able to execute on the data nodes of the Hadoop cluster.

Hadoop file connectivity

# Expanded support for Hadoop files, including new data formats, additional character sets, and additional data types.

# Available on all supported operating systems, including Microsoft™ Windows™, AIX®, Linux™ on System z®, and Linux (both Hadoop and non-Hadoop).

Embedded protection of sensitive data in flight

Information Server Enterprise Edition licensing now includes Optim data masking libraries for protecting sensitive information in a DataStage® parallel job.

Push-down processing into relational database

# The DataStage Balanced Optimization feature, which lets users execute their data integration workloads within a relational database by automatically rewriting the job design as SQL, is now included in the InfoSphere DataStage license and in several products that include InfoSphere DataStage as a supporting program.

# Balanced Optimization has also been extended to support push-down into an IDAA configuration on System z.
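
The idea behind push-down processing can be sketched with a small Python example. This is only an illustration of the concept, not the output of Balanced Optimization: the `orders` table, the two function names, and the use of an in-memory SQLite database are assumptions. The first function pulls every row out of the database and does the filtering and aggregation in the client (the "engine" side); the second expresses the same logic as a single SQL statement so that the database does the work.

```python
import sqlite3

# Hypothetical source table in an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("EU", 120.0), ("EU", 30.0), ("US", 75.0), ("US", 5.0)],
)


def totals_in_engine(conn, min_amount):
    """Extract every row, then filter and aggregate outside the database."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM orders"):
        if amount >= min_amount:
            totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_pushed_down(conn, min_amount):
    """Same filter and aggregation, expressed as one SQL statement."""
    sql = (
        "SELECT region, SUM(amount) FROM orders "
        "WHERE amount >= ? GROUP BY region"
    )
    return dict(conn.execute(sql, (min_amount,)))


engine_result = totals_in_engine(conn, 10)
pushed_result = totals_pushed_down(conn, 10)
```

Both functions return the same totals, but the pushed-down version moves only one result row per region out of the database instead of every source row.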

Data quality

Information Server now includes the following additional features for data quality:

Data classification

# Support for data privacy, data masking, and test data management initiatives by identifying where Personally Identifiable Information (PII), sensitive data, and other classes of data are stored.

# Quick time to value by identifying the type of data contained within a column using three dozen predefined data classes, including credit card numbers, taxpayer IDs, US phone numbers, and others.

# Extensible classifications enable you to create and customize three types of data classes: valid values list, regular expression (regex), and Java™ class.

# Column analysis results provide suggested data classification based on data values. Users can review these classifications and select the data class that best represents the data stored in each column.
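
As a rough illustration of how a regex-based class and a valid-values-list class could produce a suggested classification for a column, here is a minimal Python sketch. It is not the product's implementation: the class names, the phone-number pattern, the country code list, and the 0.8 match threshold are all assumptions chosen for the example.

```python
import re

# Hypothetical regex-based data class
US_PHONE = re.compile(r"^\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}$")

# Hypothetical valid-values-list data class
COUNTRY_CODES = {"US", "DE", "FR", "IN"}

CLASSIFIERS = {
    "US phone number": lambda value: bool(US_PHONE.match(value)),
    "Country code": lambda value: value.upper() in COUNTRY_CODES,
}


def suggest_data_class(values, threshold=0.8):
    """Return (class_name, match_ratio) for the best matching class, or None."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return None
    best = None
    for name, matches in CLASSIFIERS.items():
        ratio = sum(1 for v in non_empty if matches(v)) / len(non_empty)
        # Keep the class with the highest ratio at or above the threshold
        if ratio >= threshold and (best is None or ratio > best[1]):
            best = (name, ratio)
    return best


phone_suggestion = suggest_data_class(["555-123-4567", "(555) 123 4567", "5551234567"])
country_suggestion = suggest_data_class(["US", "de", "FR"])
no_suggestion = suggest_data_class(["hello", "world"])
```

A suggestion like this would still be reviewed by a user, as the last point in the list above describes; the sketch only covers the automatic part.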

Performance enhancements

# Improved scalability and reduced resource consumption during data profiling (column analysis).

# A new option to limit the distinct values stored can dramatically reduce the size of the results database (IADB).

# Improved scalability for cross domain and foreign key analysis. The new algorithm does not require a full column analysis with capture of all distinct values.
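
To show why limiting the stored distinct values shrinks the results, here is a small Python sketch of a capped frequency distribution. The function name, the `max_distinct` parameter, and the choice to keep the most frequent values are assumptions for illustration; the post does not say how the product decides which values to keep.

```python
from collections import Counter


def frequency_distribution(values, max_distinct=None):
    """Count the values of one column, optionally keeping only the top entries.

    Returns (kept_counts, total_distinct, truncated).
    """
    counts = Counter(values)
    total_distinct = len(counts)
    if max_distinct is None or total_distinct <= max_distinct:
        return dict(counts), total_distinct, False
    # Store only the most frequent values; the rest are dropped from the result
    kept = dict(counts.most_common(max_distinct))
    return kept, total_distinct, True


# Hypothetical column with four distinct values
column = ["a", "a", "a", "b", "b", "c", "d"]
kept, total_distinct, truncated = frequency_distribution(column, max_distinct=2)
```

For a column that is nearly unique, such as an ID column with millions of rows, a cap like this means a bounded number of entries is stored instead of one entry per row.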

Certifications

# Data profiling, classification, investigation, standardization, matching, survivorship, address verification, and monitoring are now supported running directly inside a Hadoop cluster.

# USAC and AVI address cleansing and validation are also supported running on the Hadoop cluster.

# Oracle Connector certification for IADB.

InfoSphere Information Governance Catalog

Information Server now includes the following additional features for data governance:

Governance Catalog Extensible Framework

Allows customers to define and register new assets within the catalog and to set their display and structural definitions, so that users can best recognize and understand the information. These assets can also be fully governed and mapped for lineage purposes.

Data class definitions

# Allows detected data classifications to be displayed and searched within Information Governance Catalog.

# Allows administrators to create and manage data class definitions, including their regular expressions, valid values, or ranges.

Multilanguage support

Enables business terms to be defined in different languages and associated with each other, so that the organization can manage related business definitions.

XML Schema Definition support

Enables you to import an XML Schema Definition (XSD) file; browse or share the included entities, attributes, and types across the enterprise; and govern and annotate those entities, attributes, and types.

Hadoop file metadata import

You can now import metadata for Hadoop files.

Column-level lineage for Hadoop files

Job lineage now includes column-level lineage for the file connector when used for files in Hadoop or on other operating systems.

Asset Interchange support for Extended Lineage content

Enables you to easily and seamlessly migrate all Extended Data Source and Extension Mapping content from previous versions of Metadata Workbench or between versions of Information Governance Catalog. This capability was not previously available.
