DataStage 11.5 New features for IIS common components
This post describes the changes and enhancements made in IIS 11.5 to connectivity services, administration services, and deployment services in support of its data integration and data governance capabilities. As part of IIS V11.5, the following common platform enhancements have been made.
Engine tier execution within Apache Hadoop
This feature enables data integration, data cleansing, and data profiling and analysis workloads to run on the data nodes of a Hadoop cluster.
# The Information Server engine can be installed on the edge node of a Hadoop cluster, and the required binaries can be made immediately available to all nodes of the cluster through three different mechanisms: HDFS distribution, dynamic runtime copy, or NFS. Organizations can choose from among these models to best fit their architecture and security requirements.
# A new Application Master component submits parallel job run requests to YARN to secure resource allocation on the data nodes of the cluster, as appropriate to the job run requirements. Users can leverage features such as Hadoop Node Labels to dedicate processing to specific nodes, and YARN scheduler queues to ensure appropriate workload management control.
# The stages within the job then operate on the data partitions available on those nodes (data can be shipped to specific data nodes based on YARN's resource selections).
# The IIS engine automatically uses in-memory features such as data pipelining, data partitioning, and dynamic repartitioning to minimize job runtimes.
# Support for Kerberos-enabled clusters.
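The partitioning behaviour described above can be illustrated with a minimal Python sketch of key-based hash partitioning, the general technique parallel engines use to spread rows across partitions so that all rows sharing a key value end up together. This is not the IIS engine's implementation; the functions `hash_partition` and `repartition` and the sample columns are hypothetical.

```python
import zlib
from collections import defaultdict


def hash_partition(rows, key, num_partitions):
    """Assign each row (a dict) to a partition by hashing the value in `key`.

    Rows that share a key value always land in the same partition, so keyed
    work such as a per-key aggregation can run on each partition independently.
    Hypothetical sketch; not the IIS engine's partitioner.
    """
    partitions = defaultdict(list)
    for row in rows:
        # crc32 is stable across runs, unlike Python's salted hash() for strings.
        h = zlib.crc32(str(row[key]).encode("utf-8"))
        partitions[h % num_partitions].append(row)
    # Return every partition, including empty ones, in index order.
    return [partitions[i] for i in range(num_partitions)]


def repartition(partitions, new_key, num_partitions):
    """Redistribute already-partitioned rows on a different key."""
    all_rows = [row for part in partitions for row in part]
    return hash_partition(all_rows, new_key, num_partitions)


if __name__ == "__main__":
    rows = [{"cust": i % 5, "region": "EU" if i % 2 else "US", "amt": i} for i in range(20)]
    by_cust = hash_partition(rows, "cust", 4)
    # Each partition can total its own customers without seeing the others.
    for idx, part in enumerate(by_cust):
        totals = defaultdict(int)
        for row in part:
            totals[row["cust"]] += row["amt"]
        print(idx, dict(totals))
    by_region = repartition(by_cust, "region", 3)
    print([len(p) for p in by_region])
```

`repartition` redistributes the same rows on a different key, which is roughly what is needed when consecutive stages are keyed on different columns.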
Hadoop version currency
# As with previous releases, the product supports Apache Hadoop distributions that conform to the Open Data Platform (ODP) requirements, such as IBM Open Platform, IBM BigInsights, and Hortonworks, as well as non-ODP distributions such as Cloudera.
# The following Hadoop distributions are supported both for reading and writing data from non-Hadoop IIS deployments and for deploying the engine tier on Hadoop: IBM Open Platform V4.x, IBM BigInsights V4.x, Hortonworks V2.2 and V2.3, and Cloudera V5.3 and V5.4.
Updated Repository and Application Server options
# The repository tier supports Oracle 12c (including RAC) and SQL Server 2014.
# The services tier supports IBM WebSphere Application Server 8.5.5.
Data integration
Information Server now includes the following additional features for data integration:
Data integration for Hadoop
All connectivity, transformation, and data delivery features can now execute on the data nodes of the Hadoop cluster.
Hadoop file connectivity
# Expanded support for Hadoop files, including new data formats, additional character sets, and additional data types.
# Available on all supported operating systems, including Microsoft™ Windows™, AIX®, Linux™ on System z®, and Linux (both Hadoop and non-Hadoop).
Embedded protection of sensitive data in flight
Information Server Enterprise Edition licensing now includes Optim data masking libraries for protecting sensitive information in a DataStage® parallel job.
Push-down processing into relational databases
# The DataStage Balanced Optimization features, which let users execute their data integration workloads within a relational database by automatically rewriting the job design as SQL, are now included in the InfoSphere DataStage license and in several products that include InfoSphere DataStage as a supporting program.
# Balanced Optimization has also been extended to support push-down into an IBM DB2 Analytics Accelerator (IDAA) configuration on System z.
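As a rough illustration of what "rewriting the job design as SQL" means, the Python sketch below turns a hypothetical description of a linear source, filter, and aggregate job into a single SQL statement that a database could run in place of the parallel job. Balanced Optimization operates on real DataStage job designs, not on a dictionary like this one; the layout and the `job_to_sql` function are assumptions for the example only.

```python
def job_to_sql(job: dict) -> str:
    """Render a linear source -> filter -> aggregate job as one SQL statement.

    `job` uses a hypothetical layout:
      source:     table name to read
      filter:     optional SQL predicate for a WHERE clause
      group_by:   optional list of grouping columns
      aggregates: optional {alias: (function, column)} mapping
    Identifiers and predicates are inserted verbatim (no quoting or escaping),
    so this is only suitable for trusted, illustrative input.
    """
    group_cols = list(job.get("group_by", []))
    aggs = [
        f"{fn.upper()}({col}) AS {alias}"
        for alias, (fn, col) in job.get("aggregates", {}).items()
    ]
    select_list = ", ".join(group_cols + aggs) or "*"

    sql = f"SELECT {select_list} FROM {job['source']}"
    if job.get("filter"):
        sql += f" WHERE {job['filter']}"
    if group_cols:
        sql += " GROUP BY " + ", ".join(group_cols)
    return sql


if __name__ == "__main__":
    job = {
        "source": "sales",
        "filter": "amount > 0",
        "group_by": ["region"],
        "aggregates": {"total_amount": ("sum", "amount")},
    }
    print(job_to_sql(job))
```

For the sample job this prints `SELECT region, SUM(amount) AS total_amount FROM sales WHERE amount > 0 GROUP BY region`. The benefit of pushing such a statement down is that filtering and aggregation happen where the data lives, so only the much smaller result leaves the database.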
Data quality
Information Server now includes the following additional features for data quality:
Data classification
# Supports data privacy, data masking, and test data management initiatives by identifying where Personally Identifiable Information (PII), sensitive data, and other classes of data are stored.
# Quick time to value by identifying the type of data contained within a column using three dozen predefined data classes, including credit card numbers, taxpayer IDs, US phone numbers, and others.
# Extensible classifications enable you to create and customize three types of data classes: valid values list, regular expression (regex), and Java™ class.
# Column analysis results provide suggested data classifications based on data values. Users can review these suggestions and select the data class that best represents the data stored in each column.
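The three kinds of data class and the "suggested classification" step can be sketched in Python: each class is a valid-values set, a regex, or a function (standing in for a custom Java class), and a class is suggested when a large enough fraction of a column's non-null values match it. This is a conceptual sketch, not Information Analyzer's algorithm; the class names, the phone regex, the abbreviated state list, and the 0.8 default threshold are all hypothetical.

```python
import re


def luhn_valid(value: str) -> bool:
    """Credit-card style check: 13-19 digits, optional spaces/dashes, valid Luhn checksum."""
    if any(ch not in "0123456789 -" for ch in value):
        return False
    digits = [int(ch) for ch in value if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Double every second digit counting from the right (the check digit is not doubled).
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# Hypothetical data classes, one of each kind.
DATA_CLASSES = {
    # Regex class: US phone numbers such as "(555) 123-4567" or "555.123.4567".
    "US_PHONE": ("regex", re.compile(r"\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}")),
    # Valid-values class: abbreviated list of US state codes, for illustration only.
    "US_STATE": ("values", {"CA", "IL", "NY", "TX", "WA"}),
    # Function class: stands in for a custom Java class.
    "CREDIT_CARD": ("function", luhn_valid),
}


def _matches(kind, rule, value):
    if kind == "regex":
        return rule.fullmatch(value) is not None
    if kind == "values":
        return value.upper() in rule
    if kind == "function":
        return rule(value)
    raise ValueError(f"unknown data class kind: {kind}")


def suggest_classes(values, classes=DATA_CLASSES, threshold=0.8):
    """Return (class_name, match_ratio) pairs at or above `threshold`, best first."""
    non_null = [v.strip() for v in values if v is not None and v.strip()]
    if not non_null:
        return []
    suggestions = []
    for name, (kind, rule) in classes.items():
        hits = sum(1 for v in non_null if _matches(kind, rule, v))
        ratio = hits / len(non_null)
        if ratio >= threshold:
            suggestions.append((name, ratio))
    return sorted(suggestions, key=lambda s: s[1], reverse=True)
```

For example, `suggest_classes(["(555) 123-4567", "555.987.6543", "5551112222", None, "n/a"], threshold=0.7)` suggests `US_PHONE` with a match ratio of 0.75, since three of the four non-null values match. A user would then confirm or reject the suggestion, as the column analysis workflow above describes.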
Performance enhancements
# Improved scalability and reduced resource consumption during data profiling (column analysis).
# A new option to limit the number of distinct values stored can dramatically reduce the size of the results database (IADB).
# Improved scalability for cross-domain and foreign key analysis. The new algorithm does not require a full column analysis that captures all distinct values.
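The effect of capping stored distinct values can be sketched in Python: the profile below still reports exact row, null, and distinct counts, but keeps only the N most frequent values for storage. The function `profile_column` and its `max_distinct_stored` parameter are hypothetical and are not Information Analyzer settings.

```python
from collections import Counter


def profile_column(values, max_distinct_stored=None):
    """Profile one column, keeping at most `max_distinct_stored` value frequencies.

    Counts (rows, nulls, distinct values) are always exact; only the list of
    stored (value, frequency) pairs is capped, most frequent first.
    Hypothetical sketch; not Information Analyzer's profiling code.
    """
    values = list(values)
    counts = Counter(v for v in values if v is not None)
    # most_common(None) returns every value; an int returns only the top N.
    stored = counts.most_common(max_distinct_stored)
    return {
        "row_count": len(values),
        "null_count": sum(1 for v in values if v is None),
        "distinct_count": len(counts),
        "stored_values": stored,
        "truncated": len(stored) < len(counts),
    }
```

This sketch still counts every distinct value in memory while profiling; only what would be written to a results store is capped, which is where the size saving comes from for high-cardinality columns.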
Certifications
# Data profiling, classification, investigation, standardization, matching, survivorship, address verification, and monitoring can now run directly inside a Hadoop cluster.
# USAC and AVI address cleansing and validation can also run on the Hadoop cluster.
# Oracle Connector certification for IADB.
InfoSphere Information Governance Catalog
Information Server now includes the following additional features for data governance:
Governance Catalog Extensible Framework
Allows customers to define and register new types of assets within the catalog and to set their display and structural definitions, helping users recognize and understand the information. These assets can also be fully governed and mapped for lineage purposes.
Data class definitions
# Allows detected data classifications to be displayed and searched within Information Governance Catalog.
# Allows administrators to create and manage data class definitions, including their regular expressions, valid values, or ranges.
Multilanguage support
Enables business terms to be defined in different languages and associated with each other, so that the organization can manage related business definitions together.
XML Schema Definition support
Enables you to import an XML Schema Definition (XSD) file, browse or share the included entities, attributes, and types across the enterprise, and govern and annotate them.
Hadoop file metadata import
You can now import metadata for Hadoop files.
Column-level lineage for Hadoop files
Job lineage now includes column-level lineage for the file connector when it is used for files in Hadoop or on other operating systems.
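Column-level lineage can be thought of as a directed graph whose nodes are (dataset, column) pairs and whose edges record that one column feeds another through a job. The Python sketch below, using hypothetical dataset and column names, walks that graph upstream to find every column that contributes to a given target column. It illustrates the concept only and is not how Information Governance Catalog stores or computes lineage.

```python
from collections import deque


def upstream_columns(edges, target):
    """Return every (dataset, column) node that feeds `target`, directly or transitively.

    `edges` is an iterable of (source_node, target_node) pairs.
    Breadth-first search; the `seen` set keeps cycles from looping forever.
    """
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, set()).add(src)

    seen = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for src in parents.get(node, ()):
            if src not in seen:
                seen.add(src)
                queue.append(src)
    return seen


if __name__ == "__main__":
    # Hypothetical lineage: a Hadoop file feeds a staging table, which feeds a warehouse table.
    edges = [
        (("/data/raw/orders.csv", "cust_id"), ("stg_orders", "CUSTOMER_ID")),
        (("/data/raw/orders.csv", "amount"), ("stg_orders", "AMOUNT")),
        (("stg_orders", "AMOUNT"), ("dw_sales", "TOTAL_AMOUNT")),
        (("fx_rates", "rate"), ("dw_sales", "TOTAL_AMOUNT")),
    ]
    for node in sorted(upstream_columns(edges, ("dw_sales", "TOTAL_AMOUNT"))):
        print(node)
```

Tracing `("dw_sales", "TOTAL_AMOUNT")` reaches the `amount` column of the raw file through the staging table, plus the `rate` column, but not `cust_id`; that column-by-column precision is what distinguishes column-level lineage from job-level lineage.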
Asset Interchange support for Extended Lineage content
Enables you to migrate all Extended Data Source and Extension Mapping content from previous versions of Metadata Workbench, or between versions of Information Governance Catalog. This capability was not previously available.