IBM DataStage 9.1 Newly Added Features


New features and changes on IBM InfoSphere Information Server, Version 9.1

New features and changes were introduced for IBM InfoSphere Information Server, Version 9.1, along with documentation updates. The new and changed features and documentation updates are described in the following sections.

Table of contents
InfoSphere Information Server, Version 9.1 new features and changes:
  • Common capabilities across the InfoSphere Information Server suite
    • Administering
    • Connecting to external sources
    • InfoSphere Blueprint Director
    • InfoSphere Metadata Asset Manager
    • InfoSphere Metadata Workbench
    • Migrating
  • InfoSphere Information Server for Data Integration:
    • InfoSphere Data Click
    • InfoSphere DataStage
  • InfoSphere Information Server for Data Quality:
    • InfoSphere Data Quality Console
    • InfoSphere Information Analyzer
    • InfoSphere QualityStage
  • InfoSphere Business Information Exchange:
    • InfoSphere Business Glossary
    • InfoSphere Business Glossary Client for Eclipse
Documentation changes included in the Version 9.1 release:
  • Documentation introduced or enhanced with Version 9.1

Administering
New repository administration tool
The InfoSphere DataStage and QualityStage operations database and the InfoSphere QualityStage Standardization Rules Designer database are typically installed by the installation program unless you are using a database other than DB2 or unless you want to create them yourself. To assist in the management of repositories that are not installed by the installation program, the RepositoryAdmin command line tool is provided. You can also use the RepositoryAdmin tool for other purposes, such as to assist you in relocating a repository to another server or to update a connection to a repository. For more information, see RepositoryAdmin tool reference.
New database for InfoSphere QualityStage Standardization Rules Designer
The InfoSphere QualityStage Standardization Rules Designer is supported by an additional database for your Version 9.1 installation.
Connecting to external sources
Stage for IBM Operational Decision Management
IBM Operational Decision Management allows customers to externalize complex business rules from applications. With the new ILOG JRules stage, you can invoke complex business rules within the context of a job.
InfoSphere Streams connector
The new InfoSphere Streams connector enables integration between InfoSphere Streams and InfoSphere DataStage. You can use the InfoSphere Streams connector to send data from an InfoSphere DataStage job to an InfoSphere Streams job, and also to send data from an InfoSphere Streams job to an InfoSphere DataStage job.
Unstructured Data stage
Use the new Unstructured Data stage to extract information, such as formulas or document authors, from Microsoft Excel files. The stage supports style sheets for .xls and .xlsx file types.
Java™ Integration stage
You can use the new Java Integration stage to integrate your own Java code into a job design. You write the code against the Java Integration stage API, which defines interfaces and classes for Java code that can be invoked from within InfoSphere DataStage and QualityStage parallel jobs.
Support for new data sources
The following connectors and stages are now available:
  • DB2 connector for IBM DB2 for Linux, UNIX, and Microsoft Windows, Version 10.1.x
  • DB2 connector for IBM DB2 for z/OS, Version 10
  • MQ connector for IBM WebSphere MQ, Version 7.1.x and 7.5.x
  • Informix stage for IBM Informix, Version 11.7
  • Streams connector for IBM InfoSphere Streams 3.0
  • Teradata connector for Teradata Database 13.10 and 14.0
  • Oracle connector for Oracle Database 11g Release 2
  • Sybase stage for Sybase ASE, Version 15.7 and Sybase IQ, Version 15.4
  • Netezza connector for Netezza 4.6.x, 6.0.x, and 7.0.x
  • ODBC connector for DataDirect ODBC, Version 7.0.x
  • ILOG JRules stage for ILOG-JRules 7.1.x and WODM 8.0.x
  • Big Data File stage for IBM BigInsights 1.4 and Cloudera CDH 4.0

InfoSphere Blueprint Director
Publication of blueprints
Blueprints can now be published to the metadata repository of InfoSphere Information Server so that other users can view or use them. For more information, see Publishing a blueprint.
InfoSphere Metadata Asset Manager
Import metadata by bridge from additional tools
Import support was added for the following tools and types of metadata:
  • CA ERwin Data Modeler 8. Logical and physical data models.
  • IBM Cognos, Version 10. Business intelligence (BI) models, BI reports, and related implemented data resources.
  • IBM InfoSphere Streams MetaBroker. Endpoints and tuples.
  • Oracle BI Enterprise Edition. Business intelligence (BI) models, BI reports, and related data resources.
For more information, see Import bridges.
Export metadata
You can now use the OMG CWM 1 XMI 1 bridge to export the contents of databases and database schemas to XML files that are compliant with the OMG CWM XMI file format. For more information, see Exporting assets by using InfoSphere Metadata Asset Manager.
Create and edit data connections
When you import by using a connector, you can now create a data connection, use an existing data connection, or edit an existing data connection. Data connections are saved to the metadata repository. For more information, see Data connections.
Automatic creation of metadata interchange servers
Metadata interchange servers that enable import from bridges and connectors are now created automatically during installation. For more information, see Metadata interchange servers.

InfoSphere Metadata Workbench
Enhancements in Manage Lineage utility
You can now select or clear InfoSphere DataStage projects to be included in lineage. Previously, the Manage Lineage utility included all jobs in a selected project. In addition, you can run the Manage Lineage utility on database views without selecting an InfoSphere DataStage project to link the database view to its source database table. For more information, see Manage Lineage services.
Integration with IBM InfoSphere Blueprint Director
You can browse, query, and display published blueprints. You can display the blueprint diagram. For more information, see Viewing blueprints.
Integration with IBM InfoSphere Information Analyzer
  • You can browse, query, and display published rule definitions and published rule set definitions.
  • You can browse, query, display, and include for lineage the InfoSphere DataStage Data Rules stage and its relationship to the published data rule.
Integration with Big Data platform
You can browse, query, display, and include for lineage the InfoSphere DataStage Unstructured Data, Big Data File, and Streams Connector stages.
Integration with IBM InfoSphere Business Glossary
  • You can browse, query, display, and assign assets to information governance rules and information governance policies.
  • You can query and display the new Is A and Has A term relationships. For more information, see Terms.
  • The complete category hierarchy of a term is displayed on the Details page for the term. In previous versions, only the parent category of the term was displayed.
Integration with IBM InfoSphere DataStage
  • You can browse, query, display, and include for lineage the InfoSphere DataStage Java Client and Java Transformer stages.
  • You can display additional database stage properties: the server, database, schema, and table properties of the stage.
  • You can display additional data file stage properties: the file and location properties of the stage.
Integration with IBM InfoSphere Data Click
You can browse, query, display, and include for lineage published Change Data Capture (CDC) subscriptions from InfoSphere Data Click. In addition, you can invoke the CDC subscription process from a blueprint diagram.
Importing assets into the metadata repository
You can generate database, data file, and business intelligence (BI) report assets from a CSV file for later import into the metadata repository. For more information, see Generating an ISX file to import database, data file, BI report, and BI model assets from the command line.

Migrating
New migration functions
To help you to migrate automatically, you can now use two new migration wizards. The wizards automate the process of exporting and importing databases, profiles, and directories that are associated with InfoSphere Information Server. The wizards collect information about your computer and InfoSphere Information Server configuration. The information is then used to export and import your system. The migration wizards support all three server tiers: the services tier, the engine tier, and the metadata repository tier. When you export or import by using the wizards, all tiers that are installed on the computer are backed up simultaneously. For more information, see Migrating to IBM InfoSphere Information Server, Version 9.1.

InfoSphere Data Click
InfoSphere Data Click helps users retrieve data and provision systems with agility. Users can offload individual tables or entire schemas to generate sandbox environments for personal or group development work. The simple interface enables users of any skill level to complete the data integration task.
InfoSphere Data Click inherits the built-in data governance features of the InfoSphere Information Server platform. InfoSphere Data Click generates both design and operational metadata to support data lineage and impact analysis. InfoSphere Data Click assets also support linkages to the business glossary so that users can establish trust in the sources of information that are used. Also, administrators can define policies that control the data integration activity so that users cannot exceed limits that are based on enterprise requirements.
InfoSphere Data Click is installed when you install InfoSphere Information Server for Data Integration. InfoSphere Data Click activities are governed from InfoSphere Blueprint Director. You install InfoSphere Data Click as a plug-in into InfoSphere Blueprint Director.
[Screen capture: summary of an offload request in InfoSphere Data Click]

InfoSphere DataStage
Workload management
You can now use the workload management service in InfoSphere Information Server to allow the administrator to set system resource policies and prioritization of workload classes. The policies and workload classes control the execution of parallel and server jobs. For more information, see Administering workload management.
Web-based job runtime management
Administration and management of the operational environment is simplified by extending the Operations Console. Authorized users can now define the workload management policies, and can run, stop, and reset integration jobs within the projects that they administer. For more information, see Overview of the Operations Console.
Balanced optimization for Hadoop
Extending the HDFS features in Version 8.7, you can now use the Balanced Optimization features of InfoSphere DataStage to push sets of data integration processing and related data I/O into a Hadoop cluster. InfoSphere DataStage adds integration with Oozie workflows, as well as real-time integration with InfoSphere Streams.
For more information about Balanced Optimization, see Introduction to InfoSphere DataStage Balanced Optimization. For more information about integration with Oozie workflows, see InfoSphere DataStage.
Support for IBM Rational Team Concert™ as a source control system
You can now use Rational Team Concert as a source control system in IBM InfoSphere Information Server Manager. For more information, see Source control of InfoSphere DataStage and QualityStage assets.
XML design and performance optimization enhancements
InfoSphere DataStage 9.1 includes new features to help you work with the type of large XML schemas that are often seen in industry standards. You can use one new feature, the schema view, to narrow the scope of a large XSD to only the subset of the schema tree that you want to work with. When you narrow the scope, you can focus on a particular business challenge and parse and compose XML documents more easily. Other new features include user-specified parallelization for greater performance, extended support for XSD typing, and usability and productivity improvements in XML job editing through schema search and mapping intelligence.

InfoSphere Data Quality Console
InfoSphere Data Quality Console is a new unified, browser-based interface that you can use to monitor and track data quality exceptions that are generated by InfoSphere Information Server products and components. Exceptions are entities that are generated by a condition or event and that might require additional information or investigation. For example, records that do not meet the conditions of data rules in InfoSphere Information Analyzer might be considered exceptions. The following screen capture shows how you can view a subset of exception descriptors by specifying search criteria, which include search terms and attributes.
For more information, see InfoSphere Data Quality Console.

InfoSphere Information Analyzer
Predefined rule definitions
A key challenge in assessing and monitoring information quality is starting the process to validate key business requirements. Instead of starting that process without assistance, you can start by using predefined data quality rule definitions. New installations of this release include more than 100 predefined rule definitions for basic and common domains. Also included are more than 60 predefined rule definitions that are designed to validate standardized address data. Although the rule definitions are optimized for US data, they can be modified for any country or region. For more information, see Accelerating data quality analysis by starting with predefined rule definitions.
The data domains that are represented include the following domains:
  • Personal identity, such as age, date of birth, and national identifier
  • Asset identity, such as IP address information
  • Financial
  • Orders and sales
  • Data classification, such as identifier, indicator, code, date, and quantity
  • Completeness, which checks whether a field exists
  • Data format, such as alphabetic and numeric
  • Address data
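The kinds of checks that these domains describe can be sketched in a few lines of Python. This is only an illustration of completeness, format, and identity checks; it is not the InfoSphere Information Analyzer rule definition syntax, and the record layout, field names, and function names are assumptions made up for the example.

```python
import re
from datetime import date

# Hypothetical records; the field names are illustrative only.
records = [
    {"customer_id": "C001", "birth_date": date(1980, 5, 17), "ip": "192.168.0.10"},
    {"customer_id": "", "birth_date": date(2090, 1, 1), "ip": "300.1.1.1"},
]

IPV4_PATTERN = re.compile(r"(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})")


def is_complete(value):
    """Completeness check: the field exists and is not blank."""
    return value is not None and str(value).strip() != ""


def is_valid_birth_date(value, today):
    """Personal identity check: a date of birth cannot be in the future."""
    return value <= today


def is_valid_ipv4(value):
    """Asset identity check: four dot-separated numbers, each in 0..255."""
    match = IPV4_PATTERN.fullmatch(value)
    return bool(match) and all(int(part) <= 255 for part in match.groups())


def evaluate(record, today):
    """Apply every check to one record and report pass or fail per check."""
    return {
        "customer_id_complete": is_complete(record["customer_id"]),
        "birth_date_valid": is_valid_birth_date(record["birth_date"], today),
        "ip_valid": is_valid_ipv4(record["ip"]),
    }


for record in records:
    print(evaluate(record, today=date(2013, 1, 1)))
```

Passing the reference date in as a parameter keeps the checks repeatable, which matters when the same rule is run on a schedule to monitor quality over time.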
User-named output tables for data rules
When you create data rules, you can specify that you want a user-named rule output table to be created in addition to the system rule output tables. User-named output tables can be simple or advanced. Use a simple table if you plan to use the rule output from one rule to create subsequent rules. Use an advanced table if you want to collect rule output from multiple data rules into one table. Also, you might want to create an advanced user-named table if you plan to use the rule output from multiple rules to create subsequent rules. An advanced user-named table is an additional physical table with copied records, which means that it requires additional storage space. For more information, see Setting output tables for a data rule.
Distinct output records
You can now specify whether you want only distinct output records or all output records in the rule output table. For more information, see Setting the output content for a data rule.
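As a rough illustration of the difference between the two settings, the following Python sketch collects records that do not meet a rule and optionally removes duplicates. The function and field names are hypothetical, and the choice to output failing records is an assumption for the example; the product's actual output options are described in the referenced topic.

```python
def rule_output(records, rule, distinct=False):
    """Return the records that do not meet the rule.

    With distinct=True, identical records appear only once,
    in the order in which they were first seen.
    """
    failing = [record for record in records if not rule(record)]
    if not distinct:
        return failing
    seen = set()
    unique = []
    for record in failing:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def valid_state(record):
    """Hypothetical rule: the state code must be in a known list."""
    return record["state"] in {"NY", "CA"}


rows = [
    {"id": 1, "state": "ZZ"},
    {"id": 1, "state": "ZZ"},
    {"id": 2, "state": "NY"},
]

print(len(rule_output(rows, valid_state)))                 # all failing records
print(len(rule_output(rows, valid_state, distinct=True)))  # distinct failing records
```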
Task sequencing
You can now use task sequences to group multiple InfoSphere Information Analyzer jobs that are to be executed sequentially. In this release, task sequencing is available only by using the HTTP API and CLI, and only rules, rule sets, and metrics are supported for task sequencing. For more information, see Task sequences.
InfoSphere QualityStage
Standardization Rules Designer
The new Standardization Rules Designer provides an intuitive and efficient framework that you can use to enhance standardization rule sets. You can use the browser-based interface to add or modify classifications, lookup tables, and rules. You can also import sample data to validate that the enhancements to the rule set work with your data. The following screen capture shows a part of the Standardization Rules Designer in which you can add or modify a rule by mapping input values from an example record to output columns. This rule splits concatenated values in an input address record by mapping each part of the input value to a different output column.

[Screen capture: a rule that splits concatenated values in an address record and maps each value to an output column]
For more information, see Enhancing standardization rule sets by using the Standardization Rules Designer.
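The idea of splitting a concatenated value and mapping each part to its own output column can be sketched in Python as follows. This is not how the Standardization Rules Designer or QualityStage rule sets are implemented; the output column names, the regular expression, and the sample address are all assumptions chosen for illustration.

```python
import re

# Hypothetical pattern: letters immediately followed by a unit number, such as "Apt5B".
CONCATENATED_UNIT = re.compile(r"([A-Za-z]+)(\d+[A-Za-z]?)")


def standardize(address):
    """Map each part of an input address to a (hypothetical) output column."""
    output = {"HouseNumber": "", "StreetName": "", "UnitType": "", "UnitValue": ""}
    street_tokens = []
    for token in address.split():
        match = CONCATENATED_UNIT.fullmatch(token)
        if match:
            # Split the concatenated value into two output columns.
            output["UnitType"] = match.group(1).upper()
            output["UnitValue"] = match.group(2).upper()
        elif token.isdigit() and not output["HouseNumber"]:
            output["HouseNumber"] = token
        else:
            street_tokens.append(token.upper())
    output["StreetName"] = " ".join(street_tokens)
    return output


print(standardize("100 Main St Apt5B"))
```

For the sample input, the token "Apt5B" ends up in two columns (unit type and unit value) rather than being kept as a single unparsed value.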
New rule sets
The following rule sets are now available:
  • The PHPROD rule set is a rule set for pharmaceutical data. The rule set demonstrates how you can use rules to standardize description data from the health industry.
  • The RUNAMEL rule set can be used to standardize Russian names.
  • The RUADDRL rule set can be used to standardize Russian addresses and area information.
Rule set enhancements
The predefined rule sets are enhanced in the following ways:
  • The domain-specific rule sets can be used with the Standardization Rules Designer.
  • The CNNAME, HKCNAME, and HKNAME rule sets now have special options for name processing.
  • The CNADDR, CNAREA, CNPHONE, HKADDR, HKCADDR, and HKPHONE rule sets now have user modification subroutines.
  • The CNPHONE and HKPHONE rule sets are enhanced in several ways. For example, input data can be converted to half-width characters.
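Half-width conversion can be illustrated with a short Python sketch. It relies on the fact that the Unicode full-width forms U+FF01 through U+FF5E sit at a fixed offset (0xFEE0) from their ASCII counterparts, and that U+3000 is the ideographic (full-width) space. The function name is hypothetical, and this is not the logic that the CNPHONE or HKPHONE rule sets use internally.

```python
FULL_WIDTH_START = 0xFF01   # full-width exclamation mark
FULL_WIDTH_END = 0xFF5E     # full-width tilde
FULL_WIDTH_OFFSET = 0xFEE0  # distance to the matching ASCII character
IDEOGRAPHIC_SPACE = 0x3000


def to_half_width(text):
    """Convert full-width ASCII variants in text to half-width characters."""
    converted = []
    for char in text:
        code = ord(char)
        if code == IDEOGRAPHIC_SPACE:
            converted.append(" ")
        elif FULL_WIDTH_START <= code <= FULL_WIDTH_END:
            converted.append(chr(code - FULL_WIDTH_OFFSET))
        else:
            converted.append(char)  # leave all other characters unchanged
    return "".join(converted)


# Full-width digits and hyphen, written as escapes so that the input is unambiguous.
full_width_phone = "\uff10\uff11\uff10\uff0d\uff11\uff12\uff13\uff14"
print(to_half_width(full_width_phone))
```

Normalizing the width first means that later parsing steps only have to recognize one form of each digit and separator.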
Sample data available for predefined jobs and tutorial
Sample data is now provided for the predefined standardization jobs that you can use to generate standardized data and the frequency information for that data. For more information, see Predefined standardization jobs.
The installation media also now contains sample data and other files that are required for the InfoSphere QualityStage tutorial. For more information, see the InfoSphere QualityStage parallel job tutorial.

InfoSphere Business Glossary
Expanded enterprise information governance with information governance policies and information governance rules
Now, in addition to creating and managing terms and categories, you can create and manage information governance policies and information governance rules. Information governance policies and rules describe the way that information should be used and managed to comply with business objectives. You can define relationships among the policies and rules and between the policies and rules and other metadata information assets. For more information, see Information governance policies and Information governance rules.
Advanced term relationships
You can use new relationships between terms to express hierarchies of type and containment. The relationships enable consumers of the information to understand the meaning of terminology more fully, in the context of other terms. For more information, see Is A and Has A relationships.
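One way to picture the two relationship types is as a small graph in which Is A links point to supertypes and Has A links point to contained terms. The Python sketch below is a conceptual model only; the class, the function names, and the sample terms are hypothetical and do not reflect the InfoSphere Business Glossary data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    name: str
    is_a: list = field(default_factory=list)   # supertypes of this term
    has_a: list = field(default_factory=list)  # terms that this term contains


def supertypes(term):
    """Return the names of all supertypes, nearest first."""
    names = []
    for parent in term.is_a:
        names.append(parent.name)
        names.extend(supertypes(parent))
    return names


def contained_terms(term):
    """Return the names of terms contained by this term or by any of its supertypes."""
    names = [child.name for child in term.has_a]
    for parent in term.is_a:
        names.extend(contained_terms(parent))
    return names


balance = Term("Account Balance")
account = Term("Account", has_a=[balance])
savings = Term("Savings Account", is_a=[account])

print(supertypes(savings))
print(contained_terms(savings))
```

In this model, a reader who looks up "Savings Account" can see both what kind of thing it is and what it is made up of, which is the extra context that the relationships are meant to provide.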
Single sign-on for Windows users
Integration with Windows desktop authentication enables users who are logged in to Windows to work with InfoSphere Business Glossary immediately, without requiring a separate login process. For more information, see Configuring Windows desktop single sign-on support.
Web-based access to blueprints
You can now define information about blueprints and view published blueprints directly from the business glossary. For more information, see Viewing blueprints.
Dynamic display of external content from OSLC providers
OSLC (Open Services for Lifecycle Collaboration) is a method of communicating among different systems. InfoSphere Business Glossary can now be a consumer of OSLC services from Rational Asset Manager and Rational Software Architect Data Manager. The metadata content that is stored in these OSLC providers is displayed dynamically in the business glossary. The dynamic display ensures that data is synchronized and eliminates the need for separate data transfer procedures. For more information, see Configuring cross-server communication for external assets.
Enhanced integration with InfoSphere Information Analyzer
In previous releases, you were able to view the results of table and column analysis, including valid values for columns. You can now browse, search, view details of, and assign published data rule definitions and data rule set definitions to business glossary assets. For more information, see Integration with other IBM InfoSphere products.

InfoSphere Business Glossary Client for Eclipse
Information governance policy and information governance rule assets
You can now browse, search, and display the properties of two new InfoSphere Business Glossary assets: information governance policies and information governance rules. You can assign an information governance rule to an asset, such as a database table, so that the information governance rule governs the asset.
Import and export of glossary assignments
Earlier versions supported the import and export of term assignments. In Version 9.1, you can import and export glossary assignments, which include both term assignments and information governance rule assignments.
Advanced term relationships from InfoSphere Business Glossary
Two new term relationships, Is A and Has A, are included in the Properties view of a term. You can view the supertype and subtype relationship between terms in the Term Type Hierarchy view.
Business Process Modeling Notation (BPMN) model elements
You can now view and remove term assignments in BPMN model elements that are displayed in IBM Rational Software Architect. With the Business Process Model Integration API, you can build functions to add, remove, and get term assignments for BPMN model elements.
Local indexing
Local term assignments and local information governance rule assignments are now indexed to improve search and display performance.

Documentation introduced or enhanced with Version 9.1
Introduction to InfoSphere Information Server
This information is more complete and streamlined to help you understand how the suite and its components interact. Diagrams show where each component fits in the suite architecture, and scenarios explain how each component might be used to solve real business problems. For more information, see Introduction to InfoSphere Information Server.
InfoSphere Business Glossary
New topics provide information about populating your business glossary by using the command line:
  • Generating business glossary content from InfoSphere Data Architect glossary model (*.ndm) files
  • Generating business glossary content from logical data models
InfoSphere DataStage
  • The quality of information is improved and task steps are clarified in the InfoSphere DataStage tutorial. For more information, see Tutorial: Creating parallel jobs.
  • More troubleshooting information, with focus on client login and job runtime issues, is provided. The enhanced troubleshooting information includes information about specific operating systems and information about how to prevent errors. For more information, see Troubleshooting InfoSphere DataStage.

InfoSphere Metadata Asset Manager
Enhanced documentation of import and export bridges
  • Individual reference topics for each bridge contain prerequisites, frequently asked questions, troubleshooting information, and detailed help for each parameter. For more information, see Import bridges.
  • Individual PDF guides to using BI bridges contain customized information for imports from IBM Cognos, SAP BusinessObjects, Microsoft, and Oracle BIEE.
  • Mapping documents for each import bridge show how each metadata class in the source tool is displayed in InfoSphere Information Server.
Asset interchange and istool command line
The following functions are documented:
  • Exporting and importing InfoSphere Streams assets
  • Exporting and importing InfoSphere Data Quality Console assets
  • Generating business glossary content from InfoSphere Data Architect glossary models
  • Generating business glossary content from logical data models
InfoSphere QualityStage
  • To help you learn about the new Standardization Rules Designer, tutorials are provided that use data from the product and address domains. For more information, see Tutorial: Enhancing a product rule set in the Standardization Rules Designer and Tutorial: Enhancing an address rule set in the Standardization Rules Designer.
  • New and updated topics provide information about the standardization process and standardization rule sets:
    • Standardization workflow
    • Developing rule sets
    • Enhancing standardization rule sets by using the Standardization Rules Designer

