Best DataStage Performance Monitoring Methods
This post will help you understand the best performance monitoring methods. Using these methods, you can tune your DataStage jobs.
Usage of Job Monitor:
The IBM InfoSphere DataStage job monitor can be accessed through the IBM InfoSphere DataStage Director. The job monitor provides a useful snapshot of a job's performance at a given moment of its execution, but it does not provide thorough performance metrics. Because of buffering and certain job semantics, a snapshot image of the flow might not be a representative sample of performance over the course of the entire job. The CPU summary information provided by the job monitor is useful as a first approximation of where time is being spent in the flow, but a job monitor snapshot should not be used in place of a full run of the job or a run with a sample set of data. The job monitor also does not include information on sorts or similar components that might be inserted automatically by the engine in a parallel job; for these components, the score dump can be of assistance.
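The same kind of point-in-time snapshot can also be captured from the command line with the dsjob client, which is handy when the Director GUI is not at hand. The following is a minimal Python sketch, assuming the dsjob client is on the PATH and that its -report option with the DETAIL report type is available in your installation; the project and job names are placeholders.

# Minimal sketch: take periodic job-monitor-style snapshots via dsjob.
# Assumes dsjob is on the PATH and supports "-report <project> <job> DETAIL";
# "myproject" and "myjob" are placeholder names.
import subprocess
import time

PROJECT = "myproject"   # placeholder project name
JOB = "myjob"           # placeholder job name
INTERVAL_SECONDS = 60   # seconds between snapshots
SNAPSHOTS = 5           # number of snapshots to take

for i in range(SNAPSHOTS):
    result = subprocess.run(
        ["dsjob", "-report", PROJECT, JOB, "DETAIL"],
        capture_output=True, text=True,
    )
    print(f"--- snapshot {i + 1} ---")
    print(result.stdout)
    time.sleep(INTERVAL_SECONDS)

As with the Director view, each report is only a point-in-time snapshot and is subject to the same caveats described above.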
Usage of Score Dump:
In order to resolve any performance issues it is essential to have an understanding of the data flow within the jobs. To help understand a job flow, a score dump should be taken. This can be done by setting the APT_DUMP_SCORE environment variable to “true” prior to running the job.
When enabled, the score dump produces a report which shows the operators, processes and data sets in the job and contains information about:
- Where and how data was repartitioned.
- Whether IBM InfoSphere DataStage has inserted extra operators in the flow.
- The degree of parallelism each operator has run with, and on which nodes.
- Where data was buffered.
The score dump information is included in the job log when a job is run. It is particularly useful in showing where IBM InfoSphere DataStage is inserting additional components or actions into the job flow, in particular extra data partitioning and sorting operators, as both can be detrimental to performance. A score dump therefore helps to detect superfluous operators so that the job design can be amended to remove them.
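Because the score is written to the job log as plain text, it can also be scanned programmatically. The following is a rough Python sketch that counts operators and flags automatically inserted sorts and buffers in an exported log, assuming the usual textual form of the score (operator lines starting with "opN[", inserted sorts appearing as "tsort", and buffer operators as "buffer(n)"); the log file name is a placeholder and the exact format can vary by release.

# Rough sketch: scan an exported job log for score-dump details.
# Assumes the log was saved as plain text ("job_log.txt" is a placeholder)
# and that the score uses the usual notation, which may vary by release.
import re

LOG_FILE = "job_log.txt"  # placeholder path to an exported job log

with open(LOG_FILE, encoding="utf-8") as f:
    log_text = f.read()

operators = len(re.findall(r"^op\d+\[", log_text, flags=re.MULTILINE))
inserted_sorts = len(re.findall(r"\btsort\b", log_text))
inserted_buffers = len(re.findall(r"\bbuffer\(\d+\)", log_text))

print(f"operators in score       : {operators}")
print(f"inserted sort operators  : {inserted_sorts}")
print(f"inserted buffer operators: {inserted_buffers}")

A high count of inserted sorts or buffers is a hint to revisit the partitioning and sort settings in the job design.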
Usage of Resource Estimation:
Predicting hardware resources needed to run DataStage jobs in order to meet processing time requirements can sometimes be more of an art than a science.
Drawing on sophisticated analytical information and a deep understanding of the parallel framework, IBM added Resource Estimation to DataStage (and QualityStage) 8.x. It can be used to determine the needed system requirements or to analyze whether the current infrastructure can support the jobs that have been created.
Within a job design, a new toolbar option is available called Resource Estimation.
This option opens the Resource Estimation dialog. The estimation is based on a model of the job, and there are two types of models that can be created:
- Static. The static model does not actually run the job to create the model. CPU utilization cannot be estimated, but disk space can. The record size is always fixed. The “best case” scenario is considered when the input data is propagated. The “worst case” scenario is considered when computing record size.
- Dynamic. The Resource Estimation tool actually runs the job with a sample of the data. Both CPU and disk space are estimated. This is a more reliable way to produce estimates.
Resource Estimation is used to project the resources required to execute the job based on varying data volumes for each input data source.
A projection is then executed using the model selected. The results show the total CPU needed, disk space requirements, scratch space requirements, and other relevant information.
Different projections can be run with different data volumes and each can be saved. Graphical charts are also available for analysis, which allow the user to drill into each stage and each partition. A report can be generated or printed with the estimations.
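The underlying idea of a projection can be illustrated with simple arithmetic: measure a sample run and scale its resource usage to the target data volume. The Resource Estimation tool builds a more sophisticated model than this, so the Python sketch below is only an illustration, and all figures in it are made-up placeholders.

# Illustrative arithmetic only: a naive linear projection from a sampled
# run to a target data volume. All figures are made-up placeholders; the
# Resource Estimation tool builds a more sophisticated model than this.
sample_rows = 1_000_000        # rows processed in the sample run
sample_cpu_seconds = 120.0     # CPU time measured for the sample run
sample_disk_mb = 850.0         # disk space used by the sample run
sample_scratch_mb = 300.0      # scratch space used by the sample run

target_rows = 25_000_000       # projected production volume
scale = target_rows / sample_rows

print(f"projected CPU     : {sample_cpu_seconds * scale:,.0f} seconds")
print(f"projected disk    : {sample_disk_mb * scale:,.0f} MB")
print(f"projected scratch : {sample_scratch_mb * scale:,.0f} MB")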
This feature greatly assists developers in estimating the time and machine resources needed for job execution. This kind of analysis helps when examining the performance of a job, but IBM InfoSphere DataStage also offers another way to analyze job performance.
Usage of Performance Analysis:
Isolating job performance bottlenecks during a job execution, or even seeing what else was being performed on the machine during the job run, can be extremely difficult. IBM InfoSphere DataStage 8.x adds a new capability called Performance Analysis.
It is enabled through a job property on the Execution tab, which causes performance data to be collected at job execution time (note: this option is disabled by default). Once it is enabled and a job is open, a new toolbar option called Performance Analysis is made available.
This option opens a new dialog called Performance Analysis. The first screen asks the user which job instance to perform the analysis on.
Detailed charts are then available for that specific job run including:
- Job timeline
- Record Throughput
- CPU Utilization
- Job Timing
- Job Memory Utilization
- Physical Machine Utilization (shows what else is happening overall on the machine, not just the DataStage activity).
Each partition’s information is available in different tabs.
A report can be generated for each chart.
Using the information in these charts, a developer can, for example, pinpoint performance bottlenecks and redesign the job to improve performance.
In addition to instance performance, overall machine statistics are available. When a job is running, information about the machine is also collected and is available in the Performance Analysis tool including:
- Overall CPU Utilization
- Memory Utilization
- Disk Utilization
Developers can also correlate the machine statistics with the job performance information. Filtering capabilities exist to display only specific stages.
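Outside the Performance Analysis tool, the same kind of overall machine statistics can be sampled alongside a job run and later lined up against the job's own timings. Below is a minimal Python sketch, assuming the third-party psutil package is installed; the sampling interval, sample count, output file name, and the "/" file system are all placeholder choices.

# Minimal sketch: sample overall CPU, memory, and disk utilization while a
# job runs, writing them to a CSV for later correlation with job timings.
# Assumes the third-party psutil package is installed; "/" stands in for
# the file system that holds the DataStage resource and scratch disks.
import csv
import time

import psutil

SAMPLES = 10            # number of samples to take
INTERVAL_SECONDS = 30   # seconds between samples

with open("machine_stats.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "cpu_percent", "memory_percent", "disk_percent"])
    for _ in range(SAMPLES):
        writer.writerow([
            time.strftime("%Y-%m-%d %H:%M:%S"),
            psutil.cpu_percent(interval=1),    # overall CPU utilization
            psutil.virtual_memory().percent,   # memory utilization
            psutil.disk_usage("/").percent,    # disk utilization
        ])
        time.sleep(INTERVAL_SECONDS)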
The information collected and shown in the Performance Analysis tool can easily be analyzed to identify possible bottlenecks. These bottlenecks usually lie in the overall job design, which will be described in the following chapter.