DataStage Interview Questions and Answers V1.4
This post is very useful for DataStage beginners.
You will find answers to the questions listed below in this post:
# What are the main features of DataStage?
# What are Stage Variables, Derivations and Constraints?
# What are the types of views in DataStage Director?
# How do you execute a DataStage job from the command line?
# What is the functionality of the Link Partitioner and Link Collector?
# What are the types of jobs available in DataStage?
# What is the difference between Server jobs and Parallel jobs?
# What is a project?
# What is a sequencer?
# What is a container?
# What are the different types of containers?
# What are mainframe jobs?
# What are parallel jobs?
# How do you use a stored procedure in a DataStage job?
# What is the ODBC stage?
# What is a hash file? What are its types?
# What type of hash file is used in general in DataStage jobs?
# What is a stage variable?
# What are constraints and derivations?
# How do you reject records in a Transformer?
# Why do you need stage variables?
# What is the precedence of stage variables, derivations, and constraints?
# What are data elements?
# What are routines?
# What are transforms, and what is the difference between routines and transforms?
# What is a DataStage macro?
# What is job control?
# How many types of stages are there?
# What is the difference between active and passive stages?
# What plug-in stages have you used in your projects?
# What is a plug-in stage, and how do you install plug-in stages?
# What is the Sort plug-in?
# What is the Aggregator stage?
# What are the Hashed File stage and Sequential File stage?
# What types of flat files have you used? Have you used tab-delimited files?
# What is job control code?
-------------------------------------------------------------------------
DataStage has the following features to aid the design and processing required to build a data warehouse:
- Uses graphical design tools. With simple point-and-click techniques you can draw a scheme to represent your processing requirements.
- Extracts data from any number or type of database.
- Handles all the metadata definitions required to define your data warehouse.
- Lets you view and modify the table definitions at any point during the design of your application.
- Aggregates data.
- Lets you modify the SQL SELECT statements used to extract data.
- Transforms data. DataStage has a set of predefined transforms and functions you can use to convert your data, and you can easily extend this functionality by defining your own transforms.
- Loads the data warehouse.
A. Stage Variable - An intermediate processing variable that retains its value during a read and does not pass that value on to a target column.
Derivation - An expression that specifies the value to be passed on to the target column.
Constraint - A condition that evaluates to true or false and controls the flow of data along a link.
There are 3 types of views in DataStage Director:
a) Job View - dates on which jobs were compiled.
b) Status View - the status of a job's last run.
c) Log View - warning messages, event messages, and program-generated messages.
A) Use the "dsjob" command as follows:
dsjob -run -jobstatus projectname jobname
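A few other common dsjob invocations, as a hedged sketch (the project name, job name, and parameter below are hypothetical, and the exact options available can vary by DataStage release):
dsjob -run -param SRC_DIR=/data/in -wait -jobstatus dstage1 LoadCustomers
dsjob -jobinfo dstage1 LoadCustomers
dsjob -logsum dstage1 LoadCustomers
The -wait option blocks until the job finishes, and -jobstatus makes the command's exit code reflect the job's finishing status.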
Functionality of Link Partitioner and Link Collector?
A) Link Partitioner: splits data into a number of partitions or data flows, using one of several partitioning methods.
Link Collector: collects the data coming from the partitions, merges it into a single data flow, and loads it to the target.
- Server jobs
- Parallel jobs
- Sequencer jobs
- Container jobs
A. Server jobs do not support partitioning techniques, but parallel jobs do.
B. Server jobs do not support SMP or MPP, but parallel jobs support both.
C. Server jobs run on a single node, but parallel jobs run on multiple nodes.
D. Server jobs are preferred when the source data volume is low; when the data is huge, prefer parallel jobs.
DataStage Projects: a complete project may contain several jobs and user-defined components.
A project contains:
- DataStage jobs.
- Built-in components. These are predefined components used in a job.
- User-defined components. These are customized components created using the DataStage Manager. Each user-defined component performs a specific task in a job.
All work is done inside a project. Projects are created during and after the installation process; you can add a project on the Projects tab of the Administrator.
A project is associated with a directory. The project directory is used by DataStage to store jobs and other DataStage objects and metadata.
A sequencer lets you graphically create a controlling job, without using the job control function.
A group of stages and links in a job design is called a container.
There are two kinds of containers: Local and Shared.
Local containers only exist within the single job in which they are used. Use shared containers to simplify complex job designs.
Shared containers exist outside of any specific job. They are listed in the Shared Containers branch in the Manager and can be added to any job. Shared containers are frequently used to share a commonly used set of job components.
A container contains two unique stages: the Container Input stage is used to pass data into the container, and the Container Output stage is used to pass data out of the container.
Local containers. These are created within a job and are only accessible by that job. A local container is edited in a tabbed page of the job's Diagram window.
Shared containers. These are created separately and are stored in the Repository in the same way that jobs are. Instances of a shared container can be inserted into any server job in the project. Shared containers are edited in their own Diagram window.
A mainframe job is compiled and run on the mainframe. Data extracted by such jobs is then loaded into the data warehouse.
These are compiled and run on the DataStage server in a similar way to server jobs, but they support parallel processing on SMP, MPP, and cluster systems.
Use the ODBC stage: pass one dummy column and give the procedure name in the SQL tab.
A stage that extracts data from, or loads data into, a database that implements the industry-standard Open Database Connectivity API. It is used to represent a data source, an aggregation step, or a target data table (server jobs only).
A hash file is just like an indexed sequential file; the file is internally indexed on a particular key value. There are two types of hash file: the static hash file and the dynamic hash file.
In general, static hash files are used in DataStage jobs.
In a DataStage transformation we can define variables and assign them values derived from the source.
We can create constraints and derivations using DataStage variables.
Records are rejected through a DataStage constraint.
That depends upon the job requirements; through stage variables we can filter data.
The order of evaluation is: stage variables => constraints => derivations.
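A hedged sketch of how the three fit together in a Transformer (the link, column, and variable names are hypothetical):
Stage variable svAmount: If IsNull(DSLink3.AMOUNT) Then 0 Else DSLink3.AMOUNT
Constraint on the output link: svAmount > 0
Derivation of the target column: svAmount * 1.1
The stage variable is evaluated first for each row, the constraint then decides whether the row flows down the link, and the derivation finally computes the target column value.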
A specification that describes the type of data in a column and how the data is converted.
In DataStage, a routine is just like a function that we call in a DataStage job. There are built-in routines, and we can also create our own routines.
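A minimal sketch of a user-written server routine in DataStage BASIC (the argument name and logic are hypothetical; Ans is the standard return variable for transform functions):
* Returns "Y" if Arg1 is a non-empty string, otherwise "N".
If Len(Trim(Arg1)) > 0 Then
   Ans = "Y"
End Else
   Ans = "N"
End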
A transform is used to manipulate data within a DataStage job. The difference is that a transform is defined as a single expression used in column derivations, whereas a routine is a BASIC function that can be called from expressions or run as a before/after subroutine.
In DataStage, macros can be used in expressions, job control routines, and before/after subroutines. The available macros are concerned with ascertaining job status.
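For example, job-status macros such as DSJobName and DSJobStartDate can be combined in a column derivation (a hedged sketch; the ":" operator performs string concatenation in DataStage expressions):
DSJobName : "_" : DSJobStartDate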
A job control routine provides the means of controlling other jobs from the current job. A set of one or more jobs can be validated, run, reset, stopped, and scheduled in much the same way as the current job can be.
There are three basic types of stage:
Built-in stages: supplied with DataStage and used for extracting, aggregating, transforming, or writing data. All types of job have these stages.
Plug-in stages: additional stages that can be installed in DataStage to perform specialized tasks that the built-in stages do not support. Server jobs and parallel jobs can make use of these.
Job Sequence stages: special built-in stages which allow you to define sequences of activities to run. Only job sequences have these.
There are two kinds of stages:
Passive stages define read and write access to data sources and repositories.
• Sequential File
• ODBC
• Hashed File
Active stages define how data is filtered and transformed.
• Transformer
• Aggregator
• Sort plug-in
Plug-in stage: a stage that performs specific processing that is not supported by the standard server job stages. Plug-ins used: ORAOCI8, Orabulk.
Installation: plug-in stages are added to the Diagram window in the same way as built-in stages. To add a plug-in stage:
A. Do one of the following:
· Choose the stage type from the Insert menu.
· Click the appropriate stage button on the tool palette.
B. Click in the Diagram window where you want to position the stage. The stage appears as a square in the Diagram window.
---- Some information about plug-in stages
You may find that the built-in stage types do not meet all your requirements for data extraction or transformation. In this case, you need to use a plug-in stage. The function and properties of a plug-in stage are determined by the plug-in used when the stage is inserted. Plug-ins are written to perform specific tasks, for example, to bulk load data into a data mart. Two plug-ins are always installed with DataStage: BCPLoad and Orabulk. You can also choose to install a number of other plug-ins when you install DataStage.
You can write your own plug-ins. However, contact your local Ascential office before creating one, as there is a range of plug-ins already available that may meet your needs.
Difference between ORAOCI8 and Orabulk?
ORAOCI8: this stage allows you to connect to an Oracle database.
Orabulk: the Orabulk plug-in generates control and data files for bulk loading into a single table on an Oracle target database. The files are suitable for loading into the target database using the Oracle command sqlldr.
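For instance, the generated files might be loaded like this (a hedged sketch; the user ID, password, and file names are hypothetical):
sqlldr userid=scott/tiger control=orders.ctl data=orders.dat log=orders.log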
A mainframe processing stage that sorts input columns.
A stage type that computes totals or other functions over sets of data.
The Hashed File stage extracts data from, or loads data into, a database that contains hashed files. The Sequential File stage extracts data from, or writes data to, a text file.
Sequential flat files with comma-separated values.
Job control code is used in a job control routine to create a controlling job, which invokes and runs other jobs.
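A minimal sketch of such job control code in DataStage BASIC (the job name is hypothetical; DSAttachJob, DSRunJob, DSWaitForJob, DSGetJobInfo, and DSDetachJob are the standard job-control functions):
* Attach the job, run it, wait for it to finish, then check its status.
hJob = DSAttachJob("LoadCustomers", DSJ.ERRFATAL)
ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
ErrCode = DSWaitForJob(hJob)
Status = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
ErrCode = DSDetachJob(hJob)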