DataStage Interview Questions and Answers V1.1


This post answers the following questions:
A. What is DataStage Parallel Extender (PE) / Enterprise Edition (EE)?
B. What is a conductor node?
C. How do you execute a DataStage job from the command-line prompt?
D. What is the difference between a sequential file, a dataset, and a fileset?

----------------
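A. What is DataStage Parallel Extender (PE) / Enterprise Edition (EE)?
Parallel Extender is the parallel-processing engine of DataStage Enterprise Edition: it runs data extraction and transformation applications in parallel across multiple processors or nodes. There are two types of parallel processing:
1) Pipeline parallelism: downstream stages start processing rows as soon as upstream stages produce them, instead of waiting for the whole data set.
2) Partition parallelism: the data is split into partitions, and a separate instance of the same stage processes each partition, typically one per node.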
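The two kinds of parallelism can be illustrated with a small Python sketch. It is a conceptual analogy, not DataStage code: the stage functions are hypothetical, generators stand in for pipelining (each row is handed to the next stage as soon as it is produced), and modulus partitioning (one of the partitioning methods DataStage offers) splits the rows so that one copy of the extract/transform pipeline runs per partition in a thread pool.

```python
# Conceptual analogy only: this is plain Python, not DataStage code.
# Stage names and the modulus partitioning below are illustrative assumptions.
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, Iterator


def extract(rows: Iterable[int]) -> Iterator[int]:
    # Pipeline parallelism (analogy): each row is yielded as soon as it is read,
    # so the next stage can process it before extraction has finished.
    for row in rows:
        yield row


def transform(rows: Iterable[int]) -> Iterator[int]:
    # Downstream stage consumes rows one at a time as they arrive.
    for row in rows:
        yield row * 10


def modulus_partition(rows: Iterable[int], num_nodes: int) -> list[list[int]]:
    # Partition parallelism: each row goes to partition (row % num_nodes).
    partitions: list[list[int]] = [[] for _ in range(num_nodes)]
    for row in rows:
        partitions[row % num_nodes].append(row)
    return partitions


def run_job(rows: list[int], num_nodes: int = 2) -> list[list[int]]:
    partitions = modulus_partition(rows, num_nodes)
    # One copy of the extract -> transform pipeline per partition, run concurrently.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        return list(pool.map(lambda part: list(transform(extract(part))), partitions))


if __name__ == "__main__":
    print(run_job([1, 2, 3, 4, 5], num_nodes=2))  # [[20, 40], [10, 30, 50]]
```

In a real parallel job the engine does both at once: each partition flows through a pipeline of stages, which is what the per-partition pipeline above imitates.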

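B. What is a conductor node?
Every parallel job run has a conductor process, which is started on the node where the job execution begins; that node is the conductor node. The conductor coordinates the run: it starts the section leaders and collects their status and messages. In addition, there is a section leader process for each processing node, a player process for each set of combined operators, and an individual player process for each uncombined operator.

Whenever a job has to be killed, the processes should be destroyed from the bottom up: first the player processes, then the section leader processes, and finally the conductor process.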
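The process hierarchy and the bottom-up kill order can be modelled as a small tree. This is a toy model with hypothetical process names, not a DataStage API; for simplicity it assumes every node runs the same number of player processes.

```python
# Toy model only: hypothetical names, not a DataStage API.
from dataclasses import dataclass, field


@dataclass
class Process:
    name: str
    children: list["Process"] = field(default_factory=list)


def build_job_tree(nodes: list[str], players_per_node: int) -> Process:
    # Simplifying assumption: every node runs the same number of players.
    conductor = Process("conductor")
    for node in nodes:
        # One section leader per processing node.
        leader = Process(f"section_leader@{node}")
        for i in range(players_per_node):
            # One player per combined operator group or uncombined operator.
            leader.children.append(Process(f"player{i}@{node}"))
        conductor.children.append(leader)
    return conductor


def kill_order(root: Process) -> list[str]:
    # Collect processes level by level: conductor, section leaders, players.
    levels: list[list[str]] = []
    current = [root]
    while current:
        levels.append([p.name for p in current])
        current = [child for p in current for child in p.children]
    # Destroy the deepest level first: all players, then section leaders, then conductor.
    return [name for level in reversed(levels) for name in level]


if __name__ == "__main__":
    tree = build_job_tree(["node1", "node2"], players_per_node=2)
    print(kill_order(tree))
```

Walking the tree level by level (rather than depth-first) is what guarantees that every player is gone before any section leader is destroyed.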

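C. How do you execute a DataStage job from the command-line prompt?
Use the "dsjob" command as follows:

ex: $ dsjob -run -jobstatus projectname jobname

With -jobstatus, dsjob waits for the job to finish and returns an exit code derived from the job's status. Other useful options include: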

 -stop - stops a running job
 -lprojects - lists the projects
 -ljobs - lists the jobs in a project
 -lstages - lists the stages in a job
 -llinks - lists the links of a stage
 -projectinfo - returns project information (host name and project name)
 -jobinfo - returns job information (job status, start time, end time, etc.)
 -stageinfo - returns stage information (stage name, stage type, input rows, etc.)
 -linkinfo - returns link information
 -lparams - lists the parameters of a job
 -paraminfo - returns information about a job parameter
 -log - adds a text message to the job log
 -logsum - displays a summary of the job log
 -logdetail - displays log entries in detail (event ID, time, message)
 -lognewest - displays the ID of the newest log entry
 -report - displays a report containing the generated time, start time, elapsed time, status, etc.
 -jobid - displays job ID information
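A minimal sketch of driving dsjob from a Python script. It assumes dsjob is on the PATH of the engine host and uses placeholder project, job and parameter names; only the -run, -jobstatus, -param and -ljobs options are used. The exit-code meanings are assumed to follow the DataStage job status codes (1 = finished OK, 2 = finished with warnings, 3 = aborted), so verify them against the documentation for your version.

```python
# Sketch only: assumes dsjob is on PATH; project/job/parameter names are placeholders.
import subprocess
from typing import Optional

# Assumed meaning of the dsjob exit code when -jobstatus is used
# (DataStage job status codes); verify for your version.
STATUS_TEXT = {1: "finished OK", 2: "finished with warnings", 3: "aborted"}


def build_run_command(project: str, job: str, params: Optional[dict[str, str]] = None) -> list[str]:
    cmd = ["dsjob", "-run", "-jobstatus"]
    for name, value in (params or {}).items():
        # Each job parameter is passed as its own "-param name=value" option.
        cmd += ["-param", f"{name}={value}"]
    # Options come first; project and job names go last.
    return cmd + [project, job]


def build_list_jobs_command(project: str) -> list[str]:
    # Equivalent of: dsjob -ljobs projectname
    return ["dsjob", "-ljobs", project]


def run_job(project: str, job: str, params: Optional[dict[str, str]] = None) -> str:
    # With -jobstatus, dsjob waits for the job to finish and reports its status
    # through the process exit code.
    result = subprocess.run(build_run_command(project, job, params),
                            capture_output=True, text=True)
    return STATUS_TEXT.get(result.returncode,
                           f"other status or error (exit code {result.returncode})")
```

For example, run_job("myproject", "myjob", {"RUN_DATE": "2024-01-01"}) would execute dsjob -run -jobstatus -param RUN_DATE=2024-01-01 myproject myjob and return the status text. Building the argument list separately from the subprocess call makes the command easy to log or test without a DataStage server.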



D. What is the difference between a sequential file, a dataset, and a fileset?
Sequential File:
1. Extracting from or loading to a sequential file is limited to a maximum of 2 GB.
2. When it is used as a source, its data is converted from ASCII into DataStage's native format as it is read.
3. It does not natively support null values.
4. A sequential file can only be accessed on one node.

Dataset:
1. It preserves partitioning: it stores the data on the nodes, so when you read from a dataset you don't have to repartition the data.
2. It stores data in binary, in DataStage's internal format, so reading from or writing to a dataset takes less time than with other sources/targets.
3. You cannot view the data without DataStage tools.
4. It creates two types of files to store the data:
    A) Descriptor file: created in the folder/path defined in the job.
    B) Data files: created in the dataset folder (resource disk) specified in the configuration file.
5. A dataset (.ds) file cannot be opened directly. Instead, use the Data Set Management utility in the client tools (such as Designer and Manager) or the orchadmin command-line utility.
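The descriptor-plus-data-files layout can be illustrated with a toy Python model. It is NOT the real .ds format: the JSON descriptor, the pickled data files and the file names are hypothetical stand-ins. It only shows why a dataset preserves partitioning: each partition is written to its own binary data file in a resource-disk folder, the descriptor records where those files are, and reading them back returns the same partitions without repartitioning.

```python
# Toy model only: JSON descriptor, pickle data files and file names are
# hypothetical stand-ins, not the real DataStage .ds format.
import json
import pickle
import tempfile
from pathlib import Path


def write_dataset(descriptor: Path, resource_disk: Path, partitions: list[list[dict]]) -> None:
    data_files = []
    for i, rows in enumerate(partitions):
        # One binary data file per partition, stored in the resource-disk folder.
        data_file = resource_disk / f"{descriptor.stem}.part{i}.bin"
        data_file.write_bytes(pickle.dumps(rows))
        data_files.append(str(data_file))
    # The descriptor only records where the partitions live.
    descriptor.write_text(json.dumps({"data_files": data_files}))


def read_dataset(descriptor: Path) -> list[list[dict]]:
    meta = json.loads(descriptor.read_text())
    # Each partition comes back as it was written, so no repartitioning is needed.
    return [pickle.loads(Path(f).read_bytes()) for f in meta["data_files"]]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        disk = tmp_path / "resource_disk"
        disk.mkdir()
        parts = [[{"id": 2}], [{"id": 1}, {"id": 3}]]
        write_dataset(tmp_path / "customers.ds", disk, parts)
        print(read_dataset(tmp_path / "customers.ds"))
```

Because the data files live apart from the descriptor, deleting only the descriptor would leave orphaned data files behind, which is why datasets are normally managed with DataStage's own utilities.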

Fileset:
1. It stores data in a format similar to that of a sequential file. The only advantage of using a fileset over a sequential file is that it preserves the partitioning scheme.
2. You can view the data, but in the order defined by the partitioning scheme.
3. A fileset creates a .fs descriptor file, which is stored in ASCII format, so you can open it directly to see the paths of the data files and their schema.


