How to read multiple files using a single DataStage job?
This article describes different ways of reading multiple files (having the same or different metadata) using a single job.
First, find out whether the metadata of the files is the same or different.
If the files have the same metadata
Method 1 – Specific file names – Attach the metadata to the Sequential File stage. In its properties, set the Read Method to 'Specific File(s)', then add all the files by selecting the 'File' property from 'Available properties to add'.
It will look like the below:
File= /home/myFile1.txt
File= /home/myFile2.txt
File= /home/myFile3.txt
Read Method= Specific file(s)
Method 2 – Using wildcards – Instead of giving individual file names as in the above method, a file name pattern can be given. Set the Read Method to 'File Pattern'.
Then, in the File Pattern field, put a valid Unix wildcard pattern like the below:
FileName_? (picks files like FileName_1, FileName_2)
FileName_* (picks files like FileName_1, FileName_12.txt, FileName_.txt)
Method 3 – Using a valid shell expression (Bourne shell syntax) – If there are five files matching a pattern (say myFile*.txt) and only three of them need to be read, this method can be used. Set the Read Method to 'File Pattern' and give a valid shell command like the below in the File Pattern field:
ls /home/myFile*.txt | head -3
Method 4 – Using a multiple-instance job – For this method, enable 'Allow Multiple Instances' in the Job Properties, and define the input file through a job parameter in the Sequential File stage.
During execution, a value needs to be passed to the job parameter: the path of the input file, which may sit in a different directory for each run. Execute the same job once for each of the multiple files by passing the corresponding input file path on each run.
An invocation ID needs to be provided for each run; it distinguishes the individual runs of the job and can be observed in the job log.
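For example, each run could be launched from the command line with the dsjob client (a sketch; the project name MyProject, job name MultiReadJob, and parameter pInputFile are hypothetical):
# run the same multi-instance job once per file, each with its own invocation ID
dsjob -run -param pInputFile=/data/dir1/myFile1.txt MyProject MultiReadJob.file1
dsjob -run -param pInputFile=/data/dir2/myFile2.txt MyProject MultiReadJob.file2
The suffix after the dot (file1, file2) is the invocation ID that appears in the job log.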
Method 5 – Another option is to have an Execute Command activity in a job sequence which reads the file name, and then pass the output of this command ($CommandOutput) to the file name parameter of the Sequential File stage.
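For instance (a sketch; the directory and the activity name Execute_Command_0 are placeholders), the Execute Command activity could run a command such as:
# pick the first matching file in the input directory
ls /home/input/myFile*.txt | head -1
The Job Activity would then set its file name parameter to the expression Execute_Command_0.$CommandOutput (trimming any trailing newline if needed).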
If the files have different metadata (the structure is not the same)
If the files have different metadata, then the Schema File option has to be used. This option is available in the Sequential File stage, in 'Available properties to add' under the Options menu. It gives the user a way to supply the details of the file's metadata, its column structure and its file structure, through a schema file.
One just needs to make sure that the file and its schema are in accordance with each other. Also make sure that the RCP (Runtime Column Propagation) property of the job is set to True; this ensures that the column metadata is passed forward to the other stages.
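For illustration, a schema file for a comma-delimited file with three columns might look like the below (a minimal sketch; the column names and types are hypothetical):
// a comma-delimited record with three columns
record
  {final_delim=end, delim=',', quote=double}
(
  emp_id: int32;
  emp_name: string[max=50];
  salary: nullable decimal[8,2];
)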
Method 1 – Using parameters – Create a parallel job with a Sequential File stage (with the Schema File property added). Add three job parameters: pFilePath, pFileName and pSchemaPath. In the Sequential File stage, set the File property to #pFilePath#/#pFileName#, then add the Schema File property and set it to #pSchemaPath#.
Then, while running the job, give appropriate values for the three parameters.
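For example, a run could be started with dsjob (a sketch; the project and job names are hypothetical):
# read one file together with its matching schema file
dsjob -run \
  -param pFilePath=/home/data \
  -param pFileName=myFile1.txt \
  -param pSchemaPath=/home/schemas/myFile1.schema \
  MyProject ReadAnyFileJob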
Method 2 – Using a multi-instance job – In the above method, just set the 'Allow Multiple Instances' property of the job to true and run the job for multiple sets of file and schema, as shown below.
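Each run then carries its own invocation ID (again with hypothetical names):
dsjob -run -param pFilePath=/home/data -param pFileName=myFile1.txt -param pSchemaPath=/home/schemas/myFile1.schema MyProject ReadAnyFileJob.run1
dsjob -run -param pFilePath=/home/data -param pFileName=myFile2.txt -param pSchemaPath=/home/schemas/myFile2.schema MyProject ReadAnyFileJob.run2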
Method 3 – Using a loop in a job sequence – Similarly, the above job can be used in a job sequence where it is run inside a loop; each iteration of the loop processes one file.
This can be made easier to run by using a UserVariables Activity and assigning the lists of files and schema files to the variables created there, as sketched below.
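A rough sketch of the sequence wiring (all activity and variable names are hypothetical):
UserVariables_Activity:  uvFileList = myFile1,myFile2,myFile3
StartLoop_Activity:      List Loop over UserVariables_Activity.uvFileList, delimiter ','
Job_Activity:            pFileName   = StartLoop_Activity.$Counter : '.txt'
                         pSchemaPath = '/home/schemas/' : StartLoop_Activity.$Counter : '.schema'
EndLoop_Activity:        links back to StartLoop_Activity
In a list loop, StartLoop_Activity.$Counter holds the current list element, so each iteration passes one file name (and its derived schema path) to the job.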