How to read multiple files using a single DataStage job?
This article describes different ways of reading multiple files (having the same or different metadata) using a single job.
First, find out whether the metadata of the files is the same or different.
If the files have the same metadata
Method 1 – Specific file names – Attach the metadata to the Sequential File stage. In its properties, set the Read Method to 'Specific File(s)', then add all the files by selecting the 'File' property from 'Available properties to add'.
It will look like the below:
File= /home/myFile1.txt
File= /home/myFile2.txt
File= /home/myFile3.txt
Read Method= Specific file(s)
Method 2 – Using wildcards – Instead of giving individual file names as in the above method, a file name pattern can be given. Set the Read Method to 'File Pattern'.
Then, in the File Pattern field, put a valid Unix wildcard pattern like the below:
FileName_? (picks files like FileName_1, FileName_2)
FileName_* (picks files like FileName_1, FileName_12.txt, FileName_.txt)
Method 3 – Using a valid shell expression (Bourne shell syntax) – If there are five files matching a pattern (say myFile*.txt) and only three of them need to be read, this method can be used. Set the Read Method to 'File Pattern' and give a valid shell command like the below in the File Pattern field:
ls /home/myFile*.txt | head -3
Method 4 – Using a multiple-instance job – For this method, enable 'Allow Multiple Instances' in the Job Properties, and define the input file through a job parameter in the Sequential File stage.
During execution, a value needs to be passed to the job parameter: the path of the input file, which may sit in a different directory for each run. Execute the same job once for each of the multiple files by passing the corresponding input file path on each run.
An invocation ID needs to be provided for each run; it distinguishes the individual runs of the job and can be observed in the job log.
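For example, each run could be launched from the command line with the dsjob client (a sketch; the project name MyProject, job name MultiReadJob, and parameter pInputFile are hypothetical):
# run the same multi-instance job once per file, each with its own invocation ID
dsjob -run -param pInputFile=/data/dir1/myFile1.txt MyProject MultiReadJob.file1
dsjob -run -param pInputFile=/data/dir2/myFile2.txt MyProject MultiReadJob.file2
The suffix after the dot (file1, file2) is the invocation ID that appears in the job log.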
Method 5 – Another option is to have an Execute Command activity in a job sequence which reads the file name, and then pass the output of this command ($CommandOutput) to the file name parameter of the Sequential File stage.
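For instance (a sketch; the directory and the activity name Execute_Command_0 are placeholders), the Execute Command activity could run a command such as:
# pick the first matching file in the input directory
ls /home/input/myFile*.txt | head -1
The Job Activity would then set its file name parameter to the expression Execute_Command_0.$CommandOutput (trimming any trailing newline if needed).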
If the files have different metadata (the structure is not the same)
If the files have different metadata, then the Schema File option has to be used. This option is available in the Sequential File stage, in 'Available properties to add' under the Options menu. It gives the user a way to supply the details of the file's metadata, its column structure and its file structure, through a schema file.
One just needs to make sure that the file and its schema are in accordance with each other. Also make sure that the RCP (Runtime Column Propagation) property of the job is set to True; this ensures that the column metadata is passed forward to the other stages.
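For illustration, a schema file for a comma-delimited file with three columns might look like the below (a minimal sketch; the column names and types are hypothetical):
// a comma-delimited record with three columns
record
  {final_delim=end, delim=',', quote=double}
(
  emp_id: int32;
  emp_name: string[max=50];
  salary: nullable decimal[8,2];
)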
Method 1 – Using parameters – Create a parallel job with a Sequential File stage (with the Schema File property added). Add three job parameters: pFilePath, pFileName and pSchemaPath. In the Sequential File stage, set the File property to #pFilePath#/#pFileName#, then add the Schema File property and set it to #pSchemaPath#.
Then, while running the job, give appropriate values for the three parameters.
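For example, a run could be started with dsjob (a sketch; the project and job names are hypothetical):
# read one file together with its matching schema file
dsjob -run \
  -param pFilePath=/home/data \
  -param pFileName=myFile1.txt \
  -param pSchemaPath=/home/schemas/myFile1.schema \
  MyProject ReadAnyFileJob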
Method 2 – Using a multi-instance job – In the above method, just set the 'Allow Multiple Instances' property of the job to true and run the job for multiple sets of file and schema, as shown below.
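Each run then carries its own invocation ID (again with hypothetical names):
dsjob -run -param pFilePath=/home/data -param pFileName=myFile1.txt -param pSchemaPath=/home/schemas/myFile1.schema MyProject ReadAnyFileJob.run1
dsjob -run -param pFilePath=/home/data -param pFileName=myFile2.txt -param pSchemaPath=/home/schemas/myFile2.schema MyProject ReadAnyFileJob.run2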
Method 3 – Using a loop in a job sequence – Similarly, the above job can be used in a job sequence where it is run inside a loop; each iteration of the loop processes one file.
This can be made easier to run by using a UserVariables Activity and assigning the lists of files and schema files to the variables created there, as sketched below.
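A rough sketch of the sequence wiring (all activity and variable names are hypothetical):
UserVariables_Activity:  uvFileList = myFile1,myFile2,myFile3
StartLoop_Activity:      List Loop over UserVariables_Activity.uvFileList, delimiter ','
Job_Activity:            pFileName   = StartLoop_Activity.$Counter : '.txt'
                         pSchemaPath = '/home/schemas/' : StartLoop_Activity.$Counter : '.schema'
EndLoop_Activity:        links back to StartLoop_Activity
In a list loop, StartLoop_Activity.$Counter holds the current list element, so each iteration passes one file name (and its derived schema path) to the job.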