Conductor Node, Section Leaders and Players

Details about the Conductor Node, Section Leader and Player processes in DataStage

For more details, see also: Job Run Time Architecture


Jobs developed with DataStage Enterprise Edition (EE) are independent of the actual hardware and degree of parallelism used to run the job. The parallel Configuration File provides a mapping at runtime between the job and the actual runtime infrastructure and resources by defining logical processing nodes.
To facilitate scalability across the boundaries of a single server, and to maintain platform independence, the parallel framework uses a multi-process architecture rather than platform-dependent threading calls. The actual runtime deployment for a given job design is composed of a hierarchical relationship of operating system processes, running on one or more physical servers:


  • Section Leaders (one per logical processing node): used to create and manage player processes which perform the actual job execution. The Section Leaders also manage communication between the individual player processes and the master Conductor Node.
  • Players: one or more logical groups of processes used to execute the data flow logic. All players are created as groups on the same server as their managing Section Leader process.
  • Conductor Node (one per job): the main process used to start up jobs, determine resource assignments, and create Section Leader processes on one or more processing nodes. It acts as a single coordinator for status and error messages and manages orderly shutdown when processing completes or in the event of a fatal error. The conductor node is run from the primary server.
        In summary, the conductor is the main process that:
  1.  Starts up jobs.
  2.  Determines resource assignments.
  3.  Creates the Section Leaders (which in turn create and manage the player processes that perform the actual job execution).
  4.  Acts as the single coordinator for status and error messages.
  5.  Manages orderly shutdown when processing completes or in the event of a fatal error.
When the job is initiated the primary process (called the “conductor”) reads the job design, which is a generated Orchestrate shell (osh) script. The conductor also reads the parallel execution configuration file specified by the current setting of the APT_CONFIG_FILE environment variable.

Once the execution nodes are known (from the configuration file), the conductor causes a coordinating process called a “section leader” to be started on each node: by forking a child process if the node is on the same machine as the conductor, or by remote shell execution if the node is on a different machine (things are a little more dynamic in a grid configuration, but essentially this is what happens).
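The start-up decision described above can be sketched in a few lines of Python. This is only an illustration of the rule "fork locally, remote shell otherwise"; it is not how the parallel engine is actually implemented, and the function names and the second host name "DevServer2" are assumptions made for the example.

```python
def launch_method(node_fastname: str, conductor_host: str) -> str:
    """Return how a section leader would be started for one node (illustrative only)."""
    # Node lives on the same machine as the conductor: fork a child process.
    if node_fastname.lower() == conductor_host.lower():
        return "fork"
    # Node lives on a different machine: start it through a remote shell.
    return "remote shell"


def plan_section_leaders(fastnames: dict, conductor_host: str) -> dict:
    """Map each logical node name to the launch method of its section leader."""
    return {node: launch_method(host, conductor_host)
            for node, host in fastnames.items()}


# Hypothetical two-node layout: node1 shares the conductor's host, node2 does not.
plan = plan_section_leaders(
    {"node1": "DevServer1", "node2": "DevServer2"},
    conductor_host="DevServer1",
)
print(plan)
```

One section leader is planned per logical node, regardless of how many of those nodes share a physical server.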


Communication between the conductor, section leaders and player processes in a parallel job is effected via TCP.

Scenarios to Calculate the Number of Processes:

Sample APT_CONFIG_FILE (the conductor node is the one assigned to the "conductor" pool):

{
    node "node1"
    {
        fastname "DevServer1"
        pools "conductor"
        resource disk "/datastage/Ascential/DataStage/Datasets/node1" {pools "conductor"}
        resource scratchdisk "/datastage/Ascential/DataStage/Scratch/node1" {pools ""}
    }
    node "node2"
    {
        fastname "DevServer1"
        pools ""
        resource disk "/datastage/Ascential/DataStage/Datasets/node2" {pools ""}
        resource scratchdisk "/datastage/Ascential/DataStage/Scratch/node2" {pools ""}
    }
}
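
To see which logical nodes a configuration file defines, and which of them carries the "conductor" pool, a rough Python sketch such as the one below can help. It uses a simple regular expression written for this sample layout only; it is not the engine's own parser, and the names SAMPLE_CONFIG, NODE_RE and list_nodes are made up for the illustration.

```python
import re

# The sample configuration from this post, embedded so the sketch is self-contained.
SAMPLE_CONFIG = '''
{
    node "node1"
    {
        fastname "DevServer1"
        pools "conductor"
        resource disk "/datastage/Ascential/DataStage/Datasets/node1" {pools "conductor"}
        resource scratchdisk "/datastage/Ascential/DataStage/Scratch/node1" {pools ""}
    }
    node "node2"
    {
        fastname "DevServer1"
        pools ""
        resource disk "/datastage/Ascential/DataStage/Datasets/node2" {pools ""}
        resource scratchdisk "/datastage/Ascential/DataStage/Scratch/node2" {pools ""}
    }
}
'''

# Assumes each node block lists fastname first and then its node-level pools.
NODE_RE = re.compile(
    r'node\s+"([^"]+)"\s*\{\s*fastname\s+"([^"]+)"\s*pools\s+((?:"[^"]*"\s*)+)'
)


def list_nodes(config_text: str) -> list:
    """Return one dict per logical node: its name, fastname and node pools."""
    nodes = []
    for name, fastname, pools in NODE_RE.findall(config_text):
        nodes.append({
            "node": name,
            "fastname": fastname,
            "pools": re.findall(r'"([^"]*)"', pools),
        })
    return nodes


nodes = list_nodes(SAMPLE_CONFIG)
conductor_nodes = [n["node"] for n in nodes if "conductor" in n["pools"]]
print(len(nodes), "logical nodes; conductor pool on:", conductor_nodes)
```

The number of logical nodes found this way is the number of section leaders to expect for a job run with this file.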

The process counts work out as follows:

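For every job that starts there will be:

  • one (1) conductor process (started on the conductor node),
  • one (1) section leader for each node in the configuration file, and
  • one (1) player process for each stage in your job for each node. This last rule may or may not hold exactly for a given job; for example, the engine can combine adjacent stages into a single process, and a stage set to run sequentially runs on one node only.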

So if you have a job that uses a two (2) node configuration file and has 3 stages, then your job will have:

1 Conductor process
2 Section leaders (2 Nodes * 1 Section leader per node)
6 Player processes (3 stages * 2 Nodes)
Your dump score may show that your job will run 9 processes (1 + 2 + 6) on 2 nodes.
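
The same arithmetic can be written as a small Python helper for trying other node and stage counts. It is only an upper-level estimate under the "one player per stage per node" assumption stated above; the function name estimate_processes is made up for this example, and the real count reported in the dump score can differ.

```python
def estimate_processes(nodes: int, stages: int) -> dict:
    """Estimate process counts, assuming one player per stage per node."""
    conductor = 1                 # one conductor per job
    section_leaders = nodes       # one section leader per logical node
    players = stages * nodes      # assumption: no combined or sequential stages
    return {
        "conductor": conductor,
        "section_leaders": section_leaders,
        "players": players,
        "total": conductor + section_leaders + players,
    }


# The scenario from this post: a 2-node configuration file and a 3-stage job.
print(estimate_processes(nodes=2, stages=3))
# {'conductor': 1, 'section_leaders': 2, 'players': 6, 'total': 9}
```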

This kind of information is very helpful when determining the impact that a particular job or process will have on the underlying operating system and system resources.

