Advanced Job Management

Parametric Jobs

A parametric job allows you to submit a set of jobs with a single submission command by specifying parameters for each job.

To define the parameters, the “Parameters” attribute must be set in the JDL. It can take one of the following values:
  • A list (strings or numbers).

  • An integer; in this case the attributes ParameterStart and ParameterStep must also be defined as integers to create the list of job parameters.

Parametric Job - JDL

A simple example is to define the parameters as a list of values; this list can contain integers or strings::

Executable = "testJob.sh";
JobName = "%n_parametric";
Arguments = "%s";
Parameters = {"first","second","third","fourth","fifth"};
StdOutput = "StdOut_%s";
StdError = "StdErr_%s";
InputSandbox = {"testJob.sh"};
OutputSandbox = {"StdOut_%s","StdErr_%s"};

In this example, 5 jobs will be created, one for each value in the Parameters list. Note that other JDL attributes can contain the “%s” placeholder. For each generated job, this placeholder will be replaced by one of the values in the Parameters list.
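
The substitution described above can be pictured with a small sketch that does not need DIRAC at all. The helper name `expand_placeholders` and the dictionary layout are illustrative assumptions, not DIRAC's implementation, and only the “%s” placeholder is modelled.

```python
# Illustrative sketch only: mimics how "%s" is replaced for each generated job.
# expand_placeholders is a hypothetical helper, not part of DIRAC.

def expand_placeholders(template, parameters):
    """Return one job description (a dict) per parameter value."""
    jobs = []
    for value in parameters:
        job = {}
        for key, attr in template.items():
            if isinstance(attr, list):
                # List-valued attributes (e.g. OutputSandbox): substitute in each item.
                job[key] = [item.replace("%s", str(value)) for item in attr]
            else:
                job[key] = attr.replace("%s", str(value))
        jobs.append(job)
    return jobs


template = {
    "Arguments": "%s",
    "StdOutput": "StdOut_%s",
    "OutputSandbox": ["StdOut_%s", "StdErr_%s"],
}
parameters = ["first", "second", "third", "fourth", "fifth"]

jobs = expand_placeholders(template, parameters)
print(len(jobs))             # 5
print(jobs[0]["StdOutput"])  # StdOut_first
```

Each entry of `jobs` corresponds to one of the five generated jobs, with its own output file names.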

In the next example, the JDL attribute values are used to create a list of 20 integers starting from 1 (ParameterStart) with a step of 2 (ParameterStep)::

Executable = "testParametricJob.sh";
JobName = "Parametric_%n";
Arguments = "%s";
Parameters = 20;
ParameterStart = 1;
ParameterStep = 2;
StdOutput = "StdOut_%n";
StdError = "StdErr_%n";
InputSandbox = {"testParametricJob.sh"};
OutputSandbox = {"StdOut_%n","StdErr_%n"};

Therefore, with this JDL, 20 jobs will be submitted at once. As in the previous example, the “%s” placeholder will be replaced by one of the parameter values.
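
The integer form of Parameters in the example above can be read as a plain arithmetic progression. The sketch below assumes exactly that reading; `generate_parameters` is a hypothetical helper used only for illustration, not a DIRAC function.

```python
# Illustrative sketch only: builds the parameter list that the JDL attributes
# Parameters (count), ParameterStart and ParameterStep describe, assuming a
# simple arithmetic progression.

def generate_parameters(count, start, step):
    """Return `count` integers beginning at `start` and increasing by `step`."""
    return [start + i * step for i in range(count)]


values = generate_parameters(count=20, start=1, step=2)
print(len(values))   # 20
print(values[:5])    # [1, 3, 5, 7, 9]
print(values[-1])    # 39
```

Under this assumption, the 20 jobs receive the odd numbers from 1 to 39 as their “%s” value.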

Parametric jobs are submitted like normal jobs. The command output will be a list of the generated job IDs, for example::

$ dirac-wms-job-submit Param.jdl
JobID = [1047, 1048, 1049, 1050, 1051]

These are standard DIRAC jobs. The job outputs can be retrieved as usual by specifying the job IDs::

$ dirac-wms-job-get-output 1047 1048 1049 1050 1051

Creating and submitting parametric Jobs using DIRAC APIs

DIRAC APIs are an easy and convenient way to create and submit parametric jobs:

from DIRAC.Interfaces.API.Job import Job
from DIRAC.Interfaces.API.Dirac import Dirac
# or extensions, e.g. from LHCbDIRAC.Interfaces.API.LHCbJob import LHCbJob for LHCb

J = Job()
J.setCPUTime(17800)
J.setInputSandbox('exe-script.py') # whatever
J.setParameterSequence("args", ['one', 'two', 'three'])
J.setParameterSequence("iargs", [1, 2, 3])
J.setExecutable("exe-script.py", arguments=": testing %(args)s %(iargs)s", logFile='helloWorld_%n.log')
print(Dirac().submitJob(J))

InputData (in the form of LFNs, Logical File Names) can also be used as parameters in parametric jobs:

inputDataList = [ # a list of lists
 [
     '/lhcb/data/data1',
     '/lhcb/data/data2'
 ],
 [
     '/lhcb/data/data3',
     '/lhcb/data/data4'
 ],
 [
     '/lhcb/data/data5',
     '/lhcb/data/data6'
 ]
]

J.setParameterSequence('InputData', inputDataList, addToWorkflow=True)

and similarly for InputSandbox:

inputSBList = [ # a list of lists
 [
     '/localFile.txt',
     '/another/localFile.py',
     '/some/lfn/some/where'
 ]
]

J.setParameterSequence('InputSandbox', inputSBList, addToWorkflow=True)

The parameter lists, whatever they contain, must ALL have the same length, e.g. there should not be one parameter list of length 2 and another of length 3.
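
The equal-length rule can be checked before submission with a few lines of plain Python. This is a minimal sketch under the assumption that sequences are combined position by position (first value of each sequence for the first job, and so on); `combine_sequences` is a hypothetical helper, not part of the DIRAC API.

```python
# Illustrative sketch only: validates that all parameter sequences have the
# same length and shows the per-job values used in "%(name)s" placeholders.

def combine_sequences(sequences):
    """Return one {name: value} dict per job, or raise if lengths differ."""
    lengths = {name: len(values) for name, values in sequences.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError("Parameter sequences differ in length: %s" % lengths)
    names = list(sequences)
    return [dict(zip(names, row)) for row in zip(*sequences.values())]


sequences = {"args": ["one", "two", "three"], "iargs": [1, 2, 3]}
per_job = combine_sequences(sequences)

template = ": testing %(args)s %(iargs)s"
for values in per_job:
    print(template % values)
# : testing one 1
# : testing two 2
# : testing three 3
```

Running the check locally catches a mismatched list before any job is created.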

DIRAC API

The DIRAC API is encapsulated in several Python classes designed to be used easily by users to access a large fraction of the DIRAC functionality. Using the API classes it is easy to write small scripts or applications to manage user jobs and data.

Submitting jobs using APIs

  • First step, create a Python script specifying job requirements.

    Test-API.py:

    from DIRAC.Interfaces.API.Dirac import Dirac
    from DIRAC.Interfaces.API.Job import Job
    
    j = Job()
    j.setCPUTime(500)
    j.setExecutable('echo',arguments='hello')
    j.setExecutable('ls',arguments='-l')
    j.setExecutable('echo', arguments='hello again')
    j.setName('API')
    
    dirac = Dirac()
    result = dirac.submitJob(j)
    print('Submission Result: ', result)
    
  • Run the script:

    python Test-API.py
    
    $ python Test-API.py
    {'OK': True, 'Value': 196}
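
The submission result is a dictionary with an 'OK' flag and, on success, the job ID under 'Value'. A minimal sketch of how a script might unpack it follows; `unwrap` is a hypothetical helper, and the 'Message' key for failures is an assumption here, since the output above shows only the success case.

```python
# Illustrative sketch only: unpacks a result dictionary of the shape shown above.
# The 'Message' key for the failure case is an assumption, not shown in the text.

def unwrap(result):
    """Return result['Value'] on success, raise RuntimeError otherwise."""
    if not result["OK"]:
        raise RuntimeError(result.get("Message", "unknown error"))
    return result["Value"]


job_id = unwrap({"OK": True, "Value": 196})
print(job_id)  # 196
```

Checking the 'OK' flag before using 'Value' avoids treating a failed submission as a job ID.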
    

Retrieving Job Status

  • Create a script Status-API.py:

    from DIRAC.Interfaces.API.Dirac import Dirac
    from DIRAC.Interfaces.API.Job import Job
    import sys
    dirac = Dirac()
    jobid = sys.argv[1]
    print(dirac.status(jobid))
    
  • Execute script:

    python Status-API.py <Job_ID>
    
    $ python Status-API.py 196
    {'OK': True, 'Value': {196: {'Status': 'Done', 'MinorStatus': 'Execution Complete', 'Site': 'LCG.IRES.fr'}}}
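
The nested status dictionary can be turned into one readable line per job. The sketch below works on the literal result shown above and needs no DIRAC installation; `summarize_status` is a hypothetical helper introduced for illustration.

```python
# Illustrative sketch only: formats a status result of the shape shown above.

def summarize_status(result):
    """Return one summary string per job ID; empty list if the call failed."""
    if not result["OK"]:
        return []
    lines = []
    for job_id, info in sorted(result["Value"].items()):
        lines.append("%s: %s (%s) at %s"
                     % (job_id, info["Status"], info["MinorStatus"], info["Site"]))
    return lines


result = {
    "OK": True,
    "Value": {
        196: {"Status": "Done",
              "MinorStatus": "Execution Complete",
              "Site": "LCG.IRES.fr"},
    },
}
for line in summarize_status(result):
    print(line)
# 196: Done (Execution Complete) at LCG.IRES.fr
```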
    

Retrieving Job Output

  • Example Output-API.py:

    from DIRAC.Interfaces.API.Dirac import Dirac
    from DIRAC.Interfaces.API.Job import Job
    import sys
    dirac = Dirac()
    jobid = sys.argv[1]
    print(dirac.getOutputSandbox(jobid))
    print(dirac.getJobOutputData(jobid))
    
  • Execute script:

    python Output-API.py <Job_ID>
    
    $ python Output-API.py 196
    

Local submission mode

Local submission mode is a very useful tool for checking the sanity of your job before submitting it to the Grid. The job executable is run locally in exactly the same way (same input, same output) as it will be on the Grid Worker Node. This allows you to debug the job in a friendly local environment.

Let’s perform this exercise in the Python shell.

  • Load the Python shell:

    bash-3.2$ python
    Python 2.5.5 (r255:77872, Mar 25 2010, 14:17:52)
    [GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from DIRAC.Interfaces.API.Dirac import Dirac
    >>> from DIRAC.Interfaces.API.Job import Job
    >>> j = Job()
    >>> j.setExecutable('echo', arguments='hello')
    {'OK': True, 'Value': ''}
    >>> Dirac().submitJob(j,mode='local')
    2010-10-22 14:41:51 UTC /DiracAPI  INFO: <=====DIRAC v5r10-pre2=====>
    2010-10-22 14:41:51 UTC /DiracAPI  INFO: Executing workflow locally without WMS submission
    2010-10-22 14:41:51 UTC /DiracAPI  INFO: Executing at /afs/in2p3.fr/home/h/hamar/Tests/APIs/Local/Local_zbDHRe_JobDir
    2010-10-22 14:41:51 UTC /DiracAPI  INFO: Preparing environment for site DIRAC.Client.fr to execute job
    2010-10-22 14:41:51 UTC /DiracAPI  INFO: Attempting to submit job to local site: DIRAC.Client.fr
    2010-10-22 14:41:51 UTC /DiracAPI  INFO: Executing: /afs/in2p3.fr/home/h/hamar/DIRAC5/scripts/dirac-jobexec jobDescription.xml -o LogLevel=info
    Executing StepInstance RunScriptStep1 of type ScriptStep1 ['ScriptStep1']
    StepInstance creating module instance  ScriptStep1  of type Script
    2010-10-22 14:41:53 UTC dirac-jobexec.py/Script  INFO: Script Module Instance Name: CodeSegment
    2010-10-22 14:41:53 UTC dirac-jobexec.py/Script  INFO: Command is: /bin/echo hello
    2010-10-22 14:41:53 UTC dirac-jobexec.py/Script  INFO: /bin/echo hello execution completed with status 0
    2010-10-22 14:41:53 UTC dirac-jobexec.py/Script  INFO: Output written to Script1_CodeOutput.log, execution complete.
    2010-10-22 14:41:53 UTC /DiracAPI  INFO: Standard output written to std.out
    {'OK': True, 'Value': 'Execution completed successfully'}
    
  • Exit the Python shell

  • List the directory where you ran the Python shell; the outputs should have been created automatically:

    bash-3.2$ ls
    Local_zbDHRe_JobDir  Script1_CodeOutput.log  std.err  std.out
    bash-3.2$ more Script1_CodeOutput.log
    <<<<<<<<<< echo hello Standard Output >>>>>>>>>>
    
    hello
    

Sending Multiple Jobs

  • Create a Test-API-Multiple.py script, for example:

    from DIRAC.Interfaces.API.Dirac import Dirac
    from DIRAC.Interfaces.API.Job import Job
    
    j = Job()
    j.setCPUTime(500)
    j.setExecutable('echo',arguments='hello')
    dirac = Dirac()  # one client instance is enough for all submissions
    for i in range(5):
      j.setName('API_%d' % i)
      result = dirac.submitJob(j)
      print('Submission Result: ', result)
    
  • Execute the script:

    $ python Test-API-Multiple.py
    Submission Result:  {'OK': True, 'Value': 176}
    Submission Result:  {'OK': True, 'Value': 177}
    Submission Result:  {'OK': True, 'Value': 178}
    

Using APIs to create JDL files

  • Create a Test-API-JDL.py:

    from DIRAC.Interfaces.API.Job import Job
    j = Job()
    j.setName('APItoJDL')
    j.setOutputSandbox(['*.log','summary.data'])
    j.setInputData(['/vo.formation.idgrilles.fr/user/v/vhamar/test.txt','/vo.formation.idgrilles.fr/user/v/vhamar/test2.txt'])
    j.setOutputData(['/vo.formation.idgrilles.fr/user/v/vhamar/output1.data','/vo.formation.idgrilles.fr/user/v/vhamar/output2.data'],OutputPath='MyFirstAnalysis')
    j.setPlatform("")
    j.setCPUTime(21600)
    j.setDestination('LCG.IN2P3.fr')
    j.setBannedSites(['LCG.ABCD.fr','LCG.EFGH.fr'])
    j.setLogLevel('DEBUG')
    j.setExecutionEnv({'MYVARIABLE':'TEST'})
    j.setExecutable('echo',arguments='$MYVARIABLE')
    print(j._toJDL())
    
  • Run the script:

    $ python Test-API-JDL.py
    
        Priority = "1";
        Executable = "dirac-jobexec";
        ExecutionEnvironment = "MYVARIABLE=TEST";
        StdError = "std.err";
        LogLevel = "DEBUG";
        BannedSites =
            {
                "LCG.ABCD.fr",
                "LCG.EFGH.fr"
            };
        StdOutput = "std.out";
        Site = "LCG.IN2P3.fr";
        Platform = "";
        OutputPath = "MyFirstAnalysis";
        InputSandbox = "jobDescription.xml";
        Arguments = "jobDescription.xml -o LogLevel=DEBUG";
        JobGroup = "vo.formation.idgrilles.fr";
        OutputSandbox =
            {
                "*.log",
                "summary.data",
                "Script1_CodeOutput.log",
                "std.err",
                "std.out"
            };
        CPUTime = "21600";
        JobName = "APItoJDL";
        InputData =
            {
                "LFN:/vo.formation.idgrilles.fr/user/v/vhamar/test.txt",
                "LFN:/vo.formation.idgrilles.fr/user/v/vhamar/test2.txt"
            };
        JobType = "User";
    

As you can see, the parameters added to the job object are represented in the JDL job description. The JDL can now be used together with the dirac-wms-job-submit command line tool.

Submitting MultiProcessor (MP) jobs

Jobs that can (or should) run using more than 1 processor should be described as such, using the “setNumberOfProcessors” method of the API:

j = Job()
j.setCPUTime(500)
j.setExecutable('echo',arguments='hello')
j.setExecutable('ls',arguments='-l')
j.setExecutable('echo', arguments='hello again')
j.setName('MP test')
j.setNumberOfProcessors(16)

Calling Job().setNumberOfProcessors() with a value greater than 1 will also add the “MultiProcessor” tag to the job description.

Added in version v6r20p5.

Users can specify the NumberOfProcessors and WholeNode parameters in the job description, e.g.:

NumberOfProcessors = 16;
WholeNode = True;

This will be translated internally into the 16Processors and WholeNode tags. The “MultiProcessor” tag is added automatically to the job description if more than 1 processor is specified.
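
The translation into tags can be sketched as a small function. This is only an illustration of the rule stated above, not DIRAC's internal code: `tags_for` is a hypothetical helper, the ordering of the tags is arbitrary, and the choice to add no processor tag for a single-processor job is an assumption.

```python
# Illustrative sketch only: derives the tags described above from the
# NumberOfProcessors and WholeNode job parameters.

def tags_for(number_of_processors=1, whole_node=False):
    """Return the list of tags implied by the two job parameters."""
    tags = []
    if number_of_processors > 1:
        tags.append("MultiProcessor")
        tags.append("%dProcessors" % number_of_processors)
    if whole_node:
        tags.append("WholeNode")
    return tags


print(tags_for(16, True))  # ['MultiProcessor', '16Processors', 'WholeNode']
print(tags_for(1, False))  # []
```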

This allows resources (WNs) to place requirements flexibly on the jobs they accept, for example avoiding single-core jobs on multi-core nodes.

Submitting jobs with specific requirements (e.g. GPU)

<to expand, ~same as for MP jobs, i.e. use Tags>