Job Scheduling
The /Operations/<vo>/<setup>/JobScheduling section contains all parameters that define DIRAC’s behaviour when deciding which job to execute. The following parameters can be defined:
Parameter | Description | Default value |
---|---|---|
taskQueueCPUTimeIntervals | Possible CPU time values that the task queues can have. | 360, 1800, 3600, 21600, 43200, 86400, 172800, 259200, 345600, 518400, 691200, 864000, 1080000 |
EnableSharesCorrection | Enable automatic correction of the priorities assigned to each task queue based on previous history | False |
CheckJobLimits | Limit the amount of jobs running at sites based on their attributes | False |
CheckMatchingDelay | Delay running a job at a site if another job has started recently and the conditions are met | False |
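To illustrate what a fixed set of CPU time intervals means in practice, the following minimal sketch maps a requested CPU time onto one of the default values from the table. The function name `fit_cpu_time` is hypothetical, and the rounding-up rule is an assumption made for illustration; it is not taken from DIRAC's code.

```python
from bisect import bisect_left

# Default taskQueueCPUTimeIntervals from the table above.
DEFAULT_INTERVALS = [360, 1800, 3600, 21600, 43200, 86400, 172800,
                     259200, 345600, 518400, 691200, 864000, 1080000]


def fit_cpu_time(cpu_time, intervals=DEFAULT_INTERVALS):
    """Return the smallest interval that is >= cpu_time.

    Assumption (not confirmed by the documentation): requests are rounded
    up, and anything above the largest interval is capped to it.
    `intervals` must be sorted in ascending order.
    """
    idx = bisect_left(intervals, cpu_time)
    return intervals[min(idx, len(intervals) - 1)]
```

Under this assumption, a job requesting 1000 units of CPU time would land in the 1800 task queue, and jobs with similar requirements end up grouped together.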
Before enabling the correction of priorities, take a look at Job Priority Handling, which explains priorities and how to correct them. The corrections are configured under JobScheduling/ShareCorrections.
Limiting the number of jobs running
Once JobScheduling/CheckJobLimits is enabled, DIRAC checks how many jobs of each type are running at the configured sites. Once the configured threshold for a job type is reached at a site, no more jobs of that type will start there. To define the limits, create a JobScheduling/RunningLimit/<Site name> section for each site to which a limit applies. Limits are defined by creating a subsection named after the job attribute (such as JobType) and setting the limits inside it. For instance, to allow no more than 150 running jobs with JobType=MonteCarlo at site DIRAC.Somewhere.co, set JobScheduling/RunningLimit/DIRAC.Somewhere.co/JobType/MonteCarlo=150
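The check described above can be sketched as follows. This is an illustration under stated assumptions, not DIRAC's implementation: the `RUNNING_LIMITS` dictionary is a hypothetical in-memory copy of the RunningLimit section, and `can_start` and `running_counts` are made-up names.

```python
# Hypothetical in-memory copy of JobScheduling/RunningLimit.
RUNNING_LIMITS = {
    "DIRAC.Somewhere.co": {"JobType": {"MonteCarlo": 150}},
}


def can_start(site, job_attrs, running_counts, limits=RUNNING_LIMITS):
    """Return True if one more job with job_attrs may start at site.

    running_counts maps (attribute, value) to the number of jobs currently
    running at the site with that attribute value.
    """
    for attr, per_value in limits.get(site, {}).items():
        value = job_attrs.get(attr)
        if value not in per_value:
            continue  # no limit configured for this attribute value
        if running_counts.get((attr, value), 0) >= per_value[value]:
            return False  # the limit for this job type is already reached
    return True
```

With 149 MonteCarlo jobs running at DIRAC.Somewhere.co another one may start; with 150 running it may not. Jobs of other types, and jobs at sites without a RunningLimit section, are unaffected.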
Setting the matching delay
DIRAC can throttle the rate at which jobs start at a given site. This throttling is defined under JobScheduling/MatchingDelay and is configured in the same way as the limits described in Limiting the number of jobs running. Instead of the maximum number of jobs that can run at a site, the value is the minimum number of seconds between job starts. For instance, JobScheduling/MatchingDelay/DIRAC.Somewhere.co/JobType/MonteCarlo=10 prevents jobs with JobType=MonteCarlo from starting at site DIRAC.Somewhere.co less than 10 seconds apart.
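A minimal sketch of such a throttle is shown below. It is only an illustration of the rule described above, not DIRAC's code: the class `MatchThrottle`, the `MATCHING_DELAYS` dictionary and the explicit `now` timestamp argument are all hypothetical.

```python
# Hypothetical in-memory copy of JobScheduling/MatchingDelay (seconds).
MATCHING_DELAYS = {
    "DIRAC.Somewhere.co": {"JobType": {"MonteCarlo": 10}},
}


class MatchThrottle:
    """Remember when jobs last started and enforce a minimum gap."""

    def __init__(self, delays=MATCHING_DELAYS):
        self.delays = delays
        self.last_start = {}  # (site, attribute, value) -> last start time

    def try_start(self, site, job_attrs, now):
        """Return True and record the start if no configured delay is violated."""
        matched_keys = []
        for attr, per_value in self.delays.get(site, {}).items():
            value = job_attrs.get(attr)
            if value not in per_value:
                continue  # no delay configured for this attribute value
            key = (site, attr, value)
            last = self.last_start.get(key)
            if last is not None and now - last < per_value[value]:
                return False  # a matching job started too recently
            matched_keys.append(key)
        # Only record the start once every configured delay has been checked.
        for key in matched_keys:
            self.last_start[key] = now
        return True
```

Passing the timestamp explicitly keeps the sketch deterministic: a MonteCarlo job started at t=100 blocks another at t=105, while one at t=110 is accepted.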
Example
An example with all the options under JobScheduling follows. Remember that JobScheduling is defined under /Operations/<vo>/<setup>/JobScheduling for multi-VO installations, and /Operations/<setup>/JobScheduling for single-VO ones:
JobScheduling
{
  taskQueueCPUTimeIntervals = 360, 1800, 3600, 21600, 43200, 86400, 172800, 259200, 345600
  EnableSharesCorrection = True
  ShareCorrections
  {
    ShareCorrectorsToStart = WMSHistory
    WMSHistory
    {
      GroupsInstance
      {
        MaxGlobalCorrectionFactor = 3
        WeekSlice
        {
          TimeSpan = 604800
          Weight = 80
          MaxCorrection = 2
        }
        HourSlice
        {
          TimeSpan = 3600
          Weight = 20
          MaxCorrection = 5
        }
      }
      UserGroupInstance
      {
        Group = dirac_user
        MaxGlobalCorrectionFactor = 3
        WeekSlice
        {
          TimeSpan = 604800
          Weight = 80
          MaxCorrection = 2
        }
        HourSlice
        {
          TimeSpan = 3600
          Weight = 20
          MaxCorrection = 5
        }
      }
    }
  }
  CheckJobLimits = True
  RunningLimit
  {
    DIRAC.Somewhere.co
    {
      JobType
      {
        MonteCarlo = 150
        Test = 10
      }
    }
  }
  CheckMatchingDelay = True
  MatchingDelay
  {
    DIRAC.Somewhere.co
    {
      JobType
      {
        MonteCarlo = 10
      }
    }
  }
}
Transactional bulk job submission
When submitting parametric jobs (bulk submission), the job description contains a recipe for generating the actual jobs, one per parameter value, according to a formula in the description. By default, the jobs are generated synchronously in the call to the DIRAC WMS JobManager service. However, if an error occurs, there is a risk that only part of the jobs are generated without the client knowing it. To avoid this risk, additional logic that ensures no unwanted jobs are left in the system was added in DIRAC v6r20.
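The general idea of "all jobs or none" can be sketched as below. This is a generic illustration, not the logic DIRAC actually uses: the function `submit_bulk`, the `{value}` placeholder, the `None` check standing in for a failure, and the dictionary used as a job store are all hypothetical.

```python
def submit_bulk(template, parameters, store):
    """Generate one job per parameter value, or none at all.

    `store` is a hypothetical dict (job ID -> job description) standing in
    for the job database. If any job fails to generate, every job created
    during this call is removed before the error is re-raised.
    """
    created = []
    try:
        for value in parameters:
            if value is None:  # hypothetical stand-in for a generation error
                raise ValueError("missing parameter value")
            job_id = max(store, default=0) + 1
            store[job_id] = template.format(value=value)
            created.append(job_id)
    except Exception:
        # Roll back: leave no partially generated jobs behind.
        for job_id in created:
            del store[job_id]
        raise
    return created
```

A caller either receives the full list of new job IDs or an exception, and in the latter case the store is left exactly as it was before the call.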