10.2.1. Workload Management System architecture

The WMS is a standard DIRAC system, and therefore it is composed by components in the following categories: Services, DBs, Agents, but also Executors.

10.2.1.1. Databases

JobDB

Main WMS database containing job definitions, status information and job parameters. It is used in most of the WMS components.

JobLoggingDB

Simple Job Logging Database.

PilotAgentsDB

Keep track of all the submitted grid pilot jobs. It also registers the mapping of the DIRAC jobs to the pilots.

SandboxMetadataDB

Keep the metadata of the sandboxes.

TaskQueueDB

The TaskQueueDB is used to organize jobs requirements into task queues, for easier matching.

All the DBs above are MySQL DBs, and should be installed using the system administrator console.

The JobDB MySQL table JobParameters can be replaced by an JobParameters backend built in Elastic/OpenSearch. To enable it, set the following flag:

/Operations/[Defaults | Setup]/Services/JobMonitoring/useESForJobParametersFlag=True

If you decide to make use of this Elastic/OpenSearch backend for storing job parameters, you would be in charge of setting the index policies, as Job Parameters stored in Elastic/OpenSearch are not deleted together with the jobs.

10.2.1.2. Services

JobManager

For submitting/rescheduling/killing/deleting jobs

JobMonitoring

For monitoring jobs

Matcher

For matching capabilities (of WNs) to requirements (of task queues –> so, of jobs)

JobStateUpdate

For storing updates on Jobs’ status

OptimizationMind

For Jobs scheduling optimization

SandboxStore

Frontend for storing and retrieving sandboxes

WMSAdministrator

For administering jobs and pilots

All these services are necessary for the WMS. Each of them should be installed using the system administrator console. You can have several instances of each of them running, with the exclusion of the Matcher and the OptimizationMind [TBC].

10.2.1.3. Agents

SiteDirector

send pilot jobs to Sites/CEs/Queues

JobCleaningAgent

clean old jobs from the system

PilotStatusAgent

update the status of the pilot jobs on the PilotAgentsDB

StalledJobAgent

hunt for stalled jobs in the Job database. Jobs in “running” state not receiving a heart beat signal for more than stalledTime seconds will be assigned the “Stalled” state.

All these agents are necessary for the WMS, and each of them should be installed using the system administrator console. You can duplicate some of these agents as long as you provide the correct configuration. A typical example is the SiteDirector, for which you may want to deploy even 1 for each of the sites managed.

Optional agents are:

StatesAccountingAgent or StatesMonitoringAgent

Use one or the other. StatesMonitoringAgent is used for producing Monitoring plots through the Monitoring System. (so, using ElasticSearch as backend), while StatesAccountingAgent does the same job but using the Accounting system (so, MySQL as backend).

A very different type of agent is the JobAgent, which is run by the pilot jobs and should NOT be run in a server installation.

10.2.1.4. Executors

Optimizers

optimize job submission and scheduling. The four executors that are run by default are: InputData, JobPath, JobSanity, JobScheduling. The Optimizers executor is a wrapper around all executors that are to be run. The executor modules it will run is given by the Load configuration option.

The Optimizers executor is necessary for the WMS. It should be installed using the system administrator console and it can also be duplicated.

To run additional executors inside the Optimizers executor change its Load parameter in the CS or during the installation with the system administrator console:

install executor WorkloadManagement Optimizers -p Load=JobPath,JobSanity,InputData,MyCustomExecutor,JobScheduling

For detailed information on each of these components, please do refer to the WMS Code Documentation.