JobCleaningAgent

The Job Cleaning Agent removes jobs from the WMS at the end of their life cycle.

This agent will take care of removing user jobs, while production jobs should be removed through the TransformationCleaningAgent.

JobCleaningAgent options
JobCleaningAgent
{
  PollingTime = 3600

  # Maximum number of jobs to be processed in one cycle
  MaxJobsAtOnce = 500

  # Maximum number of jobs to be processed in one cycle for HeartBeatLoggingInfo removal
  MaxHBJobsAtOnce = 0

  RemoveStatusDelay
  {
     # Number of days after which Done jobs are removed
     Done = 7
     # Number of days after which Killed jobs are removed
     Killed = 7
     # Number of days after which Failed jobs are removed
     Failed = 7
     # Number of days after which any job, irrespective of status, is removed (-1 disables this feature)
     Any = -1
  }

  RemoveStatusDelayHB
  {
     # Number of days after which HeartBeatLoggingInfo for Done jobs is removed, positive to enable
     Done = -1
     # Number of days after which HeartBeatLoggingInfo for Killed jobs is removed
     Killed = -1
     # Number of days after which HeartBeatLoggingInfo for Failed jobs is removed
     Failed = -1
  }

  # Production job types _not_ to remove; the default is taken from Operations/Transformations/DataProcessing
  ProductionTypes =
}
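
The delays in RemoveStatusDelay can be read as "remove jobs in this status once they are older than N days". The following is a minimal sketch of that interpretation, not the agent's actual code: the function name cutoff_dates is hypothetical, and treating every negative value as "disabled" is an assumption extended from the documented behaviour of the Any option.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the RemoveStatusDelay section above.
REMOVE_STATUS_DELAY = {"Done": 7, "Killed": 7, "Failed": 7, "Any": -1}


def cutoff_dates(delays, now):
    """Map each enabled status to the date before which jobs qualify for removal.

    Hypothetical helper. Assumption: a negative delay disables the entry,
    as documented for ``Any``.
    """
    return {
        status: now - timedelta(days=days)
        for status, days in delays.items()
        if days >= 0
    }


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for status, cutoff in cutoff_dates(REMOVE_STATUS_DELAY, now).items():
        print(f"{status}: jobs last updated before {cutoff.isoformat()} qualify")
```

With the default values shown above, only Done, Killed and Failed produce a cutoff; Any is skipped because it is set to -1.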

Cleaning HeartBeatLoggingInfo

If the HeartBeatLoggingInfo table of the JobDB grows too large, the information for finished jobs can be removed (including for transformation-related jobs). In vanilla DIRAC, the HeartBeatLoggingInfo is only used by the StalledJobAgent. To enable this cleanup, set the options MaxHBJobsAtOnce and RemoveStatusDelayHB/[Done|Killed|Failed] to values larger than 0.
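
As an illustration of the rule above, the sketch below checks which statuses would have HeartBeatLoggingInfo cleaning active for a given set of option values. It is a stand-alone approximation, not code from the agent; the function name heartbeat_cleaning_statuses is hypothetical.

```python
def heartbeat_cleaning_statuses(max_hb_jobs_at_once, remove_status_delay_hb):
    """Return the statuses for which HeartBeatLoggingInfo cleaning is enabled.

    Hypothetical helper following the documented rule: both MaxHBJobsAtOnce
    and the per-status delay must be larger than 0.
    """
    if max_hb_jobs_at_once <= 0:
        return []
    return [
        status
        for status, days in remove_status_delay_hb.items()
        if days > 0
    ]


if __name__ == "__main__":
    # Default configuration: nothing is cleaned.
    defaults = {"Done": -1, "Killed": -1, "Failed": -1}
    print(heartbeat_cleaning_statuses(0, defaults))

    # Example values chosen for illustration only.
    enabled = {"Done": 30, "Killed": -1, "Failed": 30}
    print(heartbeat_cleaning_statuses(100, enabled))
```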

class DIRAC.WorkloadManagementSystem.Agent.JobCleaningAgent.JobCleaningAgent(*args, **kwargs)

Bases: DIRAC.Core.Base.AgentModule.AgentModule

Agent for removing jobs in status “Deleted”, as well as jobs in other statuses once their configured delay has passed.

__init__(*args, **kwargs)

Constructor.

am_Enabled()
am_checkStopAgentFile()
am_createStopAgentFile()
am_disableMonitoring()
am_getBasePath()
am_getControlDirectory()
am_getCyclesDone()
am_getMaxCycles()
am_getModuleParam(optionName)
am_getOption(optionName, defaultValue=None)

Gets an option from the agent’s configuration section. The section will be a subsection of the /Systems section in the CS.

am_getPollingTime()
am_getShifterProxyLocation()
am_getStopAgentFile()
am_getWatchdogTime()
am_getWorkDirectory()
am_go()
am_initialize(*initArgs)

Common initialization for all the agents.

This is executed every time an agent (re)starts. It is called by the AgentReactor and should not be overridden.

am_monitoringEnabled()
am_removeStopAgentFile()
am_secureCall(functor, args=(), name=False)
am_setModuleParam(optionName, value)
am_setOption(optionName, value)
am_stopExecution()
beginExecution()
deleteJobOversizedSandbox(jobIDList)

Deletes the job oversized sandbox files from storage elements. Creates a request in RMS if not immediately possible.

Parameters

jobIDList (list) – list of job IDs

Returns

S_OK/S_ERROR

deleteJobsByStatus(condDict, delay=False)

Sets the job status to “DELETED” for jobs matching condDict.

Parameters
  • condDict (dict) – a dict like {'JobType': 'User', 'Status': 'Killed'}

  • delay (int) – days of delay

Returns

S_OK/S_ERROR
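
To make the roles of condDict, delay and the S_OK/S_ERROR return value more concrete, here is a self-contained sketch that selects jobs from an in-memory list. It does not use DIRAC and is not the agent's implementation: s_ok and s_error are stand-ins that mimic the dictionary shape of DIRAC's S_OK and S_ERROR, select_jobs is a hypothetical helper, and both the LastUpdateTime key and the "every entry of condDict must match" rule are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def s_ok(value=None):
    # Stand-in mimicking DIRAC's S_OK result structure.
    return {"OK": True, "Value": value}


def s_error(message):
    # Stand-in mimicking DIRAC's S_ERROR result structure.
    return {"OK": False, "Message": message}


def select_jobs(jobs, cond_dict, delay, now):
    """Hypothetical selection step.

    Returns the IDs of jobs that match every entry of cond_dict and were
    last updated at least `delay` days before `now`. A falsy delay means
    no age restriction beyond `now` itself.
    """
    if not isinstance(cond_dict, dict):
        return s_error("condDict must be a dict")
    older_than = now - timedelta(days=delay) if delay else now
    matched = [
        job["JobID"]
        for job in jobs
        if all(job.get(key) == value for key, value in cond_dict.items())
        and job["LastUpdateTime"] <= older_than
    ]
    return s_ok(matched)


if __name__ == "__main__":
    now = datetime(2024, 1, 10)
    jobs = [
        {"JobID": 1, "JobType": "User", "Status": "Killed",
         "LastUpdateTime": datetime(2024, 1, 1)},
        {"JobID": 2, "JobType": "User", "Status": "Killed",
         "LastUpdateTime": datetime(2024, 1, 9)},
    ]
    result = select_jobs(jobs, {"JobType": "User", "Status": "Killed"}, 7, now)
    if result["OK"]:
        print("Jobs to mark as DELETED:", result["Value"])
    else:
        print("Selection failed:", result["Message"])
```

Callers are expected to test the OK key before reading Value, which is the usual pattern for S_OK/S_ERROR results.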

endExecution()
execute()

Remove or delete jobs in various statuses.

finalize()
initialize()

Sets defaults

removeDeletedJobs(delay=False)

Fully remove jobs that are already in status “DELETED”, unless there are still requests.

Parameters

delay (int) – days of delay

Returns

S_OK/S_ERROR

removeHeartBeatLoggingInfo(status, delayDays)

Remove HeartBeatLoggingInfo for jobs with given status after given number of days.

Parameters
  • status (str) – Job Status

  • delayDays (int) – number of days after which information is removed

Returns

None