StalledJobAgent
- The StalledJobAgent hunts for stalled jobs in the Job database. Jobs in the “Running”
state that have not sent a heartbeat signal for more than stalledTime seconds are assigned the “Stalled” state; stalledTime is derived from the StalledTimeHours option (given in hours).
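The detection rule can be illustrated with a minimal Python sketch. It is not the agent's implementation: the job records and the function name find_stalled_jobs are hypothetical, and the sketch simply assumes that stalledTime equals StalledTimeHours converted to seconds.

```python
# Hypothetical stand-in for the StalledTimeHours option of the StalledJobAgent section.
STALLED_TIME_HOURS = 2


def find_stalled_jobs(jobs, now, stalled_time_hours=STALLED_TIME_HOURS):
    """Return the IDs of Running jobs whose last heartbeat is older than the stalled time.

    `jobs` is a hypothetical list of dicts with "id", "status" and "last_heartbeat"
    (a Unix timestamp); the real agent reads this information from the Job database.
    """
    stalled_time = stalled_time_hours * 3600  # assumption: hours converted to seconds
    return [
        job["id"]
        for job in jobs
        if job["status"] == "Running" and now - job["last_heartbeat"] > stalled_time
    ]


now = 1_000_000  # an arbitrary Unix timestamp for the example
jobs = [
    {"id": 1, "status": "Running", "last_heartbeat": now - 600},       # 10 minutes ago
    {"id": 2, "status": "Running", "last_heartbeat": now - 3 * 3600},  # 3 hours ago
    {"id": 3, "status": "Waiting", "last_heartbeat": now - 5 * 3600},  # not Running
]
print(find_stalled_jobs(jobs, now))  # [2]
```

Only job 2 qualifies: job 1 sent a heartbeat recently, and job 3 is not in the Running state.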
StalledJobAgent
{
StalledTimeHours = 2
FailedTimeHours = 6
PollingTime = 3600
MaxNumberOfThreads = 15
# List of sites for which we want to be more tolerant before declaring the job stalled
StalledJobsTolerantSites =
StalledJobsToleranceTime = 0
# List of sites for which we want to reschedule Stalled jobs (instead of declaring them Failed)
StalledJobsToRescheduleSites =
SubmittingTime = 300
MatchedTime = 7200
RescheduledTime = 600
Enable = True
}
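The comments in the section above describe two per-site adjustments. The following sketch shows one plausible reading of them; the class StalledJobSettings, its field names and the site names are hypothetical, and it assumes that StalledJobsToleranceTime is a number of seconds added to the stalled threshold and that FailedTimeHours, like StalledTimeHours, is measured from the last heartbeat.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StalledJobSettings:
    """Hypothetical container mirroring part of the StalledJobAgent section."""

    stalled_time_hours: float = 2                                  # StalledTimeHours
    failed_time_hours: float = 6                                   # FailedTimeHours
    tolerant_sites: List[str] = field(default_factory=list)       # StalledJobsTolerantSites
    tolerance_time: int = 0                                        # StalledJobsToleranceTime (assumed seconds)
    reschedule_sites: List[str] = field(default_factory=list)     # StalledJobsToRescheduleSites

    def stalled_threshold(self, site):
        """Seconds without a heartbeat before a Running job at `site` is declared Stalled."""
        threshold = self.stalled_time_hours * 3600
        if site in self.tolerant_sites:
            threshold += self.tolerance_time  # assumption: tolerance is added on top
        return threshold

    def action_for_stalled_job(self, site, seconds_since_heartbeat):
        """Return what happens to a Stalled job: keep it, reschedule it, or fail it."""
        if seconds_since_heartbeat <= self.failed_time_hours * 3600:
            return "keep"
        return "reschedule" if site in self.reschedule_sites else "fail"


settings = StalledJobSettings(
    tolerant_sites=["Site.A"], tolerance_time=3600, reschedule_sites=["Site.B"]
)
print(settings.stalled_threshold("Site.A"))              # 10800
print(settings.stalled_threshold("Site.C"))              # 7200
print(settings.action_for_stalled_job("Site.B", 25200))  # reschedule
print(settings.action_for_stalled_job("Site.C", 25200))  # fail
```

With the values shown in the section (an empty site list and a tolerance of 0), every site gets the same threshold and every overdue Stalled job is failed.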
- class DIRAC.WorkloadManagementSystem.Agent.StalledJobAgent.StalledJobAgent(*args, **kwargs)
Bases: AgentModule
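Agent that sets Running jobs to Stalled and Stalled jobs to Failed (or reschedules them at the sites listed in StalledJobsToRescheduleSites). It also handles jobs that have stayed too long in the Submitting, Matched or Rescheduled states, as bounded by the SubmittingTime, MatchedTime and RescheduledTime options.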
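Besides Running and Stalled jobs, the configuration section defines SubmittingTime, MatchedTime and RescheduledTime. A minimal sketch of checking jobs against such per-status limits follows; the mapping, the tuple layout and the function overdue_jobs are hypothetical, the values are copied from the configuration section, and it assumes all three options are in seconds. What the agent then does with an overdue job is not modeled.

```python
# Hypothetical mapping from job status to the maximum time (seconds) a job may
# stay in that status, using the values from the StalledJobAgent section.
STATUS_THRESHOLDS = {
    "Submitting": 300,    # SubmittingTime
    "Matched": 7200,      # MatchedTime
    "Rescheduled": 600,   # RescheduledTime
}


def overdue_jobs(jobs, now):
    """Group job IDs by status when they have exceeded the limit for that status.

    `jobs` is a hypothetical list of (job_id, status, last_update) tuples, where
    last_update is a Unix timestamp. Statuses without a limit are ignored.
    """
    overdue = {}
    for job_id, status, last_update in jobs:
        limit = STATUS_THRESHOLDS.get(status)
        if limit is not None and now - last_update > limit:
            overdue.setdefault(status, []).append(job_id)
    return overdue


now = 100_000  # an arbitrary Unix timestamp for the example
jobs = [
    (1, "Submitting", now - 400),   # 400 s > 300 s
    (2, "Matched", now - 3600),     # 3600 s <= 7200 s
    (3, "Rescheduled", now - 900),  # 900 s > 600 s
    (4, "Done", 0),                 # no limit for this status
]
print(overdue_jobs(jobs, now))  # {'Submitting': [1], 'Rescheduled': [3]}
```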
- __init__(*args, **kwargs)
Constructor.
- am_Enabled()
- am_checkStopAgentFile()
- am_createStopAgentFile()
- am_disableMonitoring()
- am_getBasePath()
- am_getControlDirectory()
- am_getCyclesDone()
- am_getMaxCycles()
- am_getModuleParam(optionName)
- am_getOption(optionName, defaultValue=None)
Gets an option from the agent’s configuration section. The section will be a subsection of the /Systems section in the CS.
- am_getPollingTime()
- am_getShifterProxyLocation()
- am_getStopAgentFile()
- am_getWatchdogTime()
- am_getWorkDirectory()
- am_go()
- am_initialize(*initArgs)
Common initialization for all agents.
This is executed every time an agent (re)starts. It is called by the AgentReactor and should not be overridden.
- am_monitoringEnabled()
- am_removeStopAgentFile()
- am_secureCall(functor, args=(), name=False)
- am_setModuleParam(optionName, value)
- am_setOption(optionName, value)
- am_stopExecution()
- beginExecution()
- endExecution()
- execute()
The main agent execution method.
- finalize()
- initialize()
Sets default parameters.
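The members above split the work between the AgentModule base class (am_initialize, am_go, am_getOption) and the hooks an agent implements (initialize, execute, and optionally beginExecution and endExecution). The sketch below is a hypothetical, heavily simplified stand-in written only to show that division of labor; MiniAgentModule and ToyStalledAgent are not DIRAC classes, and the order in which am_go calls the hooks is an assumption.

```python
class MiniAgentModule:
    """Hypothetical, simplified stand-in for AgentModule (not DIRAC code)."""

    def __init__(self, options=None):
        self._options = options or {}  # stands in for the agent's CS section
        self._cycles_done = 0

    def am_getOption(self, optionName, defaultValue=None):
        # Return the configured value, or defaultValue when the option is not set.
        return self._options.get(optionName, defaultValue)

    def am_getCyclesDone(self):
        return self._cycles_done

    def am_initialize(self):
        # Common initialization; subclasses customize initialize() instead.
        self._cycles_done = 0
        return self.initialize()

    def am_go(self):
        # One agent cycle (assumed order): begin hook, execute, end hook.
        self.beginExecution()
        result = self.execute()
        self.endExecution()
        self._cycles_done += 1
        return result

    # Hooks with trivial defaults; a concrete agent overrides what it needs.
    def initialize(self):
        return True

    def beginExecution(self):
        pass

    def endExecution(self):
        pass

    def execute(self):
        raise NotImplementedError


class ToyStalledAgent(MiniAgentModule):
    """Hypothetical subclass that reads two options of the StalledJobAgent section."""

    def initialize(self):
        # Fall back to the values shown in the configuration section when unset.
        self.stalledTimeHours = self.am_getOption("StalledTimeHours", 2)
        self.failedTimeHours = self.am_getOption("FailedTimeHours", 6)
        return True

    def execute(self):
        # Report both thresholds in seconds.
        return {
            "stalledTime": self.stalledTimeHours * 3600,
            "failedTime": self.failedTimeHours * 3600,
        }


agent = ToyStalledAgent({"StalledTimeHours": 3})
agent.am_initialize()
print(agent.am_go())             # {'stalledTime': 10800, 'failedTime': 21600}
print(agent.am_getCyclesDone())  # 1
```

Keeping am_initialize in the base class and exposing initialize as the override point lets every agent share the same start-up sequence while customizing only its own parameters.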