StalledJobAgent
The StalledJobAgent hunts for stalled jobs in the Job database. Jobs in the “Running” state that have not sent a heartbeat signal for more than stalledTime seconds are assigned the “Stalled” state.
StalledJobAgent
{
StalledTimeHours = 2
FailedTimeHours = 6
PollingTime = 3600
MaxNumberOfThreads = 15
# List of sites for which we want to be more tolerant before declaring the job stalled
StalledJobsTolerantSites =
StalledJobsToleranceTime = 0
# List of sites for which we want to Reschedule (instead of declaring Failed) the Stalled jobs
StalledJobsToRescheduleSites =
SubmittingTime = 300
MatchedTime = 7200
RescheduledTime = 600
Enable = True
}
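The thresholds in the block above can be illustrated with a minimal sketch. It is not the agent's actual code: `next_status` is a hypothetical helper, and measuring both thresholds from the last heartbeat is an assumption based on the option names and on the class description (Running jobs become Stalled, Stalled jobs become Failed). The tolerant-site, reschedule, and other time options are left out.

```python
from datetime import datetime, timedelta

# Values copied from the configuration block above.
STALLED_TIME_HOURS = 2
FAILED_TIME_HOURS = 6


def next_status(status, last_heartbeat, now,
                stalled_hours=STALLED_TIME_HOURS,
                failed_hours=FAILED_TIME_HOURS):
    """Return the status a job would move to.

    Hypothetical helper for illustration only; not part of the DIRAC API.
    Assumes both thresholds are counted from the last heartbeat.
    """
    silent_for = now - last_heartbeat
    if status == "Running" and silent_for > timedelta(hours=stalled_hours):
        return "Stalled"
    if status == "Stalled" and silent_for > timedelta(hours=failed_hours):
        return "Failed"
    # Any other case leaves the job where it is.
    return status


now = datetime(2024, 1, 1, 12, 0, 0)
running_silent = next_status("Running", now - timedelta(hours=3), now)
stalled_silent = next_status("Stalled", now - timedelta(hours=7), now)
```

With the defaults above, a Running job that has been silent for three hours would be marked Stalled, and a Stalled job that has been silent for seven hours would be marked Failed.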
- class DIRAC.WorkloadManagementSystem.Agent.StalledJobAgent.StalledJobAgent(*args, **kwargs)
Bases: AgentModule
Agent for setting Running jobs Stalled, and Stalled jobs Failed.
And a few more.
- __init__(*args, **kwargs)
Constructor.
- am_Enabled()
- am_checkStopAgentFile()
- am_createStopAgentFile()
- am_getControlDirectory()
- am_getCyclesDone()
- am_getMaxCycles()
- am_getModuleParam(optionName)
- am_getOption(optionName, defaultValue=None)
Gets an option from the agent’s configuration section, which is a subsection of the /Systems section in the Configuration Service (CS). If the option is not set, defaultValue is returned.
- am_getPollingTime()
- am_getShifterProxyLocation()
- am_getStopAgentFile()
- am_getWatchdogTime()
- am_getWorkDirectory()
- am_go()
- am_initialize(*initArgs)
Common initialization for all the agents.
This is executed every time an agent (re)starts. It is called by the AgentReactor and should not be overridden.
- am_removeStopAgentFile()
- am_secureCall(functor, args=(), name=False)
- am_setModuleParam(optionName, value)
- am_setOption(optionName, value)
- am_stopExecution()
- beginExecution()
- endExecution()
- execute()
The main agent execution method.
- finalize()
Graceful finalization.
- initialize()
Sets default parameters.
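As a rough illustration of how initialize() and am_getOption(optionName, defaultValue=None) might work together, the sketch below uses a hypothetical stand-in class, `SketchAgent`, backed by a plain dict. It is not the DIRAC AgentModule: the real lookup goes through the agent's configuration section, and the attribute names and fallback values here are assumptions taken from the configuration block shown earlier.

```python
class SketchAgent:
    """Hypothetical stand-in for an agent; not the DIRAC AgentModule."""

    def __init__(self, options):
        # A dict stands in for the agent's configuration section.
        self._options = dict(options)

    def am_getOption(self, optionName, defaultValue=None):
        # Same signature as listed above; the lookup itself is simplified.
        return self._options.get(optionName, defaultValue)

    def initialize(self):
        # Read each option, falling back to the documented example values.
        # The attribute names are illustrative assumptions.
        self.stalled_seconds = self.am_getOption("StalledTimeHours", 2) * 3600
        self.failed_seconds = self.am_getOption("FailedTimeHours", 6) * 3600
        self.enabled = self.am_getOption("Enable", True)


# Only StalledTimeHours is overridden; the other options use their fallbacks.
agent = SketchAgent({"StalledTimeHours": 4})
agent.initialize()
```

Passing the default at the call site keeps the agent usable when an option is missing from its configuration section.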