Jess is a Python grid job submission library with the following features:
- supports multiple backends including CREAM, ARC, HTCondor, etc.
- backends can use a native API, command-line tools, or both
- JDL and payload factories
- different job pooling/tracking strategies
- detailed per-job debugging log when using command-line backends
- provides Nagios plugins to test basic functionality of CEs
The jess API offers the following interfaces, which can be combined to develop different job management strategies and workflows:
- set of static backends with a generic API (submit, cancel, purge, log, status, output) that either submit the job directly via command-line tools or call a native API
- high-level job submission API (job management interface, JMI)
- support for direct submission via shell commands to CREAM (glite-ce-job-submit) and ARC (arcsub)
- support for job submission via a local or remote HTCondor pool to a number of different CEs (CREAM, ARC, Globus, HTCondor-CE, etc.) via shell commands (condor_submit, etc.)
- JDL API supporting serialisation to CREAM-CE, ARC and HTCondor formats
- simple payload API that builds a tarball from a set of directories
- simple tracker that submits and follows up on the status/execution of a single job per CE
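As a sketch of the generic backend API (submit, cancel, purge, log, status, output), a command-line backend shells out to the corresponding submission tool. The class below is illustrative only; its names and signatures are assumptions for this example, not the actual jess interface:

```python
import subprocess

class CommandLineBackend:
    """Illustrative stand-in for a jess command-line backend.

    The real jess backends expose submit/cancel/purge/log/status/output;
    this sketch only shows the submit path and is not the actual API.
    """

    def __init__(self, submit_cmd):
        # e.g. ["arcsub"] or ["glite-ce-job-submit", "-a"]
        self.submit_cmd = list(submit_cmd)

    def submit(self, jdl_path):
        # Run the submission tool and return its stdout (typically the job id).
        result = subprocess.run(self.submit_cmd + [jdl_path],
                                capture_output=True, text=True, check=True)
        return result.stdout.strip()
```

With a stand-in tool such as echo, the sketch can be exercised without any grid middleware installed.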
Jess can be installed from the following yum repos:
http://linuxsoft.cern.ch/internal/repos/etf8s-qa/x86_64/os/
http://linuxsoft.cern.ch/internal/repos/etf7-qa/x86_64/os/
via: yum install python2-jess or python3-jess.
Additional packages needed for specific backends are as follows:
- HTCondor: condor and condor-python (for a remote pool); for a local pool, a grid-universe condor service needs to be running on the host
- ARC: nordugrid-arc-plugins-globus
- CREAM-CE: glite-ce-cream-cli
Jess also contains a Nagios plugin that uses the simple tracker to submit a job and follow its status for a given CE. To access the CE's API, a local proxy file containing a valid X.509 proxy (including any VOMS extensions, if needed) needs to be specified. A working directory hosting the job data for each CE will be created under --work-dir. The resource to submit to is specified using the following format:
--resource <URI> URI specifying the CE to send the job to. Format:
<type>://<host>[:<port>]/<schedd>/<lrms-system-name>/<queue-name>
Type is one of cream, arc, condor, condor-ce or gt. Schedd is mandatory for condor
and lrms is mandatory for cream. Ports are optional and default to
arc:2811, cream:8443, gt:2119 and condor:9619.
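The URI format above can be parsed in a few lines of Python. The helper below is hypothetical (it is not part of jess) and applies the default ports listed above; the condor-ce default is assumed to match condor's 9619:

```python
from urllib.parse import urlparse

# Default ports per CE type, as documented above.
# Assumption: condor-ce shares condor's default port 9619.
DEFAULT_PORTS = {"arc": 2811, "cream": 8443, "gt": 2119,
                 "condor": 9619, "condor-ce": 9619}

def parse_resource(uri):
    """Parse <type>://<host>[:<port>]/<schedd>/<lrms>/<queue> into a dict.

    Hypothetical helper for illustration; not part of the jess API.
    """
    parsed = urlparse(uri)
    if parsed.scheme not in DEFAULT_PORTS:
        raise ValueError("unknown CE type: %r" % parsed.scheme)
    # Pad the path components so missing trailing parts become None.
    parts = (parsed.path.strip("/").split("/") + [None, None, None])[:3]
    return {"type": parsed.scheme,
            "host": parsed.hostname,
            "port": parsed.port or DEFAULT_PORTS[parsed.scheme],
            "schedd": parts[0],
            "lrms": parts[1],
            "queue": parts[2]}
```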
check_js Nagios plugin synopsis
usage: check_js [-h] [--version] [-H HOSTNAME] [-w WARNING] [-c CRITICAL] [-d]
[-p PREFIX] [-s SUFFIX] [-t TIMEOUT] [-C COMMAND] [--dry-run]
[-o OUTPUT] [--namespace NAMESPACE] [-m METRIC] [-v VERBOSE]
-x PROXY [-T TOKEN] [--executable EXECUTABLE] [--exec-args EXEC_ARGS]
[--job-schedule JOB_SCHEDULE] --vo VO [--vo-fqan VO_FQAN]
[--work-dir WORK_DIR] [--web-dir WEB_DIR] --backend BACKEND
[--pool POOL] [--schedd SCHEDD] [--resource RESOURCE]
[--jdl-ads JDL_ADS] [--ldap-uri LDAP_URI] [--zero-payload]
[--add-payload ADD_PAYLOAD] [--add-wntar-nag-nosam]
[--add-wntar-nag-nosamcfg] [--timeout-limits TIMEOUT_LIMITS]
[-e ENV_VAR] [--env-file ENV_FILE] [--arc-debug ARC_DEBUG]
[--arc-gmlog] [--arc-rsl ARC_RSL] [--arc-ce ARC_CE]
[--arc-sub-type ARC_SUB_TYPE] [--arc-info-type ARC_INFO_TYPE]
[--arc-registry ARC_REGISTRY] [--wnfm-config WNFM_CONFIG]
[--wnfm-static WNFM_STATIC] [--wnfm-pool WNFM_POOL]
[--wnfm-global-timeout WNFM_GLOBAL_TIMEOUT]
[--wnfm-test-timeout WNFM_TEST_TIMEOUT]
This plugin tests grid job submission with configurable payload.
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
-H HOSTNAME, --hostname HOSTNAME
Host name, IP Address, or unix socket (must be an
absolute path)
-w WARNING, --warning WARNING
Offset to result in warning status
-c CRITICAL, --critical CRITICAL
Offset to result in critical status
-d, --debug Specify debugging mode
-p PREFIX, --prefix PREFIX
Text to prepend to every metric name
-s SUFFIX, --suffix SUFFIX
Text to append to every metric name
-t TIMEOUT, --timeout TIMEOUT
Global timeout for plugin execution
-C COMMAND, --command COMMAND
Nagios command pipe for submitting passive results
--dry-run Dry run, will not execute commands and submit passive
results
-o OUTPUT, --output OUTPUT
Plugin output format; valid options are nagios,
check_mk or passive (via command pipe); defaults to
nagios
--namespace NAMESPACE
Metric prefix
-m METRIC, --metric METRIC
Name of a metric to be collected.
-v VERBOSE, --verbose VERBOSE
Verbosity.
-x PROXY, --proxy PROXY
VOMS Proxy used for submitting the job
-T TOKEN, --token TOKEN
SCITOKEN to use for submitting the job
--executable EXECUTABLE
Script/binary to execute on the WN
--exec-args EXEC_ARGS
Command line arguments to be passed on to the
executable
--job-schedule JOB_SCHEDULE
Interval (in minutes) to submit new job if previous
one has already finished
--vo VO Virtual Organization.
--vo-fqan VO_FQAN VOMS primary attribute as FQAN. If given, will be used
along with --vo
--work-dir WORK_DIR Working directory for storing job meta-data, logs,
output, etc.
--web-dir WEB_DIR Web directory for storing view of the job meta-data,
logs, output, etc.
--backend BACKEND Job submission backend to be used, options are
scondor, condor, cream, arc
--pool POOL Job submission to remote HTCondor master
--schedd SCHEDD Job submission to remote HTCondor schedd (needs
--pool)
--resource RESOURCE CE to send the job to. Format:
<type>://<host>[:<port>]/<schedd>/<lrms-system-name>/<queue-name>
If not given, resource discovery via BDII will be performed.
--jdl-ads JDL_ADS Classads to add to the JDL
--ldap-uri LDAP_URI Format [ldap://]hostname[:port[/]]
--zero-payload Generate zero bytes file as payload/tarball and pass
it as input
--add-payload ADD_PAYLOAD
Comma-separated list of top level directories with
Nagios compliant directories structure to be added to
tarball to be sent to WN.
--add-wntar-nag-nosam
Do not include standard SAM WN probes and their Nagios
config to WN tarball.
--add-wntar-nag-nosamcfg
Do not include Nagios configuration for SAM WN probes
to WN tarball. The probes themselves and respective
Python packages, however, will be included.
--timeout-limits TIMEOUT_LIMITS
Comma-separated list of timeouts in minutes per job
status. A global timeout is also supported, e.g.
global:3600,idle:3000,running:15
-e ENV_VAR, --env ENV_VAR
Environment variable to set on the worker node
--env-file ENV_FILE Environment file to be transferred to the worker node
--arc-debug ARC_DEBUG
ARC backend: arcsub debug flag (defaults to INFO)
--arc-gmlog ARC backend: request gmlog
--arc-rsl ARC_RSL ARC backend: add-ons for nordugrid_rsl
--arc-ce ARC_CE ARC backend: arcsub computing element endpoint (arc6
client only)
--arc-sub-type ARC_SUB_TYPE
ARC backend: arcsub submission endpoint type (arc6
client only)
--arc-info-type ARC_INFO_TYPE
ARC backend: arcsub information endpoint type (arc6
client only)
--arc-registry ARC_REGISTRY
ARC backend: arcsub registry (arc6 client only)
--wnfm-config WNFM_CONFIG
ETF WN qFM: configuration file (json)
--wnfm-static WNFM_STATIC
ETF WN qFM: Path to the statically compiled version of
ETF WN qFM
--wnfm-pool WNFM_POOL
ETF WN qFM: number of threads to run on WN (tests
concurrency)
--wnfm-global-timeout WNFM_GLOBAL_TIMEOUT
ETF WN qFM: global timeout (to run all tests, in
seconds)
--wnfm-test-timeout WNFM_TEST_TIMEOUT
ETF WN qFM: test timeout (to run a single test, in
seconds)
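As an illustration of the --timeout-limits format described above (e.g. global:3600,idle:3000,running:15), such a value can be turned into a per-status mapping with a few lines of Python. This parser is a sketch for illustration only, not the actual jess implementation:

```python
def parse_timeout_limits(spec):
    """Parse a --timeout-limits value such as "global:3600,idle:3000,running:15".

    Returns a dict mapping job status (or "global") to a timeout in minutes.
    Sketch for illustration only; not the actual jess implementation.
    """
    limits = {}
    for entry in spec.split(","):
        status, sep, minutes = entry.partition(":")
        if not sep:
            raise ValueError("expected <status>:<minutes>, got %r" % entry)
        limits[status.strip()] = int(minutes)
    return limits
```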
Sample executions of check_js (subsequent executions of the same command will follow up on the status of the job and report once it finishes):
Submitting to HTCondor-CE via direct submission using proxy:
/usr/lib64/nagios/plugins/check_js -H <CE> --zero-payload --executable <executable>
--vo-fqan /cms/Role=lcgadmin --work-dir <work-dir> -t 600 --vo <VO>
--resource condor-ce://<CE>/<CE_schedd>/nopbs/<queue or noqueue> -x <proxy_file>
--backend scondor --debug --pool <condor_pool> --schedd <condor_schedd>
Submitting to HTCondor-CE via direct submission using token:
/usr/lib64/nagios/plugins/check_js -H <CE> --zero-payload --executable <executable>
--vo-fqan /cms/Role=lcgadmin --work-dir <work-dir> -t 600 --vo <VO>
--resource condor-ce://<CE>/<CE_schedd>/nopbs/<queue or noqueue> -T <scitoken_file>
--backend scondor --debug --pool <condor_pool> --schedd <condor_schedd>
Both a token and a VOMS proxy can be supplied at the same time if needed.
Submitting to HTCondor-CE via local HTCondor pool:
# /usr/lib64/nagios/plugins/check_js -H <CE-hostname> --add-wntar-nag-nosamcfg --executable hello.sh
--vo-fqan <VO-FQAN> --work-dir /var/lib/gridprobes --prefix org.sam.CONDOR --suffix <VO-FQAN>
-t 600 --vo <VO> --resource htcondor://<CE-hostname> -x <path_to_proxy_file> --backend scondor
Submitting to HTCondor-CE via remote HTCondor pool:
# /usr/lib64/nagios/plugins/check_js -H <CE-hostname> --add-wntar-nag-nosamcfg --executable hello.sh
--vo-fqan <VO-FQAN> --work-dir /var/lib/gridprobes --prefix org.sam.CONDOR --suffix <VO-FQAN>
-t 600 --vo <VO> --resource htcondor://<CE-hostname> -x <path_to_proxy_file>
--backend scondor --pool <remote_pool> --schedd <remote_schedd>
Submitting to HTCondor-CE via remote HTCondor pool with additional payload:
# /usr/lib64/nagios/plugins/check_js -H <CE-hostname> --add-wntar-nag-nosamcfg --executable hello.sh
--vo-fqan <VO-FQAN> --work-dir /var/lib/gridprobes --prefix org.sam.CONDOR --suffix <VO-FQAN>
-t 600 --vo <VO> --resource htcondor://<CE-hostname> -x <path_to_proxy_file> --backend scondor
--pool <remote_pool> --schedd <remote_schedd>
--add-payload <comma_sep_dirs_to_add_to_payload>
Submitting to HTCondor-CE via remote HTCondor pool with additional payload and custom classads:
# /usr/lib64/nagios/plugins/check_js -H <CE-hostname> --add-wntar-nag-nosamcfg --executable hello.sh
--vo-fqan <VO-FQAN> --work-dir /var/lib/gridprobes --prefix org.sam.CONDOR --suffix <VO-FQAN>
-t 600 --vo <VO> --resource htcondor://<CE-hostname> -x <path_to_proxy_file> --backend scondor
--add-payload <comma_sep_paths_to_add_to_payload>
--jdl-ads '+DESIRED_Sites="<SITE>"_NL_+JOB_CMSSite="<SITE>"'
Testing ARC-CE via local HTCondor pool, using explicit queue information:
# /usr/lib64/nagios/plugins/check_js -H <ARC-CE> --add-wntar-nag-nosamcfg --executable hello.sh
--vo-fqan <VO-FQAN> --work-dir /var/lib/gridprobes --prefix org.sam.CONDOR --suffix <VO-FQAN>
-t 600 --vo <VO> --resource arc://<ARC-CE>/nosched/nopbs/<QUEUE> -x <path_to_proxy_file> --backend scondor
Testing ARC-CE via direct submission:
# /usr/lib64/nagios/plugins/check_js -H <ARC-CE> --add-wntar-nag-nosamcfg --executable hello.sh
--vo-fqan <VO-FQAN> --work-dir /var/lib/gridprobes --prefix org.sam.CONDOR --suffix <VO-FQAN>
-t 600 --vo <VO> --resource arc://<ARC-CE>/nosched/nopbs/grid -x <path_to_proxy_file> --backend arc
Testing CREAM-CE via direct submission with explicit batch system and queue:
# /usr/lib64/nagios/plugins/check_js -H <CREAM-CE> --add-wntar-nag-nosamcfg --executable hello.sh
--vo-fqan <VO-FQAN> --work-dir /var/lib/gridprobes --prefix org.sam.CONDOR --suffix <VO-FQAN>
-t 600 --vo <VO> --resource <CREAM-CE>/<QLRMS>/<QUEUE> -x <path_to_proxy_file> --backend cream