
Jess is a Python grid job submission library with the following features:

  • supports multiple backends, including CREAM, ARC, HTCondor, etc.
  • backends can use a native API and/or command-line tools
  • JDL and payload factories
  • different job pooling/tracking strategies
  • detailed per-job debugging log when using command-line backends
  • provides Nagios plugins to test the basic functionality of CEs

The Jess API offers the following interfaces, which can be combined to develop different job management strategies and workflows (a short illustrative sketch follows the list):

  • set of static backends with a generic API (submit, cancel, purge, log, status, output), which either submit the job directly via command-line tools or call a native API
  • high-level job submission API (job management interface (jmi))
  • support for direct submissions via shell command to CREAM (glite-ce-job-submit) and ARC (arcsub)
  • support for job submissions via a local or remote HTCondor pool to a number of different CEs (CREAM, ARC, Globus, HTCondor-CE, etc.) via shell commands (condor_submit, etc.)
  • JDL API supporting serialisation to CREAM-CE, ARC and HTCondor formats
  • simple payload API that builds a tarball from a set of directories
  • simple tracker that will submit and follow up on the status/execution of a single job per CE
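
For orientation, the sketch below shows how these pieces could be combined in Python. It is purely illustrative: the module, class and argument names (jess.backends, CondorBackend, TarballPayload, resource, proxy, ...) are assumptions and not taken from the actual Jess API; only the generic method names (submit, status, output, cancel) come from the list above.

# Hypothetical sketch only: import paths, class names and arguments are
# assumptions, not the real Jess API; submit/status/output/cancel are the
# generic backend methods listed above.
from jess.backends import CondorBackend   # assumed import path
from jess.payload import TarballPayload   # assumed import path

# Simple payload API: build a tarball from a set of directories.
payload = TarballPayload(directories=["./probes"])

# Generic backend API: point a backend at a CE and hand it the credentials.
backend = CondorBackend(
    resource="condor-ce://ce.example.org:9619/ce.example.org/nopbs/noqueue",
    proxy="/path/to/x509_proxy",
)

job = backend.submit(executable="hello.sh", payload=payload)  # submit the job
print(backend.status(job))                                    # follow its status
backend.output(job)                                           # fetch the output
# backend.cancel(job)                                         # or cancel it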

Jess can be installed from the following yum repo(s):

http://linuxsoft.cern.ch/internal/repos/etf8s-qa/x86_64/os/
http://linuxsoft.cern.ch/internal/repos/etf7-qa/x86_64/os/

via: yum install python2-jess or python3-jess.
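
On a RHEL/CentOS host, these repositories can be enabled with a standard yum .repo file before running the install command, for example /etc/yum.repos.d/etf-qa.repo. The repository id and name below are arbitrary, and the gpgcheck/gpgkey settings should follow local policy:

[etf8s-qa]
name=ETF QA (EL8, CERN internal)
baseurl=http://linuxsoft.cern.ch/internal/repos/etf8s-qa/x86_64/os/
enabled=1
gpgcheck=0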

Additional packages needed for specific backends are as follows (an example install command is shown after the list):

  • HTCondor: condor and condor-python (for a remote pool); for a local pool, a grid-universe condor service needs to be running on the host
  • ARC: nordugrid-arc-plugins-globus
  • CREAM-CE: glite-ce-cream-cli
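
For example, a host submitting through a remote HTCondor pool could be prepared with the packages listed above (pick python2-jess or python3-jess to match the Python version in use):

yum install python3-jess condor condor-python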

Jess also contains a Nagios plugin that uses the simple tracker to submit a job to a given CE and follow its status. To access the CE's API, a local proxy file containing a valid x509 proxy (including any VOMS extensions, if needed) needs to be specified. A working directory hosting the job data for each CE will be created under --work-dir. The resource to submit to is specified using the following format:

--resource <URI>   URI specifying a CE where to send the job. Format:
                   <type>://<host>[:<port>]/<schedd>/<lrms-system-name>/<queue-name>
                   Type is one of cream, arc, condor, condor-ce or gt. Schedd is mandatory for condor
                   and lrms is mandatory for cream. Ports are optional and default to
                   arc:2811, cream:8443, gt:2119 and condor:9619.
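
For illustration, resource URIs following this format could look as follows (the hostnames, schedd, LRMS and queue values are placeholders, mirroring the sample executions further down this page):

arc://arc-ce.example.org:2811/nosched/nopbs/grid
condor-ce://ce.example.org:9619/ce.example.org/nopbs/noqueue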

check_js Nagios plugin synopsis

usage: check_js [-h] [--version] [-H HOSTNAME] [-w WARNING] [-c CRITICAL] [-d]
                [-p PREFIX] [-s SUFFIX] [-t TIMEOUT] [-C COMMAND] [--dry-run]
                [-o OUTPUT] [--namespace NAMESPACE] [-m METRIC] [-v VERBOSE]
                -x PROXY [--executable EXECUTABLE] [--exec-args EXEC_ARGS]
                [--job-schedule JOB_SCHEDULE] --vo VO [--vo-fqan VO_FQAN]
                [--work-dir WORK_DIR] [--web-dir WEB_DIR] --backend BACKEND
                [--pool POOL] [--schedd SCHEDD] [--resource RESOURCE]
                [--jdl-ads JDL_ADS] [--ldap-uri LDAP_URI] [--zero-payload]
                [--add-payload ADD_PAYLOAD] [--add-wntar-nag-nosam]
                [--add-wntar-nag-nosamcfg] [--timeout-limits TIMEOUT_LIMITS]
                [-e ENV_VAR] [--env-file ENV_FILE] [--arc-debug ARC_DEBUG]
                [--arc-gmlog] [--arc-rsl ARC_RSL] [--arc-ce ARC_CE]
                [--arc-sub-type ARC_SUB_TYPE] [--arc-info-type ARC_INFO_TYPE]
                [--arc-registry ARC_REGISTRY] [--wnfm-config WNFM_CONFIG]
                [--wnfm-static WNFM_STATIC] [--wnfm-pool WNFM_POOL]
                [--wnfm-global-timeout WNFM_GLOBAL_TIMEOUT]
                [--wnfm-test-timeout WNFM_TEST_TIMEOUT]

This plugin tests grid job submission with configurable payload.

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -H HOSTNAME, --hostname HOSTNAME
                        Host name, IP Address, or unix socket (must be an
                        absolute path)
  -w WARNING, --warning WARNING
                        Offset to result in warning status
  -c CRITICAL, --critical CRITICAL
                        Offset to result in critical status
  -d, --debug           Specify debugging mode
  -p PREFIX, --prefix PREFIX
                        Text to prepend to every metric name
  -s SUFFIX, --suffix SUFFIX
                        Text to append to every metric name
  -t TIMEOUT, --timeout TIMEOUT
                        Global timeout for plugin execution
  -C COMMAND, --command COMMAND
                        Nagios command pipe for submitting passive results
  --dry-run             Dry run; will not execute commands or submit passive
                        results
  -o OUTPUT, --output OUTPUT
                        Plugin output format; valid options are nagios,
                        check_mk or passive (via command pipe); defaults to
                        nagios
  --namespace NAMESPACE
                        Metric prefix
  -m METRIC, --metric METRIC
                        Name of a metric to be collected.
  -v VERBOSE, --verbose VERBOSE
                        Verbosity.
  -x PROXY, --proxy PROXY
                        VOMS Proxy used for submitting the job
  -T TOKEN, --token TOKEN
                        SCITOKEN to use for submitting the job                        
  --executable EXECUTABLE
                        Script/binary to execute on the WN
  --exec-args EXEC_ARGS
                        Command line arguments to be passed on to the
                        executable
  --job-schedule JOB_SCHEDULE
                        Interval (in minutes) to submit new job if previous
                        one has already finished
  --vo VO               Virtual Organization.
  --vo-fqan VO_FQAN     VOMS primary attribute as FQAN. If given, will be used
                        along with --vo
  --work-dir WORK_DIR   Working directory for storing job meta-data, logs,
                        output, etc.
  --web-dir WEB_DIR     Web directory for storing view of the job meta-data,
                        logs, output, etc.
  --backend BACKEND     Job submission backend to be used, options are
                        scondor, condor, cream, arc
  --pool POOL           Job submission to remote HTCondor master
  --schedd SCHEDD       Job submission to remote HTCondor schedd (needs
                        --pool)
  --resource RESOURCE   CE to send job to. Format:
                        <type>://<host>[:<port>]/<schedd>/<lrms-system-name>/<queue-name>
                        If not given, resource discovery via BDII will be
                        performed.
  --jdl-ads JDL_ADS     Classads to add to the JDL
  --ldap-uri LDAP_URI   Format [ldap://]hostname[:port[/]]
  --zero-payload        Generate zero bytes file as payload/tarball and pass
                        it as input
  --add-payload ADD_PAYLOAD
                        Comma-separated list of top level directories with
                        Nagios compliant directories structure to be added to
                        tarball to be sent to WN.
  --add-wntar-nag-nosam
                        Do not include standard SAM WN probes and their Nagios
                        config to WN tarball.
  --add-wntar-nag-nosamcfg
                        Do not include Nagios configuration for SAM WN probes
                        to WN tarball. The probes themselves and respective
                        Python packages, however, will be included.
  --timeout-limits TIMEOUT_LIMITS
                        Comma-separated list of timeouts in minutes per job
                        status. Also supports a global timeout, e.g.
                        global:3600,idle:3000,running:15
  -e ENV_VAR, --env ENV_VAR
                        Environment variable to set on the worker node
  --env-file ENV_FILE   Environment file to be transferred to the worker node
  --arc-debug ARC_DEBUG
                        ARC backend: arcsub debug flag (defaults to INFO)
  --arc-gmlog           ARC backend: request gmlog
  --arc-rsl ARC_RSL     ARC backend: add-ons for nordugrid_rsl
  --arc-ce ARC_CE       ARC backend: arcsub computing element endpoint (arc6
                        client only)
  --arc-sub-type ARC_SUB_TYPE
                        ARC backend: arcsub submission endpoint type (arc6
                        client only)
  --arc-info-type ARC_INFO_TYPE
                        ARC backend: arcsub information endpoint type (arc6
                        client only)
  --arc-registry ARC_REGISTRY
                        ARC backend: arcsub registry (arc6 client only)
  --wnfm-config WNFM_CONFIG
                        ETF WN qFM: configuration file (json)
  --wnfm-static WNFM_STATIC
                        ETF WN qFM: Path to the statically compiled version of
                        ETF WN qFM
  --wnfm-pool WNFM_POOL
                        ETF WN qFM: number of threads to run on WN (tests
                        concurrency)
  --wnfm-global-timeout WNFM_GLOBAL_TIMEOUT
                        ETF WN qFM: global timeout (to run all tests, in
                        seconds)
  --wnfm-test-timeout WNFM_TEST_TIMEOUT
                        ETF WN qFM: test timeout (to run a single test, in
                        seconds)

Sample executions of check_js (subsequent executions of the same command will follow up on the status of the job and report once it finishes):

Submitting to HTCondor-CE via direct submission using a proxy:

/usr/lib64/nagios/plugins/check_js -H <CE> --zero-payload --executable <executable> \
--vo-fqan /cms/Role=lcgadmin --work-dir <work-dir> -t 600 --vo <VO> \
--resource condor-ce://<CE>/<CE_schedd>/nopbs/<queue or noqueue> -x <proxy_file> \
--backend scondor --debug --pool <condor_pool> --schedd <condor_schedd>

Submitting to HTCondor-CE via direct submission using a token:

/usr/lib64/nagios/plugins/check_js -H <CE> --zero-payload --executable <executable> \
--vo-fqan /cms/Role=lcgadmin --work-dir <work-dir> -t 600 --vo <VO> \
--resource condor-ce://<CE>/<CE_schedd>/nopbs/<queue or noqueue> -T <scitoken_file> \
--backend scondor --debug --pool <condor_pool> --schedd <condor_schedd>

Both a token and a VOMS proxy can be supplied at the same time if needed.

Submitting to HTCondor-CE via local HTCondor pool:

# /usr/lib64/nagios/plugins/check_js -H <CE-hostname> --add-wntar-nag-nosamcfg --executable hello.sh \
--vo-fqan <VO-FQAN> --work-dir /var/lib/gridprobes --prefix org.sam.CONDOR --suffix <VO-FQAN> \
-t 600 --vo <VO> --resource htcondor://<CE-hostname> -x <path_to_proxy_file> --backend scondor

Submitting to HTCondor-CE via remote HTCondor pool:

# /usr/lib64/nagios/plugins/check_js -H <CE-hostname> --add-wntar-nag-nosamcfg --executable hello.sh \
--vo-fqan <VO-FQAN> --work-dir /var/lib/gridprobes --prefix org.sam.CONDOR --suffix <VO-FQAN> \
-t 600 --vo <VO> --resource htcondor://<CE-hostname> -x <path_to_proxy_file> \
--backend scondor --pool <remote_pool> --schedd <remote_schedd>

Submitting to HTCondor-CE via remote HTCondor pool with additional payload:

# /usr/lib64/nagios/plugins/check_js -H <CE-hostname> --add-wntar-nag-nosamcfg --executable hello.sh \
--vo-fqan <VO-FQAN> --work-dir /var/lib/gridprobes --prefix org.sam.CONDOR --suffix <VO-FQAN> \
-t 600 --vo <VO> --resource htcondor://<CE-hostname> -x <path_to_proxy_file> --backend scondor \
--pool <remote_pool> --schedd <remote_schedd> \
--add-payload <comma_sep_dirs_to_add_to_payload>

Submitting to HTCondor-CE via remote HTCondor pool with additional payload and custom classads:

# /usr/lib64/nagios/plugins/check_js -H <CE-hostname> --add-wntar-nag-nosamcfg --executable hello.sh \
--vo-fqan <VO-FQAN> --work-dir /var/lib/gridprobes --prefix org.sam.CONDOR --suffix <VO-FQAN> \
-t 600 --vo <VO> --resource htcondor://<CE-hostname> -x <path_to_proxy_file> --backend scondor \
--add-payload <comma_sep_paths_to_add_to_payload> \
--jdl-ads '+DESIRED_Sites="<SITE>"_NL_+JOB_CMSSite="<SITE>"'

Testing ARC-CE via local HTCondor pool, using explicit queue information:

# /usr/lib64/nagios/plugins/check_js -H <ARC-CE> --add-wntar-nag-nosamcfg --executable hello.sh \
--vo-fqan <VO-FQAN> --work-dir /var/lib/gridprobes --prefix org.sam.CONDOR --suffix <VO-FQAN> \
-t 600 --vo <VO> --resource arc://<ARC-CE>/nosched/nopbs/<QUEUE> -x <path_to_proxy_file> --backend scondor

Testing ARC-CE via direct submission:

# /usr/lib64/nagios/plugins/check_js -H <ARC-CE> --add-wntar-nag-nosamcfg --executable hello.sh \
--vo-fqan <VO-FQAN> --work-dir /var/lib/gridprobes --prefix org.sam.CONDOR --suffix <VO-FQAN> \
-t 600 --vo <VO> --resource arc://<ARC-CE>/nosched/nopbs/grid -x <path_to_proxy_file> --backend arc

Testing CREAM-CE via direct submission with explicit batch system and queue:

# /usr/lib64/nagios/plugins/check_js -H <CREAM-CE> --add-wntar-nag-nosamcfg --executable hello.sh \
--vo-fqan <VO-FQAN> --work-dir /var/lib/gridprobes --prefix org.sam.CONDOR --suffix <VO-FQAN> \
-t 600 --vo <VO> --resource <CREAM-CE>/<QLRMS>/<QUEUE> -x <path_to_proxy_file> --backend cream