Skip to content
Snippets Groups Projects
Forked from Corryvreckan / Corryvreckan
802 commits behind the upstream repository.
Peter Svihra's avatar
Peter Svihra authored
ff4aacc9
History

Jobsub

Overview

jobsub is a tool for the convenient run-specific modification of Corryvreckan configuration files and their execution through the corry executable. It can be used for automated processing of several data files and/or scans of reconstruction parameters. It is derived from the original jobsub written for EUTelescope @eutel-website by Hanno Perrey, Lund University.

It should be noted that when using jobsub on a local machine, the jobs are processed one by one. When running in batch mode, all jobs are submitted to HTCondor and processed in parallel.

Usage

The following help text is printed when invoking jobsub with the -h argument:

usage: jobsub.py [-h] [--option NAME=VALUE] [-c FILE] [-csv FILE]
                 [--log-file FILE] [-l LEVEL] [-s] [--dry-run]
                 [--batch FILE] [--subdir]
                 jobtask [runs [runs ...]]

A tool for the convenient run-specific modification of Corryvreckan steering files and their execution through the Marlin processor

positional arguments:
  runs                  The runs to be analyzed; can be a list of
                        single runs and/or a range, e.g. 1056-1060.

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -c FILE, --conf-file FILE, --config FILE
                        Configuration file with all Corryvreckan
                        algorithms defined
  --option NAME=VALUE, -o NAME=VALUE
                        Specify further options such as 'beamenergy=5.3'. This
                        switch be specified several times for multiple options
                        or can parse a comma-separated list of options. This
                        switch overrides any config file options and also
                        overwrites hard-coded settings on the Corryvreckan
                        configuration file.
  -htc FILE, --htcondor-file FILE, --batch FILE
                        Specify condor_submit parameter file for HTCondor submission. Run
                        HTCondor submission via condor_submit instead of calling
                        Corryvreckan directly
  -csv FILE, --csv-file FILE
                        Load additional run-specific variables from table
                        (text file in csv format)
  --log-file FILE       Save submission log to specified file
  -v LEVEL, --verbosity LEVEL
                        Sets the verbosity of log messages during job
                        submission where LEVEL is either debug, info, warning
                        or error
  -s, --silent          Suppress non-error (stdout) Corryvreckan output to
                        console
  --dry-run             Write configuration files but skip actual Corryvreckan
                        execution
  --subdir              Execute every job in its own subdirectory instead of
                        all in the base path
  --plain               Output written to stdout/stderr and log file in
                        prefix-less format i.e. without time stamping
  -j N, --cores N       Number of cores used for the local job submission, by default 1
  --zfill N             Fill run number with zeros up to the defined number of
                        digits

If running on lxplus, the environment variables need to be set by running source etc/setup_lxplus.sh. When using a submission file, getenv = True should be used as shown in _htcondor.sub.

The multicore job submission works best on lxplus or local machine (for the batch submission, only the actual submission not the execution will use multiple cores). It is important to note the by-default implemented parallelism of the alignment modules in corry. In case of multicore submission, one should restrict the number of alignment workers in the config file using the workers flag.

Preparation of Configuration File Templates

Configuration file templates are valid Corryvreckan configuration files in TOML format, where single values are replaced by variables in the form @SomeVariable@. A more detailed description of the configuration file format can be found elsewhere in the user manual. The section of a configuration file template with variable geometry file and DUT name could for instance look like

[Corryvreckan]
detectors_file = "@telescopeGeometry@"
histogram_file = "histograms_@RunNumber@.root"

number_of_events = 5000000

log_level = WARNING

When jobsub is executed, these placeholders are replaced with user-defined values that can be specified through command-line arguments or a table with a row for each run number processed, and a final configuration file is produced for each run separately, e.g.

[Corryvreckan]
detectors_file = "my_telescope_Nov2017_1.conf"
histogram_file = "histograms_run999.root"

number_of_events = 5000000

log_level = WARNING

There is only one predefined placeholder, @RunNumber@, which will be substituted with the current run number. Run numbers are not padded with leading zeros unless the --zfill option is provided.

Using Configuration Variables

As described in the previous paragraph, variables in the configuration file template are replaced with values at run time. Two sources of values are currently supported, and are described in the following.

Command Line

Variable substitutions can be specified using the --option or -o command line switches, e.g.

jobsub.py --option beamenergy=5.3 -c alignment.conf 1234

This switch can be specified several times for multiple options or can parse a comma-separated list of options. This switch overrides any config file options.

Table (comma-separated text file)

Tables in the form of CSV files can be used to replace placeholders with the -csv option. For the correct format, the following tools can be used:

  • export from Open/LibreOffice with default settings (UTF-8,comma-separated, text-field delimiter: ")
  • emacs org-mode table (see http://orgmode.org/manual/Tables.html)
  • use Atom's tablr extension or the CSV file can be edited in a text editor of choice.

The following rules apply:

  • Commented lines (starting with #) are ignored.
  • The first row (after comments) has to provide column headers which identify the variables in the steering template to replace (case-insensitive)
  • One column labeled "RunNumber" is required.
  • Only placeholders left in the steering template after processing command-line arguments and config file options are filled with values from the CSV file.

Strings can be passed by the user of double-quotes " " which also avoid the separation by commas. A double-quote can be used as part of a string when using the escape character backslash \ in front of the double-quote.

It is also possible to specify multiple different settings for the same run number by making use of the following syntax to specify a set or range of parameters. Curly brackets in double-quotes "{ }" can be used to indicate a set (indicated by a comma ,) or range (indicated by a dash -) of parameters which will be split up and processed one after the other. If a set or a range is detected, the parameter plus its value are attached to the name of the configuration file. Ranges can only be used for integer values (without units). However, a set or range can only be used for one parameter, i.e. multi-dimensional parameters scans are not supported and have to be separated into individual CSV files.

  • "{10,12-14}" translates to 10, 12, 13, 14 in consecutive jobs for the same run number
  • "{10ns, 20ns}" translates to 10ns, 20ns in consecutive jobs for the same run number
  • "string,with,comma" translates to string,with,comma in one job
  • "{string,with,comma}" which translates to string, with, comma in consecutive jobs for the same run number
  • "\"string in quotes\"" translates to "string in quotes"

The same expansion is also implemented for the run numbers, simplifying the manual creation of the file.

If a range or set of parameters is detector, the naming scheme of the auto-generated configuration files is extended from MyAnalysis_run@RunNUmber@.conf to MyAnalysis_run@RunNUmber@_OtherParameter@OtherParameter@.conf

It must be insured by the user that the output ROOT file is not simply called histograms_@RunNumber@.root but rather histograms_@RunNumber@_OtherParameter@OtherParameter@.root to prevent overwriting the output file.

Example

The CSV file could have the following form:

# AnalysisExample.csv
# This is an example.
RunNumber,  ExampleParameter,   AnotherParameter
100,        "{3-5}",            10ns
{101-103},  3,                  "{10ns, 20ns}"

Using this table, the placeholders @RunNUmber@, @ExampleParameter@, and @AnotherParameter@ in the template file AnalysisExample.conf would be replaced by the values corresponding to the current run number and the following configuration files would be generated:

AnalysisExample_run100_exampleparameter:3.conf
AnalysisExample_run100_exampleparameter:4.conf
AnalysisExample_run100_exampleparameter:5.conf
AnalysisExample_run101_anotherparameter:10ns.conf
AnalysisExample_run101_anotherparameter:20ns.conf
AnalysisExample_run102_anotherparameter:10ns.conf
AnalysisExample_run102_anotherparameter:20ns.conf
AnalysisExample_run103_anotherparameter:10ns.conf
AnalysisExample_run103_anotherparameter:20ns.conf

Example Usage with a Batch File:

Example command line usage:

./jobsub.py -c /path/to/example.conf -v DEBUG --batch /path/to/example.sub --subdir <run_number>

An example batch file is provided in the repository as _htcondor.sub, examples of the configuration files are _corry.conf and for the list of options in _options.csv. Complicated and error-prone transfer_output_files commands can be avoided. It is much simpler to set an absolute path like

output_directory = "/eos/user/y/yourname/whateveryouwant/run@RunNumber@"

directly in the Corryvreckan config file.