Commit e708e6dc authored by Simon Spannagel

Merge branch 'jobsub_improved_naming' into 'master'

jobsub: improved auto-naming of config files

Closes #57

See merge request !182
parents 8955e5a5 242f3c3e
Pipeline #1159820 failed in 6 minutes and 3 seconds
......@@ -58,8 +58,8 @@ optional arguments:
digits
```
The environment variables need to be set using ```source etc/setup_lxplus.sh```.
When using a submission file, `getenv = True` should be used (see example.sub).
For usage on `lxplus`, the environment variables need to be set by running ```source etc/setup_lxplus.sh```.
When using a submission file, `getenv = True` should be used (as in `example.sub`).
### Preparation of Configuration File Templates
......@@ -91,11 +91,6 @@ log_level = WARNING
There is only one predefined placeholder, `@RunNumber@`, which will be substituted with the current run number. Run numbers are not padded with leading zeros unless the `--zfill` option is provided.
To avoid confusion with the text-field delimiter (see below), "paired" parameters such as `spatialCut = 100um, 150um` should be split into separate placeholders in the configuration template, for example:
```toml
spatialCut = @spatialCutX@, @spatialCutY@
```
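Placeholder substitution can be pictured as a case-insensitive search and replace on the template text. The following is a minimal sketch under that assumption, not the `ireplace()` helper from `jobsub.py`; the function name `replace_placeholder` and the padding width of 6 are illustrative choices.

```python
import re


def replace_placeholder(template, name, value):
    """Replace every occurrence of @name@ in template, ignoring case."""
    pattern = re.compile(re.escape("@" + name + "@"), re.IGNORECASE)
    # a function as replacement keeps backslashes in value literal
    return pattern.sub(lambda _match: value, template)


template = "spatialCut = @spatialCutX@, @spatialCutY@\nfile = run@RunNumber@.root"
config = replace_placeholder(template, "spatialcutx", "100um")
config = replace_placeholder(config, "spatialcuty", "150um")
# run numbers are only zero-padded when a width is requested (cf. --zfill)
config = replace_placeholder(config, "runnumber", str(42).zfill(6))
print(config)
```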
### Using Configuration Variables
As described in the previous paragraph, variables in the configuration file template are replaced with values at run time.
......@@ -111,32 +106,56 @@ jobsub.py --option beamenergy=5.3 -c alignment.conf 1234
This switch can be specified several times for multiple options or can parse a comma-separated list of options. This switch overrides any config file options.
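A repeatable, comma-separated `key=value` switch of this kind could be handled as sketched below. This is an illustration only, not the `jobsub.py` implementation: it uses `argparse` with `action="append"`, the helper `parse_options` and the example values are hypothetical, lower-casing the keys is an assumption, and splitting on every comma means it cannot carry values that themselves contain commas.

```python
import argparse


def parse_options(option_args):
    """Turn ["a=1,b=2", "c=3"] into {"a": "1", "b": "2", "c": "3"}."""
    options = {}
    for arg in option_args or []:
        # each occurrence of the switch may hold a comma-separated list
        for item in arg.split(","):
            key, sep, value = item.partition("=")
            if not sep:
                raise ValueError("expected key=value, got '%s'" % item)
            # assumption: keys are stored in lower case for case-insensitive matching
            options[key.strip().lower()] = value.strip()
    return options


parser = argparse.ArgumentParser()
parser.add_argument("--option", action="append")  # may be given several times
args = parser.parse_args(["--option", "beamenergy=5.3", "--option", "a=1,b=2"])

config_file_options = {"beamenergy": "4.0", "c": "7"}  # hypothetical config file values
# command-line options override config file options with the same key
merged = {**config_file_options, **parse_options(args.option)}
print(merged)
```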
#### Table (comma-separated text file)
- format: e.g.
- export from Open/LibreOffice with default settings (UTF-8,comma-separated, text-field delimiter: ")
- emacs org-mode table (see http://orgmode.org/manual/Tables.html)
- use Atom's *tablr* extension
- commented lines (starting with #) are ignored
- first row (after comments) has to provide column headers which identify the variables in the steering template to replace (case-insensitive)
- requires one column labeled "RunNumber"
- only considers placeholders left in the steering template after processing command-line arguments and config file options
It is also possible to specify multiple different settings for the same run number in different lines.
If so, it should be ensured that the output file is not called `histograms_@RunNumber@` but rather `histograms_@RunNumber@_@OtherParameter@` to prevent overwriting the output file.
Tables in the form of CSV files can be used to replace placeholders with the `-csv` option.
To produce a correctly formatted file, the following tools can be used:
- export from Open/LibreOffice with default settings (UTF-8, comma-separated, text-field delimiter: `"`)
- Emacs org-mode tables (see http://orgmode.org/manual/Tables.html)
- Atom's *tablr* extension
Alternatively, the CSV file can be edited in any text editor.
The following rules apply:
- Commented lines (starting with `#`) are ignored.
- The first row (after comments) must contain the column headers, which identify the placeholders in the steering template to be replaced (case-insensitive).
- One column labeled "RunNumber" is required.
- Only placeholders left in the steering template __after__ processing command-line arguments and config file options are filled with values from the CSV file.
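The rules above can be sketched with Python's `csv` module. This is a simplified stand-in for `loadparamsfromcsv()`, not its actual code: the name `load_csv_rows`, the fixed dialect settings (`jobsub.py` sniffs the dialect instead) and the error handling are assumptions.

```python
import csv


def load_csv_rows(text):
    """Return one dict per data row, keyed by lower-case column header."""
    # commented lines are dropped before the csv module sees them
    lines = [line for line in text.splitlines() if not line.lstrip().startswith("#")]
    reader = csv.DictReader(lines, skipinitialspace=True, escapechar="\\")
    headers = [name.strip().lower() for name in (reader.fieldnames or [])]
    if "runnumber" not in headers:
        raise ValueError("a column labeled 'RunNumber' is required")
    rows = []
    for row in reader:
        # headers are case-insensitive, so normalise them to lower case
        rows.append({key.strip().lower(): (value or "").strip()
                     for key, value in row.items() if key is not None})
    return rows


example = """# AnalysisExample.csv
# This is an example.
RunNumber, ExampleParameter, AnotherParameter
100, "{3-5}", 10ns
101, 3, "{10ns, 20ns}"
"""
for row in load_csv_rows(example):
    print(row)
```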
Strings can be passed using double quotes (`" "`), which also prevents them from being split at commas.
A double quote can be made part of a string by escaping it with a backslash (`\"`).
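This quoting and escaping behaviour matches what Python's `csv` reader does when a backslash is configured as escape character, which `loadparamsfromcsv()` sets via `dialect.escapechar = "\\"`. A minimal check, assuming a comma delimiter and `skipinitialspace=True` (in `jobsub.py` the remaining dialect settings are sniffed from the file):

```python
import csv

# one CSV line using the quoting rules described above
line = r'200, "string,with,comma", "\"string in quotes\"", plain'
fields = next(csv.reader([line], skipinitialspace=True, escapechar="\\"))
print(fields)  # ['200', 'string,with,comma', '"string in quotes"', 'plain']
```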
It is also possible to specify multiple settings for the same run number by using the following syntax for a set or range of parameter values.
Curly brackets inside double quotes (`"{ }"`) indicate a set (values separated by a comma `,`) or a range (bounds separated by a dash `-`) of values, which are split up and processed one after the other.
If a set or range is detected, the parameter name and its value are appended to the name of the configuration file.
Ranges can only be used for integer values (without units).
A set or range can only be used for one parameter, i.e. multi-dimensional parameter scans are not supported and have to be split into individual CSV files.
* `"{10,12-14}"` translates to `10`, `12`, `13`, `14` in consecutive jobs for the same run number
* `"{10ns, 20ns}"` translates to `10ns`, `20ns` in consecutive jobs for the same run number
* `"string,with,comma"` translates to `string,with,comma` in one job
* `"{string,with,comma}"` translates to `string`, `with`, `comma` in consecutive jobs for the same run number
* `"\"string in quotes\""` translates to `"string in quotes"`
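The expansion rules in the list above can be sketched as follows. This is a simplified illustration, not the code in `jobsub.py` (which calls `parseIntegerString()`); the name `expand_field` is hypothetical and, unlike `parseIntegerString()`, it returns every value as a string. The input is the field value after CSV parsing, i.e. without the outer double quotes.

```python
def expand_field(value):
    """Expand one CSV value into the values used by consecutive jobs."""
    value = value.strip()
    if not value.startswith("{"):
        return [value]  # plain value: a single job
    if not value.endswith("}"):
        raise ValueError("no matching close bracket in '%s'" % value)
    inner = value[1:-1]
    if "," not in inner and "-" not in inner:
        # nothing to split: passed on unchanged, braces included,
        # as the new code in this commit appears to do
        return [value]
    result = []
    for token in (part.strip() for part in inner.split(",")):
        start, sep, stop = token.partition("-")
        if sep and start.isdigit() and stop.isdigit():
            low, high = sorted((int(start), int(stop)))
            # integer range, both ends included
            result.extend(str(n) for n in range(low, high + 1))
        else:
            result.append(token)  # single integer or string, used as-is
    return result


print(expand_field("{10,12-14}"))  # ['10', '12', '13', '14']
```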
If a set or range of parameters is detected, the naming scheme of the auto-generated configuration files is extended from `MyAnalysis_run@RunNumber@.conf` to `MyAnalysis_run@RunNumber@_OtherParameter@OtherParameter@.conf`.
The user must ensure that the output ROOT file is not simply called `histograms_@RunNumber@.root` but rather, for example, `histograms_@RunNumber@_OtherParameter@OtherParameter@.root`, so that consecutive jobs for the same run do not overwrite each other's output.
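The naming rule can be sketched as a small helper. It assumes, based on the example file names in this README and on `basefilename = jobtask+"_run"+runnr+appendix` in the code, that the appendix is an underscore followed by the lower-case column name and the value; `config_names` itself is hypothetical.

```python
def config_names(jobtask, run, field, values):
    """Return one configuration file name per value of the scanned parameter."""
    if len(values) == 1:
        # no set or range: the name carries no appendix
        return ["%s_run%s.conf" % (jobtask, run)]
    # appendix: "_" + lower-case column name + value
    return ["%s_run%s_%s%s.conf" % (jobtask, run, field.lower(), value) for value in values]


print(config_names("AnalysisExample", 101, "AnotherParameter", ["10ns", "20ns"]))
```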
##### Example
The CSV file could have the following form:
```csv
RunNumber, BeamEnergy, telescopeGeometry
4115, 1, telescope_june2017_1.conf
4116, 2, telescope_june2017_1.conf
4117, 3, telescope_june2017_1.conf
4118, 4, telescope_june2017_1.conf
4119, 5, telescope_june2017_1.conf
# AnalysisExample.csv
# This is an example.
RunNumber, ExampleParameter, AnotherParameter
100, "{3-5}", 10ns
101, 3, "{10ns, 20ns}"
```
Using this table, the placeholders `@RunNumber@`, `@ExampleParameter@`, and `@AnotherParameter@` in the template file `AnalysisExample.conf` would be replaced by the values corresponding to the current run number, and the following configuration files would be generated:
```
AnalysisExample_run100_exampleparameter3.conf
AnalysisExample_run100_exampleparameter4.conf
AnalysisExample_run100_exampleparameter5.conf
AnalysisExample_run101_anotherparameter10ns.conf
AnalysisExample_run101_anotherparameter20ns.conf
```
Using this table, the variables `@BeamEnergy@` and `@telescopeGeometry@` in the templates would be replaced by the values corresponding to the current run number.
### Example Usage with a Batch File:
Example command line usage:
......
......@@ -15,7 +15,7 @@ to see the list of command line options.
import sys
import logging
def parseIntegerString(nputstr=""):
def parseIntegerString(inputstr=""):
"""
return a list of selected values when a string in the form:
1-4,6
......@@ -24,25 +24,34 @@ def parseIntegerString(nputstr=""):
as expected...
(from http://thoughtsbyclayg.blogspot.de/2008/10/parsing-list-of-numbers-in-python.html)
Modified such that it returns a list of strings
if the conversion to integer fails, e.g.
"10ns, 20ns"
would return:
"10ns", "20ns"
"""
selection = list()
# tokens are comma separated values
tokens = [substring.strip() for substring in nputstr.split(',')]
tokens = [substring.strip() for substring in inputstr.split(',')]
for i in tokens:
try:
# typically tokens are plain old integers
selection.append(int(i))
except ValueError:
# if not, then it might be a range
token = [int(k.strip()) for k in i.split('-')]
if len(token) > 1:
token.sort()
# we have items seperated by a dash
# try to build a valid range
first = token[0]
last = token[len(token)-1]
for value in range(first, last+1):
selection.append(value)
try:
# if not, then it might be a range
token = [int(k.strip()) for k in i.split('-')]
if len(token) > 1:
token.sort()
# we have items separated by a dash
# try to build a valid range
first = token[0]
last = token[len(token)-1]
for value in range(first, last+1):
selection.append(value)
except ValueError:
# if not, treat it as a string rather than an integer
selection.append(i)
return selection # end parseIntegerString
def ireplace(old, new, text):
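The diff view above has lost its indentation and its +/- markers. Assuming the indentation implied by the `try`/`except` nesting, the updated `parseIntegerString()` reads roughly as follows (the docstring is shortened here, since part of it is elided in the diff):

```python
def parseIntegerString(inputstr=""):
    """Return the values selected by e.g. "1-4,6"; tokens that are neither
    integers nor integer ranges (e.g. "10ns") are returned as strings."""
    selection = list()
    # tokens are comma separated values
    tokens = [substring.strip() for substring in inputstr.split(',')]
    for i in tokens:
        try:
            # typically tokens are plain old integers
            selection.append(int(i))
        except ValueError:
            try:
                # if not, then it might be a range
                token = [int(k.strip()) for k in i.split('-')]
                if len(token) > 1:
                    token.sort()
                    # we have items separated by a dash: build a valid range
                    first = token[0]
                    last = token[len(token)-1]
                    for value in range(first, last+1):
                        selection.append(value)
            except ValueError:
                # if not, treat it as a string rather than an integer
                selection.append(i)
    return selection
```

With this change, tokens such as `10ns` are kept as strings instead of raising an uncaught `ValueError` in the range branch.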
......@@ -119,6 +128,7 @@ def loadparamsfromcsv(csvfilename, runs):
except StopIteration:
log.debug("End of csv file reached, sample limited to " + str(len(sample))+ " bytes")
dialect = csv.Sniffer().sniff(sample) # test csv file format details
dialect.escapechar = "\\"
log.debug("Determined the CSV dialect as follows: delimiter=%s, doublequote=%s, escapechar=%s, lineterminator=%s, quotechar=%s , quoting=%s, skipinitialspace=%s", dialect.delimiter, dialect.doublequote, dialect.escapechar, list(ord(c) for c in dialect.lineterminator), dialect.quotechar, dialect.quoting, dialect.skipinitialspace)
filteredfile.rewind() # back to beginning of file
reader = csv.DictReader(filteredfile, dialect=dialect) # now process CSV file contents here and load them into memory
......@@ -155,7 +165,7 @@ def loadparamsfromcsv(csvfilename, runs):
log.warn("Could not interpret run number on line "+str(filteredfile.linecount)+" in file '"+csvfilename+"'.")
continue
if len(missingRuns)==0:
log.debug("Found eat least one line for each run we were searching for.")
log.debug("Found at least one line for each run we were searching for.")
log.debug("Searched over "+str(filteredfile.linecount)+" lines in file '"+csvfilename+"'.")
if not len(missingRuns)==0:
......@@ -537,96 +547,143 @@ def main(argv=None):
log.info("Will now start processing the following runs: "+', '.join(map(str, runs)))
# now loop over all runs
for run in runs:
run_iterator = 0 # counts how many times one run occurs in the config file with different configurations
if keepRunning['Sigint'] == 'seen':
log.critical("Stopping to process remaining runs now")
break # if we received ctrl-c (SIGINT) we stop processing here
if args.zfill:
runnr = str(run).zfill(args.zfill)
else:
runnr = str(run)
log.info ("Now generating configuration file for run number "+runnr+"..")
# When running in subdirectories for every job, create it:
if args.subdir:
basedirectory = "run_"+runnr
if not os.path.exists(basedirectory):
os.makedirs(basedirectory)
# Decend into subdirectory:
savedPath = os.getcwd()
os.chdir(basedirectory)
if parameters_csv:
for line in parameters_csv: # go through line by line
# make a copy of the preprocessed steering file content
steeringString = steeringStringBase
# if we have a csv file we can parse, we will check for the runnumber and replace any
# variables identified by the csv header by the run specific value
try:
if parameters_csv[line]["runnumber"] != runnr:
n_repeat = 0 # counts how many times one run occurs in the config file with different configurations
i_repeat = 0 # repeat until i_repeat = n_repeat
while True: # break when not repeating the same run again
if keepRunning['Sigint'] == 'seen':
log.critical("Stopping to process remaining runs now")
break # if we received ctrl-c (SIGINT) we stop processing here
if args.zfill:
runnr = str(run).zfill(args.zfill)
else:
runnr = str(run)
log.info ("Now generating configuration file for run number "+runnr+"..")
# When running in subdirectories for every job, create it:
if args.subdir:
basedirectory = "run_"+runnr
if not os.path.exists(basedirectory):
os.makedirs(basedirectory)
# Descend into subdirectory:
savedPath = os.getcwd()
os.chdir(basedirectory)
if parameters_csv:
for line in parameters_csv: # go through line by line
# make a copy of the preprocessed steering file content
steeringString = steeringStringBase
# if we have a csv file we can parse, we will check for the runnumber and replace any
# variables identified by the csv header by the run specific value
try:
if parameters_csv[line]["runnumber"] != runnr:
continue
appendix = '' # empty string if one run is analysed only once
for field in parameters_csv[line].keys():
# prepare empty list in case of a set or range of parameters like {10,12}
current_parameter = list()
log.debug("Next parameter: %s", parameters_csv[line][field])
log.debug("parameters_csv[line][field][:1] = %s", parameters_csv[line][field][:1])
# remove all whitespaces from beginning and end of string (not in the middle)
parameters_csv[line][field] = parameters_csv[line][field].strip()
if parameters_csv[line][field].startswith('{'):
log.debug("Found open bracket, look for matching close bracket.")
if parameters_csv[line][field].endswith('}'):
log.debug("Found matching close bracket, Interpret as range or set of parameters.")
# remove curly brackets:
parameter_field = parameters_csv[line][field].strip("{}")
# Check if csv field contains "," or "-", i.e. a set or range of values
# If not, no conversion is required (or even possible in case of file paths etc.)
# If yes, call parseIntegerString() and create multiple configuration files.
if any(delimiter in parameter_field for delimiter in [',','-']):
current_parameter = parseIntegerString(parameter_field)
n_repeat = len(current_parameter)
log.debug("Found delimiter for '%s'", field)
else:
# current_parameter needs to be a list to get len(list) = 1
current_parameter.append(parameters_csv[line][field])
log.debug("No delimiter found for '%s'", field)
else:
log.error("No matching close bracket found. Please update CSV file.")
sys.exit(1)
else:
log.debug("No bracket found, interpret as one string.")
current_parameter.append(parameters_csv[line][field])
log.debug("current_parameter has length %d", len(current_parameter))
# check if we actually find all parameters from the csv file in the steering file - warn if not
log.debug("Parsing steering file for csv field name '%s'", field)
try:
# check that the field name is not empty and do not yet replace the runnumber
if not field == "":
if len(current_parameter) == 1:
steeringString = ireplace("@" + field + "@", parameters_csv[line][field], steeringString)
else:
log.debug("list index, n_repeat = '%d', i_repeat = '%d'", n_repeat, i_repeat)
steeringString = ireplace("@" + field + "@", str(current_parameter[i_repeat]), steeringString)
appendix = appendix + '_' +field + str(current_parameter[i_repeat])
i_repeat += 1
log.debug("appendix is now '%s'", appendix)
except EOFError:
log.warn("Parameter '" + field + "' from the csv file was not found in the template file (already overwritten by config file parameters?)")
except KeyError:
log.warning("Run #" + runnr + " was not found in the specified CSV file - will skip this run! ")
continue
run_iterator += 1
log.debug("Found run %i for the %ith time.", run, run_iterator-1) # start counting at 0
for field in parameters_csv[line].keys():
# check if we actually find all parameters from the csv file in the steering file - warn if not
log.debug("Parsing steering file for csv field name '%s'", field)
try:
# check that the field name is not empty and do not yet replace the runnumber
if not field == "":
steeringString = ireplace("@" + field + "@", parameters_csv[line][field], steeringString)
except EOFError:
log.warn("Parameter '" + field + "' from the csv file was not found in the template file (already overwritten by config file parameters?)")
except KeyError:
log.warning("Run #" + runnr + " was not found in the specified CSV file - will skip this run! ")
continue
if not checkSteer(steeringString):
return 1
if args.htcondor_file:
args.htcondor_file = os.path.abspath(args.htcondor_file)
if not os.path.isfile(args.htcondor_file):
log.critical("HTCondor submission parameters file '"+args.htcondor_file+"' not found!")
if not checkSteer(steeringString):
return 1
log.debug ("Writing steering file for run %i_%i", run, run_iterator-1) # start counting at 0
# Get "jobtask" as basename of the configuration file:
jobtask = os.path.splitext(os.path.basename(args.conf_file))[0]
# Write the steering file:
basefilename = jobtask+"_"+runnr+"_"+str(run_iterator-1) # start counting at 0
log.info("basefilename = " + basefilename)
steeringFile = open(basefilename+".conf", "w")
try:
steeringFile.write(steeringString)
finally:
steeringFile.close()
# bail out if running a dry run
if args.dry_run:
log.info("Dry run: skipping Corryvreckan execution. Steering file written to "+basefilename+'.conf')
elif args.htcondor_file:
rcode = submitCondor(basefilename, args.htcondor_file, runnr+"_"+str(run_iterator-1)) # start HTCondor submission
if rcode == 0:
log.info("HTCondor job submitted")
else:
log.error("HTCondor submission returned with error code "+str(rcode))
else:
rcode = runCorryvreckan(basefilename, runnr+"_"+str(run_iterator-1), args.silent) # start Corryvreckan execution
if rcode == 0:
log.info("Corryvreckan execution done")
if args.htcondor_file:
args.htcondor_file = os.path.abspath(args.htcondor_file)
if not os.path.isfile(args.htcondor_file):
log.critical("HTCondor submission parameters file '"+args.htcondor_file+"' not found!")
return 1
# update this line too
log.debug ("Writing steering file for run %i", run)
# Get "jobtask" as basename of the configuration file:
jobtask = os.path.splitext(os.path.basename(args.conf_file))[0]
# Write the steering file:
basefilename = jobtask+"_run"+runnr+appendix
log.info("basefilename = " + basefilename)
steeringFile = open(basefilename+".conf", "w")
try:
steeringFile.write(steeringString)
finally:
steeringFile.close()
# bail out if running a dry run
if args.dry_run:
log.info("Dry run: skipping Corryvreckan execution. Steering file written to "+basefilename+'.conf')
elif args.htcondor_file:
rcode = submitCondor(basefilename, args.htcondor_file, basefilename) # start HTCondor submission
if rcode == 0:
log.info("HTCondor job submitted")
else:
log.error("HTCondor submission returned with error code "+str(rcode))
else:
log.error("Corryvreckan returned with error code "+str(rcode))
zipLogs(parameters["logpath"], basefilename)
rcode = runCorryvreckan(basefilename, basefilename, args.silent) # start Corryvreckan execution
if rcode == 0:
log.info("Corryvreckan execution done")
else:
log.error("Corryvreckan returned with error code "+str(rcode))
zipLogs(parameters["logpath"], basefilename)
# Return to old directory:
if args.subdir:
os.chdir(savedPath)
# Return to old directory:
if args.subdir:
os.chdir(savedPath)
if (i_repeat == n_repeat): # break the while loop
log.debug("Finished scanning run %d.", run)
break
# end while true
# return to the previous signal handler
signal.signal(signal.SIGINT, prevINTHandler)
......