WIP: Common parser (!305) · Merge requests · HEP-Benchmarks / hep-workloads

Martina Javurkova requested to merge common_parser into master Mar 05, 2020

This new branch contains the common parser scripts. It changes several places, which are described below:

common directory:
- bmk-driver.sh
  - executes the bmkParser.py script if commonExpDir exists (for ATLAS, this is atlas/atlasCommon)
  - exports the EXPERIMENT variable, which is used by the common parser script
- bmkParser.py
  - imports the parser for the experiment whose workload (WL) is running and calls its methods
  - methods that need to be implemented by each experiment in {EXPERIMENT}Common/{EXPERIMENT}Parser.py
    - check(indir,app): checks that ALL input files are ready to be read, i.e. they can be opened and the job(s) finished successfully; returns True/False
    - collect_values(indir,app): collects the information from each input file and fills it into a dictionary with self-explanatory keys and the following structure: {"digi-reco": {"wallScore": [...], "status": [...]}, "HITtoRDO": {...}}
      - mandatory keys: status (list of 0/1 containing the status of each copy) and wallScore (list of walltime scores of each copy)
  - methods that are optional, where each experiment can put anything it considers useful:
    - get_wl_custom(collectVars,calculateVars)
  - methods that have default functionalities:
    - calculate_stats(...): takes the dictionary from collect_values(indir,app) and calculates score, average, median, minimum and maximum
    - get_wl_inputs(): gets all input parameters from the variables exported in bmk-parser.sh (os.environ['NCOPIES'], int(os.environ['NTHREADS']) and os.environ['NEVENTS_THREAD']) and generates a "wl-inputs" key in the output dictionary
    - get_wl_scores(calculateVars): generates a "wl-scores" key
    - get_wl_stats(calculateVars): generates a "wl-stats" key
    - get_wl_info(app): generates a "wl-info" key with the information about WL from version.json file
    - save_output(app,jsonOverallSummary,summaryJSON): saves the output dictionary to a JSON file
  - a "wl-status" key in the output JSON file can have several values:
    - number of failed copies i.e. 0 if everything is fine
    - 901 in case of a problem with the input files, i.e. a file is missing or corrupted, or the job failed
    - 902 if no values have been extracted from the input files and an empty dictionary is returned
    - 903 if there is no "status" key in the returned dictionary from collect_values(...)
    - 904 if there is no "wallScore" key in the returned dictionary from collect_values(...)
- abstractParser.py
  - contains all predefined methods
- Dockerfile.header
  - added python-importlib, which bmkParser.py uses to dynamically import each experiment's methods according to the application that is running
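
The "wl-status" rules listed above could be sketched roughly as follows. This is only an illustration, not the code in bmkParser.py: the helper name wl_status and its arguments are hypothetical, and it assumes that a status entry of 1 marks a failed copy and that failures are summed over all steps, neither of which is stated in the description.

```python
def wl_status(inputs_ok, collected):
    """Return a wl-status value following the rules above (hypothetical helper).

    inputs_ok -- result of the experiment's check(indir, app)
    collected -- dictionary returned by collect_values(indir, app)
    """
    if not inputs_ok:
        return 901  # input files missing or corrupted, or the job failed
    if not collected:
        return 902  # nothing extracted: empty dictionary
    for step_values in collected.values():
        if "status" not in step_values:
            return 903  # mandatory "status" key missing
        if "wallScore" not in step_values:
            return 904  # mandatory "wallScore" key missing
    # Assumption: 1 marks a failed copy, 0 a successful one,
    # and failures are added up over all steps.
    return sum(sum(values["status"]) for values in collected.values())
```

With this sketch, a dictionary in which every copy of every step reports 0 gives a "wl-status" of 0.
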
atlas directory:
- atlasCommon directory
  - contains the atlasParser.py script with methods specific to the ATLAS experiment
  - contains __init__.py to mark this directory as a Python package directory
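
The split between mandatory, optional and default methods described above might look roughly like this. It is a minimal sketch under assumptions, not the content of abstractParser.py or atlasParser.py: the class names AbstractParser and ToyParser are hypothetical, the values returned by ToyParser are made up, and casting all three environment variables to int is an assumption (the description only shows the cast for NTHREADS).

```python
import os
from abc import ABC, abstractmethod


class AbstractParser(ABC):
    """Hypothetical base class; the real abstractParser.py may differ."""

    @abstractmethod
    def check(self, indir, app):
        """Return True only if all input files are readable and the jobs succeeded."""

    @abstractmethod
    def collect_values(self, indir, app):
        """Return {step: {"wallScore": [...], "status": [...]}}."""

    def get_wl_custom(self, collect_vars, calculate_vars):
        """Optional hook: an experiment may override this to add extra data."""
        return {}

    def get_wl_inputs(self):
        """Default: read the input parameters exported to the environment."""
        return {
            "wl-inputs": {
                "copies": int(os.environ["NCOPIES"]),
                "threads_per_copy": int(os.environ["NTHREADS"]),
                "events_per_thread": int(os.environ["NEVENTS_THREAD"]),
            }
        }


class ToyParser(AbstractParser):
    """Stand-in for an experiment parser; the returned values are made up."""

    def check(self, indir, app):
        # A real parser would open every input file; here only the directory is checked.
        return os.path.isdir(indir)

    def collect_values(self, indir, app):
        return {"digi-reco": {"wallScore": [0.29, 0.30], "status": [0, 0]}}
```

An experiment then only has to provide check and collect_values; everything else falls back to the defaults unless it is overridden.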

The output JSON file is called ${APP}_summary_newParser.json and has the following keys:

wl-inputs: copies, threads_per_copy, events_per_thread
wl-scores: e.g. "wl-scores": {"ESDtoAOD": {"score": 9.4955}, "digi-reco": {"score": 0.58979}, ...}
wl-stats: median, avg, min and max
wl-status: 0: success, >0: number of failed copies or an error code (901-904)
wl-info: version, description, cvmfs_checksum, etc.
wl-custom: any information an experiment wants to add
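
As an illustration of the keys above, a summary could be assembled and serialised like this. The helper wl_stats is hypothetical, the numbers for "wl-inputs" and the walltime scores are made up, the "wl-info" value is a placeholder, and nesting "wl-stats" per step is an assumption; only the key names and the "digi-reco" score come from this description.

```python
import json
import statistics


def wl_stats(wall_scores):
    """Summarise the per-copy walltime scores (hypothetical helper)."""
    return {
        "median": statistics.median(wall_scores),
        "avg": statistics.mean(wall_scores),
        "min": min(wall_scores),
        "max": max(wall_scores),
    }


summary = {
    "wl-inputs": {"copies": 3, "threads_per_copy": 4, "events_per_thread": 10},
    "wl-scores": {"digi-reco": {"score": 0.58979}},
    "wl-stats": {"digi-reco": wl_stats([0.25, 0.5, 0.75])},  # made-up scores
    "wl-status": 0,  # 0 means all copies succeeded
    "wl-info": {"version": "placeholder"},  # normally taken from version.json
    "wl-custom": {},  # free-form, filled by each experiment
}

print(json.dumps(summary, indent=2))
```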

This should close several JIRA tickets: BMK-79, BMK-211, BMK-78, BMK-107, BMK-169.

It still needs to be tested.
