
Add YAML parsing & validation for checks

Dylan Jaide White requested to merge djwhite/checks into master

Adds YAML syntax for checks, based on suggestions made in https://gitlab.cern.ch/lhcb-dpa/project/-/issues/109. This includes:

  • New options for jobs: checks and extra_checks. During parsing, extra_checks is appended to checks and then removed, so internally checks holds the full list of checks to perform for each job.
  • A top-level checks section (checks sits where a job name would), where the user defines their checks, each using a type from a predefined list. Each check type has its own schema. N.B. this MR does not implement running the checks; it only handles the user configuration that will be passed to them. The currently implemented types are:
    • range for 1D histograms
    • range_nd for 2D, 3D, or 4D histograms
    • num_entries to check that there are at least a given number of events
    • num_entries_per_invpb to check that there are at least a given number of events per inverse picobarn of integrated luminosity (useful for checking selection efficiency)
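Per-type schema validation can be sketched as follows. Only the four type names come from this MR; the required field names (expression, expressions, limits, count, count_per_invpb) and the validate_check function are hypothetical, chosen for illustration, and need not match the schemas actually implemented here.

```python
from typing import Any

# Hypothetical required keys per check type. The type names come from this MR;
# the field names are illustrative assumptions, not the implemented schema.
REQUIRED_KEYS: dict[str, set[str]] = {
    "range": {"expression", "limits"},
    "range_nd": {"expressions", "limits"},
    "num_entries": {"count"},
    "num_entries_per_invpb": {"count_per_invpb"},
}


def validate_check(name: str, check: dict[str, Any]) -> None:
    """Raise ValueError if a check definition does not match its type's schema."""
    check_type = check.get("type")
    if check_type not in REQUIRED_KEYS:
        raise ValueError(f"check {name!r}: unknown type {check_type!r}")

    missing = REQUIRED_KEYS[check_type] - set(check)
    if missing:
        raise ValueError(f"check {name!r}: missing keys {sorted(missing)}")

    # range_nd covers 2D, 3D and 4D histograms, i.e. 2 to 4 expressions.
    if check_type == "range_nd" and not 2 <= len(check["expressions"]) <= 4:
        raise ValueError(f"check {name!r}: range_nd needs 2 to 4 expressions")
```

Dispatching on type keeps each schema small, and an unknown type fails early with the offending check's name in the error message.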

This also contains some interface changes that will affect other parts of the analysis productions system, in particular:

  • parsing.parse_yaml() now returns 2 values instead of 1: a jobs_data dictionary, which is functionally equivalent to the single value returned previously, and a checks_data dictionary, which contains all the user-defined checks.
  • parsing.validate_yaml() now takes 4 arguments instead of 3: the new checks_data (from parse_yaml()) is now required as the second argument. The first and last two arguments are unchanged.
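Why validate_yaml() needs checks_data can be illustrated with a small sketch: a job's checks list holds names that can only be resolved against the top-level checks section. split_config and check_references are hypothetical helpers, not the functions in parsing, and the sketch assumes defaults have already been applied, so every top-level key other than checks is a job.

```python
from typing import Any


def split_config(config: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split a parsed config into (jobs_data, checks_data).

    Assumption: defaults are already applied, so every top-level key
    other than "checks" is a job.
    """
    checks_data = dict(config.get("checks", {}))
    jobs_data = {name: job for name, job in config.items() if name != "checks"}
    return jobs_data, checks_data


def check_references(jobs_data: dict[str, Any], checks_data: dict[str, Any]) -> None:
    """Raise ValueError if a job refers to a check that is not defined."""
    for job_name, job in jobs_data.items():
        for check_name in job.get("checks", []):
            if check_name not in checks_data:
                raise ValueError(
                    f"job {job_name!r} uses undefined check {check_name!r}"
                )
```

Returning the two dictionaries separately keeps jobs_data in its previous shape, so code that only consumes jobs is unaffected beyond unpacking the extra return value.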

Two tests are also implemented:

  • One job with no defaults, which tests every type of check
  • Two jobs with defaults and extra_checks sections
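The second test scenario can be pictured with a sketch of how extra_checks might interact with defaults. It assumes a top-level defaults key whose values are shallow-merged into each job before extra_checks is appended; the function name, the merge order, and the job and check names are hypothetical and not confirmed by this MR.

```python
from typing import Any

# Assumption: top-level keys that are not jobs.
NON_JOB_KEYS = {"defaults", "checks"}


def apply_defaults_and_extra_checks(config: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Return each job with defaults applied and extra_checks folded into checks."""
    defaults = config.get("defaults", {})
    jobs: dict[str, dict[str, Any]] = {}
    for name, job in config.items():
        if name in NON_JOB_KEYS:
            continue
        # Job-level values override defaults (shallow merge).
        merged = {**defaults, **job}
        # Append extra_checks to checks, then drop the extra_checks key.
        extra = merged.pop("extra_checks", [])
        merged["checks"] = list(merged.get("checks", [])) + list(extra)
        jobs[name] = merged
    return jobs


config = {
    "defaults": {"checks": ["mass_window", "enough_events"]},
    "job_a": {"input": "a.root"},
    "job_b": {"input": "b.root", "extra_checks": ["events_per_lumi"]},
}
jobs = apply_defaults_and_extra_checks(config)
# jobs["job_b"]["checks"] → ["mass_window", "enough_events", "events_per_lumi"]
```

With this order, a job that sets checks itself replaces the default list, while extra_checks always extends whatever list the job ends up with.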
