Draft: Introduce config yml: Flexibilisation first efforts (!260) · cms-analysis / General / HiggsDNA

Jan Lukas Spah requested to merge introduce_config_yml into master Aug 14, 2024

Overview of changes / Philosophy:

This MR starts the effort to make HiggsDNA friendlier for both developers and users. The goal is to let users run the framework flexibly without changing the code. This is achieved with a config yml (not JSON) whose path is specified in the runner JSON. For example:

```yaml
event_selection:
    photon:
        min_pt: 30
    diphoton:
        fiducial_cuts: classical
```

would let you run HiggsDNA with a minimum photon pt cut of 30 GeV, which is currently not possible without touching the code itself. Similarly, you can select the mode of the detector-level fiducial_cuts without passing a command-line argument, which makes everything easier to maintain. Settings that are not specified are taken from a default yml, which replaces the values currently hard-coded in the abstract base processor.
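The fallback to the default yml can be pictured as a recursive merge of two dictionaries, as they would look after loading each file (e.g. with PyYAML's `yaml.safe_load`). The sketch below is an assumption about how such a merge could work, not the code in `higgs_dna/utils/yml_config_parsing.py`; the default values and the `max_eta` key are placeholders, not the real HiggsDNA defaults.

```python
import copy
from typing import Any, Dict


def merge_with_defaults(defaults: Dict[str, Any], user: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay user settings on top of the defaults.

    Nested sections (e.g. event_selection -> photon) are merged key by key,
    so a user only has to list the settings they want to change.
    """
    merged = copy.deepcopy(defaults)  # never mutate the defaults themselves
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge them instead of replacing wholesale
            merged[key] = merge_with_defaults(merged[key], value)
        else:
            # Leaf value (or a key absent from the defaults): the user setting wins
            merged[key] = copy.deepcopy(value)
    return merged


# Placeholder defaults for illustration only (hypothetical values and keys)
defaults = {
    "event_selection": {
        "photon": {"min_pt": 25, "max_eta": 2.5},
        "diphoton": {"fiducial_cuts": "classical"},
    }
}

# The user config from the example above, as loaded from yml
user = {
    "event_selection": {
        "photon": {"min_pt": 30},
        "diphoton": {"fiducial_cuts": "classical"},
    }
}

config = merge_with_defaults(defaults, user)
print(config["event_selection"]["photon"])  # prints {'min_pt': 30, 'max_eta': 2.5}
```

The user's `min_pt` overrides the default, while the unspecified `max_eta` is filled in from the defaults.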

All workflows and processors have been adjusted accordingly. This MR also starts introducing more unit tests for processors, since it is currently very easy to break them, sadly even on master. For example, HHbbgg and ParticleLevel currently do not run. The unit tests are meant to catch such breakages.

Future to-dos include moving more of the command-line arguments and processor attributes into the yml. This is just the start.

What do users need to know?

Create a config.yml (any name works, but it has to be a yml file) and specify it in the runner.json. In it, you can set the cuts and settings for your processor. Currently, only photon settings are supported, but this will be expanded considerably in the future (NB: it might be expanded further before this draft is merged). Anything you do not specify is taken from the default config file in metaconditions.
Do not pass --fiducialCuts anymore. This is now handled by the fiducial_cuts setting in the config.yml (classical by default).
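As a rough sketch of the user-side step, the function below reads a runner JSON and returns the path of the user yml, rejecting non-yml files. The field name `yml_config` is a hypothetical choice that mirrors the new workflow argument; check the actual runner.json schema before relying on it.

```python
import json
from pathlib import Path
from typing import Optional


def resolve_config_path(runner_json: str, key: str = "yml_config") -> Optional[Path]:
    """Return the user yml config path from a runner JSON string, or None.

    `key` is a hypothetical field name used for illustration only.
    """
    runner = json.loads(runner_json)
    path = runner.get(key)
    if path is None:
        return None  # no user config: everything comes from the default yml
    path = Path(path)
    # The config has to be yml; accept both common YAML file extensions
    if path.suffix.lower() not in {".yml", ".yaml"}:
        raise ValueError(f"config must be a yml file, got {path.name!r}")
    return path


print(resolve_config_path('{"yml_config": "my_cuts.yml"}'))
```

Returning `None` rather than raising when the field is absent keeps the "defaults only" case explicit for the caller.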

This MR is not backwards compatible. The changes need to be, and will be, announced transparently to users through all possible channels!

List of relevant file changes in detail:

Added a default yml config at higgs_dna/metaconditions/default_config.yml, which stores the default values of cuts and settings used whenever the user does not provide a value. It should not be changed frequently (Hgg-wide customary settings).
In selections: np.int -> int (np.int triggers a NumPy deprecation warning)
higgs_dna/selections/diphoton_selections.py: Factored out the detector-level fiducial selection; the file can accommodate diphoton selections in general in the future
higgs_dna/utils/yml_config_parsing.py: Contains the utility that handles the default and the user-specified yml
All workflows: Removed the fiducialCuts argument and added a yml_config argument
higgs_dna/workflows/HHbbgg.py: Fixed by adding electron cutBased (was broken on master). This can be changed later if desired; what matters most is that the workflow runs
higgs_dna/workflows/base.py: The init of the abstract base class now parses the yml (Hpc does the same, since it reimplements the base init)
higgs_dna/workflows/particleLevel.py: Aligned naming conventions with the changes in base and fixed the processor, since the gen attributes are now returned as pt eta phi instead of pt eta
scripts/run_analysis.py: Added handling of the yml config
tests/test_processors.py: Made more modular; six of our nine main processors are now included in the unit tests (previously three). Hpc, lowpass and Zmmy should follow (not included yet)
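The base.py change can be sketched as follows, assuming the already merged config reaches the processor as a plain dict. All class, method and attribute names here are hypothetical stand-ins. The second subclass mimics the Hpc situation as the note above describes it: a subclass that overrides `__init__` without calling `super().__init__` skips the base-class parsing unless it triggers it itself.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseProcessorSketch(ABC):
    """Hypothetical stand-in for the abstract base processor in base.py."""

    def __init__(self, yml_config: Dict[str, Any]):
        # Parsing lives in the base init, so every workflow inherits it
        self._parse_yml_config(yml_config)

    def _parse_yml_config(self, yml_config: Dict[str, Any]) -> None:
        # Translate the (already merged) config into processor attributes
        selection = yml_config["event_selection"]
        self.min_pt_photon = selection["photon"]["min_pt"]
        self.fiducial_cuts = selection["diphoton"]["fiducial_cuts"]

    @abstractmethod
    def select_photons(self, photon_pts: List[float]) -> List[float]:
        ...


class StandardWorkflow(BaseProcessorSketch):
    # Inherits __init__, so the config is parsed automatically
    def select_photons(self, photon_pts: List[float]) -> List[float]:
        # Keep photons above the configured pt threshold
        return [pt for pt in photon_pts if pt > self.min_pt_photon]


class ReimplementedInitWorkflow(StandardWorkflow):
    # Like Hpc: overrides __init__ without calling super().__init__,
    # so it has to trigger the config parsing explicitly
    def __init__(self, yml_config: Dict[str, Any], extra_option: bool = False):
        self.extra_option = extra_option  # hypothetical workflow-specific setting
        self._parse_yml_config(yml_config)


config = {
    "event_selection": {
        "photon": {"min_pt": 30},
        "diphoton": {"fiducial_cuts": "classical"},
    }
}
for wf in (StandardWorkflow(config), ReimplementedInitWorkflow(config)):
    print(type(wf).__name__, wf.fiducial_cuts, wf.select_photons([20.0, 35.0, 50.0]))
```

Keeping the parsing in one helper means a subclass with its own init only needs a single extra call instead of a copy of the parsing logic.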

Edited Aug 14, 2024 by Jan Lukas Spah
