
Modularising easyjet-ntupler: Split into EasyjetHub and HH4bAnalysis

Teng Jian Khoo requested to merge modularise-ntupler into master

In consideration of #70 (closed)

Just to get initial impressions, I split up the package into a core framework and a dedicated analysis package. This could be the model for future extension to other analyses, where EasyjetHub (or perhaps a containing directory that also includes BJetCalibrationTool and H5Writer) might become a submodule.

I considered a few options on how to extend the core framework. A couple of ideas:

  • Make a lot of the easyjet-ntupler contents into modules that define a very minimal API
    • Requires the user to write their own executable, hence
    • Needs to be O(1) lines of code to get a basic job running
    • Is maximally extensible thereafter
  • Predefine entry points (preselection i.e. before CP algs, postselection i.e. after CP algs, output branch extension) where user-provided CA can be merged in
    • Easier to plug into the existing executable
    • Needs a solution for locating the user code (fix some name + signature then provide module location to importlib?)
    • Does not scale to complex analyses with many SRs/CRs
  • A hybrid solution supporting both?

For now I implemented the first one, moving the HH4b-specific algs into the HH4bAnalysis package and creating a new executable + tests, although this may need a bit of review.
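For reference, the second option (fixed hook names + signature, with the module location handed to importlib) could look roughly like the sketch below. This is not implemented in this MR. The hook names `preselection_cfg`, `postselection_cfg` and `extra_branches`, and the two helper functions, are hypothetical and only illustrate the lookup mechanism.

```python
import importlib
import importlib.util
from pathlib import Path

# Hypothetical hook names, one per entry point listed above.
HOOK_NAMES = ("preselection_cfg", "postselection_cfg", "extra_branches")


def load_user_module(location):
    """Import user code from a dotted module name or a path to a .py file."""
    path = Path(location)
    if path.suffix == ".py":
        if not path.is_file():
            raise FileNotFoundError(f"No user module at {path}")
        # Load the file directly, without requiring it to be on sys.path.
        spec = importlib.util.spec_from_file_location(path.stem, str(path))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module
    # Otherwise treat the location as an importable module name.
    return importlib.import_module(location)


def collect_hooks(module):
    """Return the hooks the module defines; missing hooks are skipped."""
    hooks = {}
    for name in HOOK_NAMES:
        hook = getattr(module, name, None)
        if callable(hook):
            hooks[name] = hook
    return hooks
```

The executable would then call each hook it finds with the flags and merge the returned CA at the matching point in the job. A user module that defines none of the hooks leaves the default job unchanged.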

Also open to naming suggestions as alternative to EasyjetHub!

TO FOLLOW UP:

  • Check usage of HH4bAnalysis/truth_particle_info_config -- is this still needed for the SH4b and HH4b jobs that use the default PHYS/PHYSLITE RunConfig, or only for the other RunConfigs that I migrated to HH4bAnalysis?
  • Look over test definitions and see if the package split is correct
  • Avoid duplicating code between easyjet-test and hh4b-test
  • Verify where we would like to keep dataset lists -- which, if any, should stay in the hub package?

The diff is unfortunately rather polluted by all of the relocated files, so here are some pointers to where feedback is especially welcome:

  • Compare easyjet-ntupler with the analysis-extended version hh4b-ntupler. The former is supposed to demonstrate a running job in the fewest possible LOC (6-7), though in practice it is 60 to accommodate comments and CI support. The latter adds analysis CLI arguments and algorithms, and extends the branch output via flags.
  • Consider the structure of steering/, including the file names and the distribution of functions between the files. I believe it is nicer in any case to use steering as a module and avoid importing the sub-files, so we may not need to be super fussy.
  • Review whether we like the resulting distribution of functions between utils/ and steering/, especially the argument parsing and flag filling.
  • Consider if it makes sense to have a single output config with the choice between h5 and ROOT output made by flags, or if the user should just use the underlying config
  • Decide if minituple_config should handle the core branches as is currently done, and have all extensions via the extra output branches flag, or if instead the user should build a full branch list (we provide a helper function) and then pass the full branch list to minituple_config.
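To make the last point concrete, here is a rough sketch of the two calling conventions side by side. Everything in it is a stand-in: `CORE_BRANCHES`, `build_branch_list` and the two `minituple_config_*` stubs are hypothetical names, the branch strings are placeholders, and the stubs only return the branch list they would write rather than configuring anything.

```python
# Placeholder core branches; the real list lives in the hub package.
CORE_BRANCHES = ["runNumber", "eventNumber"]


def build_branch_list(extra_branches=(), include_core=True):
    """Hypothetical helper: return the full branch list without duplicates."""
    branches = list(CORE_BRANCHES) if include_core else []
    for branch in extra_branches:
        if branch not in branches:
            branches.append(branch)
    return branches


def minituple_config_core_plus_extras(extra_output_branches=()):
    """Current behaviour (stub): the config adds the core branches itself
    and the user only supplies extensions, e.g. via a flag."""
    return build_branch_list(extra_output_branches)


def minituple_config_full_list(branches):
    """Alternative (stub): the config writes exactly the list it is given."""
    return list(branches)


extras = ["hh4b_m_hh", "eventNumber"]

# Current: core branches are implicit, the user passes extras only.
current = minituple_config_core_plus_extras(extras)

# Alternative: the user builds the full list with the helper, then passes it.
alternative = minituple_config_full_list(build_branch_list(extras))
```

Both calls end up with the same branches here. The difference is that the alternative lets the user drop or reorder core branches (for example with `include_core=False`), at the cost of one extra line in every executable.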

Incidental changes:

  • Created flags for TriggerChains, do_PRW, PRWFiles and LumiCalcFiles to ease passing between modules
