Working on #1 -- tagger and tag sequence prototype

Samuel May requested to merge smay/HiggsDNA:tag_sequence_prototype into master

Work in progress.

Initial implementation of Tagger and TagSequence workflow with toy systematics.

Basic design:

  • Tagger class: abstract base class for taggers. Contains functionality that will be common to any tagger.
    • Set options (necessary branches, cut values, etc)
    • Determine which events pass selection
    • Print diagnostic info about cut efficiencies
  • DiphotonTagger class: an example of a class derived from the Tagger base class. The physics content is not validated yet; it is for illustration only.
  • TagSequence class:
    • takes a list of Taggers, with the order in which they are given dictating priority (first tag gets all events passing its selection, subsequent tags get events passing their selection which do not enter any previous tags)
    • takes sets of events to perform the tag sequence on (systematics with independent collections). User can either pass a single awkward array (assumed to be nominal events), or a dictionary of the systematic variations with independent collections.
    • takes a list of weight variations to save for the nominal events (only save nominal weight for the systematic variations with independent collections)
    • .run() method performs the selection for each tag in the tag sequence, ensures orthogonality between tags, writes selected events to disk (only parquet format implemented currently), and writes a summary json with diagnostic info
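
To make the priority and orthogonality logic above concrete, here is a minimal sketch. It is not the code in this merge request: the method names `calculate_selection` and `select`, the toy `LeadPtTagger`, the `lead_pt` branch, and the cut values are all hypothetical, and plain NumPy arrays stand in for awkward arrays.

```python
from abc import ABC, abstractmethod

import numpy as np


class Tagger(ABC):
    """Hypothetical base class: stores options and reports cut efficiency."""

    def __init__(self, name, options=None):
        self.name = name
        self.options = options or {}

    @abstractmethod
    def calculate_selection(self, events):
        """Return a boolean mask with one entry per event."""

    def select(self, events):
        mask = np.asarray(self.calculate_selection(events), dtype=bool)
        efficiency = mask.mean() if mask.size else 0.0
        print(f"[{self.name}] {mask.sum()}/{mask.size} events pass ({efficiency:.1%})")
        return mask


class LeadPtTagger(Tagger):
    """Toy derived tagger: a single cut on a per-event quantity."""

    def calculate_selection(self, events):
        return events["lead_pt"] > self.options["min_lead_pt"]


class TagSequence:
    """Assigns each event to the first tagger (in list order) that selects it."""

    def __init__(self, taggers):
        self.taggers = taggers

    def run(self, events):
        n_events = len(next(iter(events.values())))
        tag_index = np.full(n_events, -1)  # -1 means "not tagged"
        for priority, tagger in enumerate(self.taggers):
            mask = tagger.select(events)
            # Only claim events that no higher-priority tagger has taken.
            tag_index[mask & (tag_index == -1)] = priority
        return tag_index


events = {"lead_pt": np.array([50.0, 32.0, 20.0, 40.0])}
sequence = TagSequence([
    LeadPtTagger("nominal", {"min_lead_pt": 35.0}),
    LeadPtTagger("loose", {"min_lead_pt": 30.0}),
])
tag_index = sequence.run(events)
print(tag_index)
```

Keeping a single `tag_index` array is one way to guarantee that tags are orthogonal: an event can hold only one index, and lower-priority taggers can only fill entries that are still -1.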

I also include a toy example illustrating the workflow:

  • Load a ttH 2018 MC nanoAOD file (with extra branches added for diphoton preselection)
  • Create dummy systematics for both an independent collection and weight variation:
    • independent collection: vary photon pT up/down by 1 GeV
    • weight variation: vary nominal weight up/down by 5%
  • Create two DiphotonTagger taggers:
    • one with nominal diphoton preselection
    • one with a looser version of diphoton preselection (to illustrate tag priority functionality)
  • Create TagSequence object:
    • First priority given to nominal diphoton preselection tagger, second priority given to loosened diphoton preselection tagger
    • Perform selection for the tag sequence for the nominal events and the events with photon pT varied up/down
    • Write awkward arrays of events to parquet files
  • Load parquet files, convert to pandas dataframe, explore contents
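
The two kinds of dummy systematics in the toy example can be sketched as follows. This is an illustration under assumptions, not the contents of toy_tag_sequence.py: the branch values are made up, the helper `shift_photon_pt` is hypothetical, and dictionaries of NumPy arrays stand in for awkward arrays.

```python
import numpy as np

# Stand-in for flattened nanoAOD branches; the values are made up.
nominal = {
    "Photon_pt": np.array([60.0, 45.0, 38.0, 27.0]),
    "weight": np.array([1.0, 0.8, 1.2, 1.0]),
}


def shift_photon_pt(events, delta_gev):
    """Return a new event set with photon pT shifted by delta_gev."""
    shifted = dict(events)  # shallow copy: untouched branches are shared
    shifted["Photon_pt"] = events["Photon_pt"] + delta_gev
    return shifted


# Independent collections: each variation is its own set of events.
events_by_variation = {
    "nominal": nominal,
    "photon_pt_up": shift_photon_pt(nominal, +1.0),
    "photon_pt_down": shift_photon_pt(nominal, -1.0),
}

# Weight variations: extra columns stored on the nominal events only.
nominal["weight_up"] = nominal["weight"] * 1.05
nominal["weight_down"] = nominal["weight"] * 0.95

for name, events in events_by_variation.items():
    print(name, sorted(events))
```

A dictionary of this shape matches the second input mode described above, where the tag sequence receives one entry per systematic variation with an independent collection.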

Notes:

  • compiling functions with numba -- performance of diphoton preselection with ~43k events
    • plain python loop: 73s
    • compiled with numba (including compilation time): 0.80s
    • compiled with numba (pre-compiled): 0.032s
  • factor of ~2000 speed-up from numba once compiled (73s / 0.032s ≈ 2300), or ~90 when the compilation time is included (73s / 0.80s)
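
A rough way to reproduce this kind of comparison is sketched below. It is not the selection in this merge request: the lead/sublead pT cuts (35 and 25 GeV), the random toy data, and the function names are assumptions, and the jagged photon collection is represented by a flat `pt` array plus per-event `offsets`. If numba is not installed, the sketch falls back to plain Python, so the timings are only meaningful with numba available.

```python
import time

import numpy as np

try:
    from numba import njit
except ImportError:  # fall back to plain Python so the sketch still runs
    def njit(func):
        return func


def select_diphoton_python(pt, offsets, lead_cut, sublead_cut):
    """Flag events whose two highest-pT photons pass the given cuts."""
    n_events = len(offsets) - 1
    passes = np.zeros(n_events, dtype=np.bool_)
    for i in range(n_events):
        lead = 0.0
        sublead = 0.0
        for j in range(offsets[i], offsets[i + 1]):
            if pt[j] > lead:
                sublead = lead
                lead = pt[j]
            elif pt[j] > sublead:
                sublead = pt[j]
        passes[i] = lead > lead_cut and sublead > sublead_cut
    return passes


# Same function body, compiled on first call when numba is available.
select_diphoton_numba = njit(select_diphoton_python)


def timed(func, *args):
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Toy jagged data: 43k events with 0 to 3 photons each.
rng = np.random.default_rng(0)
counts = rng.integers(0, 4, size=43_000)
offsets = np.concatenate(([0], np.cumsum(counts)))
pt = rng.uniform(10.0, 100.0, size=offsets[-1])

expected, t_python = timed(select_diphoton_python, pt, offsets, 35.0, 25.0)
first, t_first = timed(select_diphoton_numba, pt, offsets, 35.0, 25.0)
second, t_second = timed(select_diphoton_numba, pt, offsets, 35.0, 25.0)

print(f"plain python loop:          {t_python:.4f}s")
print(f"numba, first call:          {t_first:.4f}s")
print(f"numba, already compiled:    {t_second:.4f}s")
```

The first numba call includes the compilation time, which is why it is worth timing the first and second calls separately, as in the numbers above.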

Things which still need to be implemented/improved:

  • Logging: I am just using print functions; this can be done in a more unified and coherent way.
  • Implementation of systematics with independent collections: ideally, the Tagger selection functions are agnostic to the systematic variations. In other words, the same selection function is used for the nominal events and for each systematic variation, and only the set of events passed to it changes. To this end, I currently create a copy of the original awkward array for each systematic variation; in the photon_pt_up copy, events.Photon.pt points to the up variation of photon pT. In the future, this can probably be done more efficiently: instead of keeping multiple copies of the original array, we should create different "pointers" into it. So, if we have events["Photon_pt"], events["Photon_pt_up"], and events["Photon_pt_down"], we would create one pointer per variation: in the nominal events, events.Photon.pt accesses events["Photon_pt"], while in the up/down variations it accesses events["Photon_pt_up"]/events["Photon_pt_down"].
  • Addition of taggers: if I want to create ttHLeptonicTagger and ttHHadronicTagger objects, I should be able to easily add the DiphotonTagger selection to each.
  • Efficient selection between taggers: if multiple taggers each share the diphoton preselection, this should be calculated only once.
  • Sync DiphotonTagger selection with flashgg
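
One possible shape for the "pointer" idea in the systematics item above is a thin view that redirects branch names instead of copying arrays. This is only a sketch under assumptions: the `SystematicView` class, the `passes_pt_cut` function, and the cut value are hypothetical, string keys replace the events.Photon.pt attribute access, and a dictionary of NumPy arrays stands in for the awkward array.

```python
import numpy as np


class SystematicView:
    """Read-only view that redirects chosen branch names to varied branches."""

    def __init__(self, events, redirects=None):
        self._events = events
        self._redirects = redirects or {}

    def __getitem__(self, branch):
        # Look up the varied branch if one is registered, else the nominal one.
        return self._events[self._redirects.get(branch, branch)]


# One shared set of arrays; the values are made up.
events = {
    "Photon_pt": np.array([60.0, 45.0]),
    "Photon_pt_up": np.array([61.0, 46.0]),
    "Photon_pt_down": np.array([59.0, 44.0]),
    "Photon_eta": np.array([0.3, -1.1]),
}

views = {
    "nominal": SystematicView(events),
    "photon_pt_up": SystematicView(events, {"Photon_pt": "Photon_pt_up"}),
    "photon_pt_down": SystematicView(events, {"Photon_pt": "Photon_pt_down"}),
}


def passes_pt_cut(view, min_pt=45.5):
    """Selection written once; it never mentions a systematic variation."""
    return view["Photon_pt"] > min_pt


masks = {name: passes_pt_cut(view) for name, view in views.items()}
for name, mask in masks.items():
    print(name, mask)
```

Each view holds only a small dictionary of redirects, so no branch is duplicated, and the selection function stays agnostic to which variation it is running on.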

Test the toy example with python toy_tag_sequence.py

Let me know if you have comments/suggestions/questions.

Edited by Samuel May
