Working on #1 -- tagger and tag sequence prototype
Work in progress.
Initial implementation of the `Tagger` and `TagSequence` workflow with toy systematics.
Basic design:
- `Tagger` class: abstract base class for taggers. Contains functionality that will be common to any tagger:
  - Set options (necessary branches, cut values, etc.)
  - Determine which events pass the selection
  - Print diagnostic info about cut efficiencies
- `DiphotonTagger` class: an example of a class derived from the `Tagger` base class. The physics content is not validated yet; this is just for purposes of illustration.
- `TagSequence` class:
  - takes a list of `Tagger`s, with the order in which they are given dictating priority (the first tag gets all events passing its selection; each subsequent tag gets the events passing its selection that did not enter any previous tag)
  - takes the sets of events to run the tag sequence on (systematics with independent collections). The user can pass either a single `awkward` array (assumed to be the nominal events) or a dictionary of the systematic variations with independent collections.
  - takes a list of weight variations to save for the nominal events (only the nominal weight is saved for the systematic variations with independent collections)
  - the `.run()` method performs the selection for each tag in the tag sequence, ensures orthogonality between tags, writes the selected events to disk (only `parquet` format is implemented currently), and writes a summary `json` with diagnostic info
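A minimal sketch of how these pieces could fit together, in plain Python with events stored as a dict of columns (an assumption; the real classes operate on `awkward` arrays and also write `parquet`/`json` output, which is omitted here). The method names `select`, `record_cut`, and `default_options`, and the cut values, are hypothetical; only the priority rule (each tag keeps just the events that no earlier tag claimed) follows the design described above.

```python
from abc import ABC, abstractmethod


class Tagger(ABC):
    """Hypothetical minimal base class: stores options and per-cut efficiencies."""

    def __init__(self, name, options=None):
        self.name = name
        # User-supplied options override the subclass defaults
        self.options = {**self.default_options(), **(options or {})}
        self.cut_efficiencies = {}

    def default_options(self):
        return {}

    @abstractmethod
    def select(self, events):
        """Return one boolean per event."""

    def record_cut(self, cut_name, mask):
        # Fraction of events passing this cut, kept for diagnostics
        self.cut_efficiencies[cut_name] = sum(mask) / len(mask) if mask else 0.0
        return mask

    def print_summary(self):
        for cut_name, eff in self.cut_efficiencies.items():
            print(f"[{self.name}] {cut_name}: {eff:.3f}")


class DiphotonTagger(Tagger):
    """Toy diphoton selection; the thresholds are illustrative, not validated."""

    def default_options(self):
        return {"lead_pt": 35.0, "sublead_pt": 25.0}

    def select(self, events):
        mask = []
        for pts in events["Photon_pt"]:
            ordered = sorted(pts, reverse=True)
            mask.append(
                len(ordered) >= 2
                and ordered[0] > self.options["lead_pt"]
                and ordered[1] > self.options["sublead_pt"]
            )
        return self.record_cut("diphoton_pt", mask)


class TagSequence:
    """The order of `taggers` sets the priority: earlier tags claim events first."""

    def __init__(self, taggers):
        self.taggers = list(taggers)

    def run(self, events):
        n_events = len(events["Photon_pt"])
        claimed = [False] * n_events
        selected_indices = {}
        for tagger in self.taggers:
            passed = tagger.select(events)
            # Keep only events not already claimed by a higher-priority tag
            selected = [p and not c for p, c in zip(passed, claimed)]
            claimed = [c or s for c, s in zip(claimed, selected)]
            selected_indices[tagger.name] = [i for i, s in enumerate(selected) if s]
        return selected_indices


events = {"Photon_pt": [[50.0, 30.0], [40.0, 22.0], [20.0], [60.0, 28.0, 10.0]]}
sequence = TagSequence([
    DiphotonTagger("nominal"),
    DiphotonTagger("loose", {"lead_pt": 30.0, "sublead_pt": 20.0}),
])
print(sequence.run(events))
sequence.taggers[1].print_summary()
```

Because `claimed` accumulates every earlier tag's selection, an event can land in at most one tag, which is the orthogonality the tag sequence is meant to guarantee. The looser tagger only picks up events the nominal tagger rejected.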
I also include a toy example illustrating the workflow:
- Load a ttH 2018 MC nanoAOD file (with extra branches added for diphoton preselection)
- Create dummy systematics for both an independent collection and a weight variation:
  - independent collection: vary photon pT up/down by 1 GeV
  - weight variation: vary the nominal weight up/down by 5%
- Create two `DiphotonTagger` taggers:
  - one with the nominal diphoton preselection
  - one with a looser version of the diphoton preselection (to illustrate the tag priority functionality)
- Create a `TagSequence` object:
  - first priority given to the nominal diphoton preselection tagger, second priority given to the loosened diphoton preselection tagger
- Perform the selection for the tag sequence for the nominal events and for the events with photon pT varied up/down
- Write `awkward` arrays of events to `parquet` files
- Load the `parquet` files, convert to a `pandas` dataframe, and explore the contents
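The two kinds of toy systematics produce different things: an independent-collection variation is a whole new set of events, while a weight variation is only an extra weight column on the nominal events. A sketch, assuming events are a plain dict with a jagged `Photon_pt` column and a flat `weight` column; `make_toy_systematics` and the variation key names are hypothetical, chosen to mirror the dictionary of variations described above.

```python
def make_toy_systematics(events, pt_shift=1.0, weight_frac=0.05):
    """Hypothetical helper mirroring the toy example's two systematic types.

    `events` is assumed to be a dict with a jagged "Photon_pt" column
    (one list of photon pT values in GeV per event) and a flat "weight" column.
    """
    # Weight variations: extra weight columns, saved only for the nominal events
    nominal = dict(events)  # shallow copy, so the caller's dict is not modified
    nominal["weight_up"] = [w * (1 + weight_frac) for w in events["weight"]]
    nominal["weight_down"] = [w * (1 - weight_frac) for w in events["weight"]]

    variations = {"nominal": nominal}
    # Independent collections: a separate set of events per variation with only
    # the photon pT changed; these carry just the nominal weight
    for name, sign in (("photon_pt_up", 1), ("photon_pt_down", -1)):
        shifted = dict(events)
        shifted["Photon_pt"] = [[pt + sign * pt_shift for pt in pts] for pts in events["Photon_pt"]]
        variations[name] = shifted
    return variations


events = {"Photon_pt": [[50.0, 30.0], [20.0]], "weight": [1.0, 2.0]}
for name, variation in make_toy_systematics(events).items():
    print(name, variation["Photon_pt"], sorted(variation))
```

The same selection function can then be applied to every entry of the returned dict without knowing which variation it is looking at.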
Notes:
- compiling functions with `numba`: performance of the diphoton preselection on ~43k events
  - plain `python` loop: 73 s
  - compiled with `numba` (including compilation time): 0.80 s
  - compiled with `numba` (pre-compiled): 0.032 s
- factor of ~2000 speed-up from `numba` (73 s vs. 0.032 s pre-compiled)!
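The speed-up comes from compiling an explicit per-event loop. A sketch of what such a kernel could look like, assuming the photon pT values are passed as one flat array plus per-event offsets (the offsets-plus-content layout `awkward` uses for jagged arrays). The function name and cut values are hypothetical, and the decorator falls back to plain Python when `numba` is not installed. Without an explicit signature, `numba.njit` compiles lazily on the first call, which is why the first-call timing includes compilation time.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fall back to plain Python if numba is not installed
    def njit(func):
        return func


@njit
def diphoton_preselection(offsets, photon_pt, lead_cut, sublead_cut):
    """Hypothetical kernel: photons of event i are photon_pt[offsets[i]:offsets[i + 1]]."""
    n_events = len(offsets) - 1
    passed = np.zeros(n_events, dtype=np.bool_)
    for i in range(n_events):
        # Track the two highest-pT photons in a single pass, no sorting needed
        lead = 0.0
        sublead = 0.0
        for j in range(offsets[i], offsets[i + 1]):
            pt = photon_pt[j]
            if pt > lead:
                sublead = lead
                lead = pt
            elif pt > sublead:
                sublead = pt
        passed[i] = lead > lead_cut and sublead > sublead_cut
    return passed


offsets = np.array([0, 2, 4, 5, 8], dtype=np.int64)
photon_pt = np.array([50.0, 30.0, 40.0, 22.0, 20.0, 60.0, 28.0, 10.0])
# With numba present, this first call triggers compilation; later calls reuse it
print(diphoton_preselection(offsets, photon_pt, 35.0, 25.0))
```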
Things which still need to be implemented/improved:
- Logging: I am just using `print` functions; this can be done in a more unified and coherent way.
- Implementation of systematics with independent collections: I thought it would be ideal if the `Tagger` selection functions were agnostic to the systematic variations. In other words, the same selection function should be used for the nominal events and for each of the systematic variations; we just pass a different set of events for each one. To this end, I created copies of the original `awkward` array for each of the systematic variations. In the `photon_pt_up` variation, `events.Photon.pt` points to the up variation of the photon pT. In the future, this can probably be done more efficiently: rather than making multiple copies of the original array, we should create different "pointers" to it. So, if we have `events["Photon_pt"]`, `events["Photon_pt_up"]`, and `events["Photon_pt_down"]`, we would create three separate pointers: for the nominal events, `events.Photon.pt` accesses `events["Photon_pt"]`, while for the up/down variations `events.Photon.pt` accesses `events["Photon_pt_up"]`/`events["Photon_pt_down"]`.
- Addition of taggers: if I want to create `ttHLeptonicTagger` and `ttHHadronicTagger` objects, I should be able to easily add the `DiphotonTagger` selection to each.
- Efficient selection between taggers: if multiple taggers share the diphoton preselection, it should be calculated only once.
- Sync the `DiphotonTagger` selection with `flashgg`
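A sketch of the "pointer" idea and of sharing one preselection between taggers, in plain Python. It assumes flat column names such as `Photon_pt_up` and item access instead of `events.Photon.pt` attribute access; `SystematicView` and `SelectionCache` are hypothetical names. Each view only remaps field names onto one shared column store, so no variation copies any data, and keying the cache on `(variation, selection name)` means a preselection requested by several taggers is evaluated once per variation.

```python
class SystematicView:
    """Hypothetical read-only view: maps logical field names onto shared columns."""

    def __init__(self, columns, remap=None):
        self._columns = columns    # shared storage, never copied
        self._remap = remap or {}  # e.g. {"Photon_pt": "Photon_pt_up"}

    def __getitem__(self, field):
        return self._columns[self._remap.get(field, field)]


def diphoton_preselection(events, lead_cut=35.0, sublead_cut=25.0):
    """Toy selection written once; it never knows which variation it is given."""
    mask = []
    for pts in events["Photon_pt"]:
        ordered = sorted(pts, reverse=True)
        mask.append(len(ordered) >= 2 and ordered[0] > lead_cut and ordered[1] > sublead_cut)
    return mask


class SelectionCache:
    """Hypothetical memo so a shared preselection runs once per variation."""

    def __init__(self):
        self._results = {}
        self.n_evaluations = 0

    def get(self, variation, name, func, events):
        key = (variation, name)
        if key not in self._results:
            self._results[key] = func(events)
            self.n_evaluations += 1
        return self._results[key]


columns = {
    "Photon_pt": [[50.0, 30.0], [40.0, 24.5]],
    "Photon_pt_up": [[51.0, 31.0], [41.0, 25.5]],
    "Photon_pt_down": [[49.0, 29.0], [39.0, 23.5]],
}
variations = {
    "nominal": SystematicView(columns),
    "photon_pt_up": SystematicView(columns, {"Photon_pt": "Photon_pt_up"}),
    "photon_pt_down": SystematicView(columns, {"Photon_pt": "Photon_pt_down"}),
}

cache = SelectionCache()
for variation, events in variations.items():
    # Two taggers that both start from the same diphoton preselection
    for tagger in ("ttHLeptonic", "ttHHadronic"):
        mask = cache.get(variation, "diphoton", diphoton_preselection, events)
        print(variation, tagger, mask)
print("preselection evaluations:", cache.n_evaluations)
```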
Test the toy example with `python toy_tag_sequence.py`.
Let me know if you have comments/suggestions/questions.
Edited by Samuel May