Working on #1 -- tagger and tag sequence prototype
Work in progress.
Initial implementation of `Tagger` and `TagSequence` workflow with toy systematics.
Basic design:
- `Tagger` class: abstract base class for taggers. Contains functionality that will be common to any tagger.
  - Set options (necessary branches, cut values, etc.)
  - Determine which events pass selection
  - Print diagnostic info about cut efficiencies
- `DiphotonTagger` class shows an example of a class derived from the `Tagger` base class. Physics content is not validated yet; this is just for purposes of illustration.
- `TagSequence` class:
  - takes a list of `Tagger`s, with the order in which they are given dictating priority (first tag gets all events passing its selection, subsequent tags get events passing their selection which do not enter any previous tags)
  - takes sets of events to perform the tag sequence on (systematics with independent collections). User can either pass a single awkward array (assumed to be nominal events) or a dictionary of the systematic variations with independent collections.
  - takes a list of weight variations to save for the nominal events (only the nominal weight is saved for the systematic variations with independent collections)
  - `.run()` method performs the selection for each tag in the tag sequence, ensures orthogonality between tags, writes selected events to disk (only `parquet` format implemented currently), and writes a summary `json` with diagnostic info
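To make the priority and orthogonality logic above concrete, here is a minimal sketch. It is not the code in this merge request: the method names `calculate_selection` and `select`, the toy `LeadPtTagger`, and the `lead_pt` branch are all hypothetical, and plain NumPy arrays stand in for awkward arrays.

```python
from abc import ABC, abstractmethod
import numpy as np


class Tagger(ABC):
    """Hypothetical base class: holds options and reports cut efficiency."""

    def __init__(self, name, options=None):
        self.name = name
        self.options = dict(options or {})

    @abstractmethod
    def calculate_selection(self, events):
        """Return a boolean mask with one entry per event."""

    def select(self, events):
        mask = np.asarray(self.calculate_selection(events), dtype=bool)
        # Diagnostic info: fraction of events passing this tagger's cuts.
        self.efficiency = float(mask.mean()) if mask.size else 0.0
        return mask


class LeadPtTagger(Tagger):
    """Toy derived tagger (assumption): cut on a per-event leading photon pT."""

    def calculate_selection(self, events):
        return events["lead_pt"] > self.options["lead_pt_cut"]


class TagSequence:
    """Earlier taggers have priority; later ones only get leftover events."""

    def __init__(self, taggers):
        self.taggers = list(taggers)

    def run(self, events):
        n_events = len(events["lead_pt"])
        taken = np.zeros(n_events, dtype=bool)
        # -1 marks events that enter no tag.
        tag_index = np.full(n_events, -1, dtype=int)
        for i, tagger in enumerate(self.taggers):
            mask = tagger.select(events) & ~taken  # enforce orthogonality
            tag_index[mask] = i
            taken |= mask
        return tag_index


events = {"lead_pt": np.array([20.0, 30.0, 40.0, 50.0])}
sequence = TagSequence([
    LeadPtTagger("nominal", {"lead_pt_cut": 35.0}),
    LeadPtTagger("loose", {"lead_pt_cut": 25.0}),
])
tag_index = sequence.run(events)
```

In this toy, the 30 GeV event passes only the loose cut and so goes to the second tagger, while the 40 and 50 GeV events pass both cuts and are claimed by the first.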
I also include a toy example illustrating the workflow:
- Load a ttH 2018 MC nanoAOD file (with extra branches added for diphoton preselection)
- Create dummy systematics for both an independent collection and weight variation:
  - independent collection: vary photon pT up/down by 1 GeV
  - weight variation: vary nominal weight up/down by 5%
- Create two `DiphotonTagger` taggers:
  - one with nominal diphoton preselection
  - one with a looser version of diphoton preselection (to illustrate tag priority functionality)
- Create `TagSequence` object:
  - First priority given to nominal diphoton preselection tagger, second priority given to loosened diphoton preselection tagger
- Perform selection for the tag sequence for the nominal events and the events with photon pT varied up/down
- Write awkward arrays of events to `parquet` files
- Load `parquet` files, convert to `pandas` dataframe, explore contents
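The two kinds of dummy systematics in the steps above can be sketched roughly as follows. This is only an illustration: the branch names (`photon_pt`, `weight`), the helper `shift_pt`, and the use of flat NumPy arrays in place of awkward arrays are assumptions, not what `toy_tag_sequence.py` necessarily does.

```python
import numpy as np

# Toy stand-in for the nominal events: photon pT (GeV) and per-event weight.
nominal = {
    "photon_pt": np.array([25.0, 40.0, 62.5]),
    "weight": np.array([1.0, 0.8, 1.2]),
}


def shift_pt(events, delta_gev):
    """Return a new set of events with photon pT shifted by delta_gev."""
    varied = dict(events)  # shallow copy: untouched branches are shared
    varied["photon_pt"] = events["photon_pt"] + delta_gev
    return varied


# Independent collections: each variation is its own set of events,
# so the selection has to be re-run on each of them.
events_by_variation = {
    "nominal": nominal,
    "photon_pt_up": shift_pt(nominal, +1.0),
    "photon_pt_down": shift_pt(nominal, -1.0),
}

# Weight variations: extra branches stored alongside the nominal events only;
# the selection itself does not change.
nominal["weight_up"] = nominal["weight"] * 1.05
nominal["weight_down"] = nominal["weight"] * 0.95
```

A dictionary like `events_by_variation` corresponds to the "dictionary of the systematic variations" input, while the `weight_up`/`weight_down` branches correspond to the list of weight variations saved for nominal events.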
Notes:
- compiling functions with `numba`: performance of diphoton preselection with ~43k events
  - plain `python` loop: 73s
  - compiled with `numba` (including compilation time): 0.80s
  - compiled with `numba` (pre-compiled): 0.032s
  - factor of ~2000 speed-up from `numba`!
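For anyone who wants to reproduce this kind of comparison, here is a rough sketch of the pattern. The selection below is a simplified stand-in (at least two photons above hypothetical pT cuts), not the actual diphoton preselection, and the random input is made up, so the timings will not match the numbers quoted above. If `numba` is not installed, the sketch falls back to the uncompiled function.

```python
import time
import numpy as np

try:
    from numba import njit
except ImportError:  # fall back to plain Python so the sketch still runs
    def njit(func):
        return func


def select_diphoton_py(pt, offsets, lead_cut, sublead_cut):
    """Per-event loop: require leading and subleading photons above the cuts."""
    n_events = len(offsets) - 1
    passed = np.zeros(n_events, dtype=np.bool_)
    for i in range(n_events):
        lead = 0.0
        sublead = 0.0
        for j in range(offsets[i], offsets[i + 1]):
            if pt[j] > lead:
                sublead = lead
                lead = pt[j]
            elif pt[j] > sublead:
                sublead = pt[j]
        passed[i] = lead > lead_cut and sublead > sublead_cut
    return passed


# Same function, compiled on its first call (when numba is available).
select_diphoton_jit = njit(select_diphoton_py)

# Toy jagged input: flat pT values plus offsets marking event boundaries.
rng = np.random.default_rng(0)
counts = rng.integers(0, 4, size=43_000)
offsets = np.concatenate(([0], np.cumsum(counts)))
pt = rng.uniform(10.0, 100.0, size=int(offsets[-1]))


def timed(func):
    start = time.perf_counter()
    result = func(pt, offsets, 35.0, 25.0)
    return result, time.perf_counter() - start


result_py, t_py = timed(select_diphoton_py)
result_first, t_first = timed(select_diphoton_jit)    # includes compilation
result_second, t_second = timed(select_diphoton_jit)  # already compiled
print(f"python: {t_py:.3f}s, first jit call: {t_first:.3f}s, second: {t_second:.3f}s")
```

Timing the first and second compiled calls separately is what distinguishes the "including compilation time" and "pre-compiled" numbers.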
Things which still need to be implemented/improved:
- Logging: I am just using `print` functions; this can be done in a more unified and coherent way.
- Implementation of systematics with independent collections: I thought it would be ideal if the `Tagger` selection functions are agnostic to the systematic variations. In other words, the same selection function should be used for the nominal events and each of the systematic variations, but we pass a different set of events for each one. To this end, I created copies of the original `awkward` array for each of the systematic variations. In the `photon_pt_up` variation, `events.Photon.pt` points to the up variation of photon pT. In the future, this can probably be done in a more efficient way: we do not need multiple copies of the original array, but should instead create different "pointers" to the original array. So, if we have `events["Photon_pt"]`, `events["Photon_pt_up"]`, `events["Photon_pt_down"]`, we would create a separate pointer for each: in the nominal events, `events.Photon.pt` accesses `events["Photon_pt"]`, while in the up/down variations `events.Photon.pt` would access `events["Photon_pt_up"]`/`events["Photon_pt_down"]`.
- Addition of taggers: if I want to create `ttHLeptonicTagger` and `ttHHadronicTagger` objects, I should be able to easily add the `DiphotonTagger` selection to each.
- Efficient selection between taggers: if multiple taggers each share the diphoton preselection, this should be calculated only once.
- Sync `DiphotonTagger` selection with `flashgg`
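One possible shape for the "pointer" idea in the systematics item above is a thin view that remaps branch names onto shared storage. This is only a sketch of the concept: `EventView` and the `selection` function are hypothetical, and a dictionary of NumPy arrays stands in for the awkward array, so the nested `events.Photon.pt` access is replaced here by `events["Photon_pt"]`.

```python
import numpy as np


class EventView:
    """Hypothetical read-only view that remaps branch names without copying."""

    def __init__(self, arrays, remap=None):
        self._arrays = arrays  # shared storage, never copied
        self._remap = dict(remap or {})

    def __getitem__(self, name):
        return self._arrays[self._remap.get(name, name)]


arrays = {
    "Photon_pt": np.array([30.0, 45.0]),
    "Photon_pt_up": np.array([31.0, 46.0]),
    "Photon_pt_down": np.array([29.0, 44.0]),
    "Photon_eta": np.array([0.1, -1.2]),
}

# One view per variation; all of them point at the same underlying arrays.
views = {
    "nominal": EventView(arrays),
    "photon_pt_up": EventView(arrays, {"Photon_pt": "Photon_pt_up"}),
    "photon_pt_down": EventView(arrays, {"Photon_pt": "Photon_pt_down"}),
}


def selection(events, pt_cut=30.5):
    """Systematics-agnostic selection: always reads "Photon_pt"."""
    return events["Photon_pt"] > pt_cut


masks = {name: selection(view) for name, view in views.items()}
```

The selection function never needs to know which variation it is running on, and branches that are not varied (such as `Photon_eta` here) are the same object in every view.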
Test the toy example with `python toy_tag_sequence.py`.
Let me know if you have comments/suggestions/questions.
Edited by Samuel May