Helpers for including data-driven backgrounds (and some internal changes)
Current status: the histograms for the data-driven backgrounds are there, in separate files, so the main thing missing is generating the modified plotIt configurations - and anything else I didn't think of ;-) .
The configuration should be fairly generic: a data-driven contribution uses some samples or groups (usually just data), and replaces some others (usually a few MC samples). A scenario should be specified, saying which of these to activate (e.g. all, none, a specific list - the parsing code supports several scenarios in one run).
This is all in the DataDrivenBackgroundAnalysisModule base class; DataDrivenBackgroundHistogramsModule adds saving of the background histograms in a separate file. An example configuration:
datadriven:
  chargeMisID:
    use: [ data ]
    replaces: [ DY ]
  nonprompt:
    use: [ data ]
    replaces: [ TTbar ]
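To make the use/replaces semantics concrete, here is a minimal sketch in plain Python. It is not the code in this PR: the `Contribution` class, the `parse_datadriven` and `samples_for_scenario` helpers, the representation of a scenario as a list of enabled contribution names, and the extra `WZ` sample are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Contribution:
    """Hypothetical stand-in for one entry of the `datadriven` block."""
    name: str
    use: list = field(default_factory=list)
    replaces: list = field(default_factory=list)


def parse_datadriven(config):
    """Turn the parsed `datadriven` mapping into Contribution objects."""
    return {
        name: Contribution(name, list(cfg.get("use", [])), list(cfg.get("replaces", [])))
        for name, cfg in config.items()
    }


def samples_for_scenario(all_samples, contributions, enabled):
    """Samples that stay in the plots when the contributions in `enabled` are active."""
    replaced = {s for name in enabled for s in contributions[name].replaces}
    return [s for s in all_samples if s not in replaced]


# Same content as the YAML example, written as an already-parsed dict.
config = {
    "chargeMisID": {"use": ["data"], "replaces": ["DY"]},
    "nonprompt": {"use": ["data"], "replaces": ["TTbar"]},
}
contributions = parse_datadriven(config)
samples = ["data", "DY", "TTbar", "WZ"]  # "WZ" is an invented, unaffected sample

kept_none = samples_for_scenario(samples, contributions, [])
kept_misid = samples_for_scenario(samples, contributions, ["chargeMisID"])
kept_all = samples_for_scenario(samples, contributions, list(contributions))
```

In this sketch the "none" scenario keeps every sample, while enabling a contribution removes the samples it replaces from the list; the data-driven histograms would then be added in their place.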
In the module code, SelectionWithDataDriven.create replaces Selection.refine, but adds arguments for the control region cut and misID/fake/transfer factor weights, e.g.
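hasSameSignElEl = SelectionWithDataDriven.create(hasElElZ, "hasSSDiElZ", "chargeMisID",
    cut=(diel[0].Charge == diel[1].Charge),  # main selection: same-sign pair
    ddCut=(diel[0].Charge != diel[1].Charge),  # control region: opposite-sign pair
    ddWeight=p_chargeMisID(diel[0])+p_chargeMisID(diel[1]),  # misID weight for control region events
    # only enable for samples used by at least one data-driven contribution
    enable=any(contrib.usesSample(sample, sampleCfg) for contrib in self.datadrivenContributions.values())
    )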
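As a conceptual illustration of what the three arguments mean, the following plain-Python sketch loops over toy events. It is only an assumption-laden analogy: the real code builds RDataFrame nodes rather than looping over events, and the `fill_yields` helper, the event dictionaries and the probability values are invented for this example.

```python
def fill_yields(events, cut, dd_cut, dd_weight):
    """Accumulate the nominal yield and the data-driven estimate.

    Events passing `cut` count towards the nominal selection; events
    passing `dd_cut` (the control region) enter the data-driven
    estimate, each scaled by `dd_weight`.
    """
    yields = {"nominal": 0.0, "datadriven": 0.0}
    for ev in events:
        if cut(ev):
            yields["nominal"] += 1.0
        if dd_cut(ev):
            yields["datadriven"] += dd_weight(ev)
    return yields


# Toy events: lepton charges and invented per-lepton misID probabilities.
events = [
    {"q": (1, 1), "p": (0.01, 0.02)},   # same-sign: nominal selection
    {"q": (1, -1), "p": (0.01, 0.03)},  # opposite-sign: control region
    {"q": (-1, 1), "p": (0.02, 0.02)},  # opposite-sign: control region
]

yields = fill_yields(
    events,
    cut=lambda ev: ev["q"][0] == ev["q"][1],
    dd_cut=lambda ev: ev["q"][0] != ev["q"][1],
    dd_weight=lambda ev: ev["p"][0] + ev["p"][1],
)
```

Here one event passes the same-sign selection, and the two opposite-sign events contribute the sum of their misID probabilities to the data-driven estimate.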
This needed one under-the-hood change: it should be possible to declare several products with the same name (the name is used for the histogram names), where each product corresponds to a list of objects produced by the RDataFrame, e.g. the nominal and systematic variation histograms. So I added a key to products, which defaults to their name but can be set to something else; DataDrivenBackgroundHistogramsModule and SelectionWithDataDriven do this using the same convention.
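The idea of a key that defaults to the name can be sketched as follows. This is a simplified illustration, not the actual classes: `Product`, `ProductRegistry` and the tuple-shaped key are hypothetical, and the key convention actually used by the two classes is not spelled out here.

```python
class Product:
    """Hypothetical product: `name` names the histogram, `key` identifies the product."""

    def __init__(self, name, key=None):
        self.name = name
        # The key falls back to the name unless set explicitly.
        self.key = key if key is not None else name


class ProductRegistry:
    """Stores products by key, so several products may share a name."""

    def __init__(self):
        self._products = {}

    def add(self, product):
        if product.key in self._products:
            raise ValueError(f"Duplicate product key: {product.key!r}")
        self._products[product.key] = product

    def names(self):
        return [p.name for p in self._products.values()]

    def __len__(self):
        return len(self._products)


registry = ProductRegistry()
# Nominal product: key defaults to the name.
registry.add(Product("hasSSDiElZ_mll"))
# Data-driven product with the same histogram name but a distinct (invented) key.
registry.add(Product("hasSSDiElZ_mll", key=("hasSSDiElZ_mll", "chargeMisID")))

try:
    registry.add(Product("hasSSDiElZ_mll"))
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

Keying on something other than the name lets both histograms keep the same name, which is convenient when they end up in separate files.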
There are also a few minor additions to make it easier to extend HistogramsModule, and to use the parsed YAML analysis configuration in different places.
cc @fbury
Known regressions:
- --distributed=finalize cannot deal with more than one output file per sample yet