Helpers for including data-driven backgrounds (and some internal changes)

Pieter David requested to merge piedavid/bamboo:datadrivenbackgrounds into master

Current status: the histograms for the data-driven backgrounds are produced and stored in separate files, so the main thing missing is generating the modified plotIt configurations (and anything else I didn't think of ;-)).

The configuration should be fairly generic: a data-driven contribution uses some samples or groups (usually just data) and replaces some others (usually a few MC samples). A scenario specifies which contributions to activate (e.g. all, none, or a specific list; the parsing code supports several scenarios in one run). This is all in the DataDrivenBackgroundAnalysisModule base class; DataDrivenBackgroundHistogramsModule adds saving the background histograms to a separate file. An example configuration:

datadriven:
  chargeMisID:
    use: [ data ]
    replaces: [ DY ]
  nonprompt:
    use: [ data ]
    replaces: [ TTbar ]
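
To make the use/replaces semantics concrete, here is a minimal sketch of how a block of this shape could be interpreted for one scenario. It is not the code in this merge request: `Contribution`, `parse_contributions`, `select_scenario` and `samples_kept` are hypothetical names, the YAML is written as an already-parsed dict, and the scenario values "all" and "none" are assumed spellings.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple, Union


@dataclass(frozen=True)
class Contribution:
    """Hypothetical stand-in for one entry of the `datadriven` block."""
    name: str
    use: Tuple[str, ...]       # samples/groups the estimate is built from
    replaces: Tuple[str, ...]  # samples/groups it takes the place of


def parse_contributions(cfg: Dict[str, dict]) -> Dict[str, Contribution]:
    """Turn the parsed YAML mapping into Contribution objects."""
    return {
        name: Contribution(name,
                           tuple(entry.get("use", ())),
                           tuple(entry.get("replaces", ())))
        for name, entry in cfg.items()
    }


def select_scenario(contribs: Dict[str, Contribution],
                    scenario: Union[str, Sequence[str]]) -> Dict[str, Contribution]:
    """Return the contributions active in a scenario: 'all', 'none', or names."""
    if scenario == "all":
        return dict(contribs)
    if scenario == "none":
        return {}
    if isinstance(scenario, str):  # a single contribution name
        scenario = [scenario]
    unknown = [n for n in scenario if n not in contribs]
    if unknown:
        raise KeyError(f"unknown data-driven contribution(s): {unknown}")
    return {n: contribs[n] for n in scenario}


def samples_kept(all_samples: Sequence[str],
                 active: Dict[str, Contribution]) -> List[str]:
    """Samples that stay in the plots, i.e. not replaced by an active contribution."""
    replaced = {s for c in active.values() for s in c.replaces}
    return [s for s in all_samples if s not in replaced]


# The example configuration above, as an already-parsed dict
datadriven_cfg = {
    "chargeMisID": {"use": ["data"], "replaces": ["DY"]},
    "nonprompt": {"use": ["data"], "replaces": ["TTbar"]},
}
contribs = parse_contributions(datadriven_cfg)
samples = ["data", "DY", "TTbar"]
for scenario in ("all", "none", ["chargeMisID"]):
    active = select_scenario(contribs, scenario)
    print(scenario, "->", samples_kept(samples, active))
```

With only chargeMisID active, DY is dropped from the list while TTbar stays, which is the information a modified plotIt configuration would need per scenario.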

In the module code, SelectionWithDataDriven.create is used in place of Selection.refine, and takes additional arguments for the control region cut and the misID/fake/transfer factor weights, e.g.

hasSameSignElEl = SelectionWithDataDriven.create(hasElElZ, "hasSSDiElZ", "chargeMisID",
    cut=(diel[0].Charge == diel[1].Charge),
    ddCut=(diel[0].Charge != diel[1].Charge),
    ddWeight=p_chargeMisID(diel[0])+p_chargeMisID(diel[1]),
    enable=any(contrib.usesSample(sample, sampleCfg) for contrib in self.datadrivenContributions.values())
    )
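
As a toy illustration of what the cut / ddCut / ddWeight triple expresses (plain Python, independent of bamboo and RDataFrame): events passing the control region cut are weighted by the per-event factor to estimate the yield in the main region. The event records, the `p_charge_misid` function and its numbers are invented for the illustration.

```python
def p_charge_misid(pt):
    """Hypothetical per-electron charge misID probability (illustrative numbers)."""
    return 1e-3 if pt < 50.0 else 2e-3


# Toy dielectron events: the charges and transverse momenta of the two electrons
events = [
    {"charges": (+1, -1), "pts": (40.0, 30.0)},
    {"charges": (+1, +1), "pts": (60.0, 45.0)},
    {"charges": (-1, +1), "pts": (80.0, 55.0)},
]


def main_cut(ev):
    """Same-sign pair: plays the role of `cut`."""
    return ev["charges"][0] == ev["charges"][1]


def dd_cut(ev):
    """Opposite-sign pair (control region): plays the role of `ddCut`."""
    return ev["charges"][0] != ev["charges"][1]


def dd_weight(ev):
    """Sum of the two misID probabilities: plays the role of `ddWeight`."""
    return sum(p_charge_misid(pt) for pt in ev["pts"])


# Events selected directly in the main (same-sign) region
n_same_sign = sum(1 for ev in events if main_cut(ev))

# Data-driven estimate: control region events, each weighted by dd_weight
dd_estimate = sum(dd_weight(ev) for ev in events if dd_cut(ev))

print("same-sign events:", n_same_sign)
print("charge misID estimate:", dd_estimate)
```

The same histogram is thus filled twice, once with the main selection and once with the control region selection times the weight, which is why two products with the same name are needed.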

This needed one under-the-hood change: it should be possible to declare several products with the same name. A product corresponds to a list of objects produced by the RDataFrame (e.g. the nominal and systematic variation histograms), and its name is used for the histogram names. So I added a key to products, which defaults to their name but can be set to something else; DataDrivenBackgroundHistogramsModule and SelectionWithDataDriven set it using the same convention.
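
The idea of a key that defaults to the name can be sketched as follows. This is an assumption-laden illustration rather than the actual implementation: `Product` and `ProductRegistry` are hypothetical classes, and the `(name, contribution)` tuple used as key is an invented convention, not necessarily the one used in the code.

```python
class Product:
    """Hypothetical product: `name` labels the histograms, `key` identifies it."""

    def __init__(self, name, key=None):
        self.name = name
        # The key defaults to the name, so code that never sets it is unaffected
        self.key = key if key is not None else name


class ProductRegistry:
    """Stores products by key, so several products may share one name."""

    def __init__(self):
        self._products = {}

    def register(self, product):
        if product.key in self._products:
            raise ValueError(f"duplicate product key: {product.key!r}")
        self._products[product.key] = product

    def by_name(self, name):
        """All registered products that write histograms called `name`."""
        return [p for p in self._products.values() if p.name == name]


registry = ProductRegistry()
# Nominal product: key == name
registry.register(Product("mll"))
# Data-driven version of the same plot: same name, different key (assumed convention)
registry.register(Product("mll", key=("mll", "chargeMisID")))

print([p.key for p in registry.by_name("mll")])
```

Keying by something other than the name keeps the histogram names identical in the main and data-driven output files, while still letting the module tell the two products apart.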

There are also a few minor additions to make it easier to extend HistogramsModule, and use the parsed YAML analysis configuration in different places.

cc @fbury

Known regressions:

  • --distributed=finalize cannot deal with more than one output file per sample yet
Edited by Pieter David
