Draft: add helper method to write combine datacards
Hey,
This MR adds support for creating combine shapes and datacards. A few things are still to do, which I list here, but the code is essentially complete.
- Add documentation
- Need testing
- Update the systematics in plots.yml after plotIt post-processing to assist users in creating the combine_inputs.yml file needed by the helper method. The current issue is that while systematics are initially added in the analysis YAML configuration, they are not updated afterward, and the specified systematics may not necessarily exist in the histograms (help-needed).
- Resolve issue with compiling CombineHarvester using LCG105 (help-needed).
- Add options to normalize plots (WIP)
You need to specify the histograms you want combine to use during the fit. For example, I fit the DNN histograms, so I do:
plotListForcombine = []
for plots in self.plotList:
    if plots.name.startswith('DNN_'):
        plotListForcombine.append(plots)
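The same selection can be written as a small reusable helper. This is only a sketch: `select_plots` and the `Plot` stand-in class are hypothetical names used for illustration, and the only thing assumed about a real plot object is that it has a `name` attribute, as in the loop above.

```python
from dataclasses import dataclass


@dataclass
class Plot:
    """Hypothetical stand-in for a plot object; only `name` is assumed."""
    name: str


def select_plots(plots, prefix):
    """Return the plots whose name starts with `prefix`, keeping their order."""
    return [p for p in plots if p.name.startswith(prefix)]


# Made-up plot names, for illustration only
plot_list = [Plot("DNN_sr1"), Plot("mjj_sr1"), Plot("DNN_cr1")]
plotListForcombine = select_plots(plot_list, "DNN_")
print([p.name for p in plotListForcombine])
```

Inside the module, the call would presumably be `select_plots(self.plotList, "DNN_")`.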
You also need a YAML configuration file to define the analysis categories and signal processes (see the example below). In the post-processing step, you do:
def postProcess(self, taskList, config=None, workdir=None, resultsdir=None):
    # Run plotIt as defined in HistogramsModule - this ensures self.plotList is present
    super(MyModule, self).postProcess(taskList, config, workdir, resultsdir)
    writeDataCard(config, plotListForcombine, combine_inputs,
                  self.args.eras[1],
                  pseudodata=False,  # for blinded analysis
                  mass="125",  # dummy mass for combine
                  workdir=workdir,
                  resultsdir=resultsdir,
                  verbose=1,  # for debugging
                  onlyStats=False,  # if True, nuisance parameters (NPs) won't be added to the datacards
                  CMSNamingConvention=False,  # if True, write NPs in the datacards using the CMS naming convention
                  readCounters=self.readCounters,
                  vetoFileAttributes=self.__class__.CustomSampleAttributes,
                  plotDefaults=self.plotDefaults)
An example configuration file:
run: 13TeV
analysis_categories:
  control_regions:
    - cr1  # cr1 should appear in at least one of the histogram names added above; the same is true for the other categories
    - cr2
    -
  signal_regions:
    - sr1
    - sr2
    -
signal_categories:  # your signal processes and how they are parametrized (masses, benchmark, or any other parameter)
  bbH:
    - MH_516p94_MA_109p30  # name should also exist in the histograms requested above
    -
  ggH:
    - MH_516p94_MA_109p30
    -
With this setup, the histogram shapes will be saved and a datacard will be created for each of:
ggH_MH_516p94_MA_109p30_cr1.dat
ggH_MH_516p94_MA_109p30_cr2.dat
ggH_MH_516p94_MA_109p30_sr1.dat
ggH_MH_516p94_MA_109p30_sr2.dat
# also for bbH
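To make the naming explicit, here is a minimal sketch of how the list of datacards above follows from the configuration. The function `expected_datacards` is hypothetical and not part of this MR; the `<process>_<signal point>_<category>.dat` pattern is inferred from the file names listed above, and the configuration is written as a plain dict rather than read from the YAML file.

```python
def expected_datacards(config):
    """List datacard names as <process>_<point>_<category>.dat (assumed pattern)."""
    cats = config["analysis_categories"]
    categories = cats.get("control_regions", []) + cats.get("signal_regions", [])
    names = []
    for process, points in config["signal_categories"].items():
        for point in points:
            for category in categories:
                names.append(f"{process}_{point}_{category}.dat")
    return names


# Same content as the example configuration, as a Python dict
config = {
    "analysis_categories": {
        "control_regions": ["cr1", "cr2"],
        "signal_regions": ["sr1", "sr2"],
    },
    "signal_categories": {
        "bbH": ["MH_516p94_MA_109p30"],
        "ggH": ["MH_516p94_MA_109p30"],
    },
}

for name in expected_datacards(config):
    print(name)
```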
You can also combine categories by adding a combine_categories block to the .yml file, for example:
combine_categories:
  - subcategories: [cr1, sr1]  # an additional card ggH_MH_516p94_MA_109p30_cr1_sr1.dat will be created (also for bbH)
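As a sketch of the naming only, the combined card name appears to be the usual name with the subcategories joined by underscores. `combined_datacard_name` is a hypothetical helper, and the pattern is inferred from the example name in the comment above rather than taken from the implementation.

```python
def combined_datacard_name(process, point, subcategories):
    """Build <process>_<point>_<cat1>_<cat2>.dat (pattern assumed from the example)."""
    return "_".join([process, point, *subcategories]) + ".dat"


print(combined_datacard_name("ggH", "MH_516p94_MA_109p30", ["cr1", "sr1"]))
# prints: ggH_MH_516p94_MA_109p30_cr1_sr1.dat
```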
A full example configuration file for my case can be found here. Most settings in the example are optional; only the ones mentioned above are required, because the helper method writeDataCard() can read the systematics directly from the histograms.
The systematics blocks in combine_inputs.yml are useful if you want to apply systematics only to certain categories, processes, or eras. You can use on_category, on_process, or on_era. If these keys are omitted, the nuisance parameters are applied to all categories, processes, and eras.
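The selection rule described above can be sketched as follows. This is an illustration, not the code in this MR: `applies_to` is a hypothetical function, it assumes each key holds either a single name or a list of names, and the era labels in the example are made up.

```python
def applies_to(syst, category, process, era):
    """Return True if the systematic block `syst` applies to the given
    category, process and era. A missing key means "no restriction"."""
    requested = {"on_category": category, "on_process": process, "on_era": era}
    for key, value in requested.items():
        if key not in syst:
            continue  # key omitted: applies to all
        allowed = syst[key]
        if isinstance(allowed, str):
            allowed = [allowed]  # assumption: a single name is also accepted
        if value not in allowed:
            return False
    return True


lumi = {}  # no keys: applied to every category, process and era
btag = {"on_category": ["sr1", "sr2"], "on_era": "2018"}  # hypothetical block

print(applies_to(lumi, "cr1", "ggH", "2017"))
print(applies_to(btag, "sr1", "ggH", "2018"))
print(applies_to(btag, "cr1", "ggH", "2018"))
```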
The systematics blocks are also useful for decorrelating uncertainties, by setting:
decorrelate_cat: true
decorrelate_era: true
decorrelate_process: true
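In combine, nuisance parameters that share a name across channels are treated as one correlated parameter, so decorrelating usually means giving the parameter a distinct name per category, era, or process. A minimal sketch of that idea, assuming a simple suffix scheme: `nuisance_name` is hypothetical, and the names written by writeDataCard() may well follow a different convention.

```python
def nuisance_name(base, syst, category, era, process):
    """Append one suffix per enabled decorrelate_* flag (assumed scheme)."""
    parts = [base]
    if syst.get("decorrelate_cat"):
        parts.append(category)
    if syst.get("decorrelate_era"):
        parts.append(era)
    if syst.get("decorrelate_process"):
        parts.append(process)
    return "_".join(parts)


# Hypothetical nuisance "jes", decorrelated across categories and eras only
syst = {"decorrelate_cat": True, "decorrelate_era": True}
print(nuisance_name("jes", syst, "sr1", "2018", "ggH"))
# prints: jes_sr1_2018
```

With no flags set, the base name is returned unchanged, so the parameter stays correlated everywhere.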
There are more features, such as scaling certain processes by a factor or switching to the CMS naming convention; I will describe these in more detail in the documentation.