Draft: add helper method to write combine datacards (!247) · Merge requests · cms-analysis / General / bamboo

Khawla Jaffel requested to merge kjaffel/bamboo:_4thCAThackathon_carddev into master Jul 02, 2024

Hey,

This MR adds support for creating combine shapes and datacards. A few things are still to do (listed below), but the code itself is essentially complete.

- Add documentation.
- Needs testing.
- Update the systematics in plots.yml after the plotIt post-processing, to help users create the combine_inputs.yml file needed by the helper method. The current issue is that systematics are added in the analysis YAML configuration but are not updated afterwards, so the listed systematics may not actually exist in the histograms (help needed).
- Resolve the issue with compiling CombineHarvester using LCG105 (help needed).
- Add options to normalize plots (WIP).

First, specify the histograms that combine should use in the fit. For example, I fit DNN histograms, so I select:

    # keep only the DNN output histograms for the fit
    plotListForcombine = [plot for plot in self.plotList if plot.name.startswith('DNN_')]

You also need a YAML configuration file that defines the analysis categories and signal processes (see the example below). Then, in the post-processing, you call writeDataCard():

    def postProcess(self, taskList, config=None, workdir=None, resultsdir=None):
        # Run plotIt as defined in HistogramsModule; this ensures self.plotList is present
        super().postProcess(taskList, config, workdir, resultsdir)

        # plotListForcombine: the histograms selected for the fit (see above)
        # combine_inputs: the combine configuration (see the example below)
        writeDataCard(config, plotListForcombine, combine_inputs,
                      self.args.eras[1],
                      pseudodata=False,           # set True to use pseudo-data instead of observed data (blinded analysis)
                      mass="125",                 # dummy mass for combine
                      workdir=workdir,
                      resultsdir=resultsdir,
                      verbose=1,                  # for debugging
                      onlyStats=False,            # if True, nuisance parameters (NPs) are not added to the datacards
                      CMSNamingConvention=False,  # if True, write NPs in the datacards with the CMS naming convention
                      readCounters=self.readCounters,
                      vetoFileAttributes=self.__class__.CustomSampleAttributes,
                      plotDefaults=self.plotDefaults)

An example configuration file:

    run: 13TeV
    analysis_categories:
      control_regions:
        - cr1  # each category name must appear in at least one of the selected histogram names
        - cr2
        # - ...
      signal_regions:
        - sr1
        - sr2
        # - ...
    signal_categories:  # signal processes and their parametrization (masses, benchmark, or any other parameter)
      bbH:
        - MH_516p94_MA_109p30  # this name must also appear in the selected histogram names
        # - ...
      ggH:
        - MH_516p94_MA_109p30
        # - ...
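
Since every category and signal-point name has to appear in at least one selected histogram name, a quick sanity check before calling writeDataCard() can save a failed run. The sketch below is not part of this MR: `check_config_against_histograms` is a hypothetical helper, the config is assumed to be the dict obtained by loading the YAML above (e.g. with `yaml.safe_load`), and the histogram names are made up for illustration. It uses a plain substring test, so `cr1` would also match `cr10`.

```python
def check_config_against_histograms(config, hist_names):
    """Return the category and signal-point names that match none of hist_names."""
    missing = []
    # analysis_categories: {"control_regions": [...], "signal_regions": [...]}
    for regions in config.get("analysis_categories", {}).values():
        for cat in regions or []:
            # skip empty "-" entries (loaded as None); plain substring test
            if cat and not any(cat in name for name in hist_names):
                missing.append(cat)
    # signal_categories: {"bbH": [parametrizations], "ggH": [parametrizations]}
    for points in config.get("signal_categories", {}).values():
        for point in points or []:
            if point and not any(point in name for name in hist_names):
                missing.append(point)
    return missing


# Example: dict equivalent to the YAML above; histogram names are hypothetical
config = {
    "run": "13TeV",
    "analysis_categories": {
        "control_regions": ["cr1", "cr2"],
        "signal_regions": ["sr1", "sr2"],
    },
    "signal_categories": {
        "bbH": ["MH_516p94_MA_109p30"],
        "ggH": ["MH_516p94_MA_109p30"],
    },
}
hist_names = ["DNN_MH_516p94_MA_109p30_cr1", "DNN_MH_516p94_MA_109p30_sr1"]
print(check_config_against_histograms(config, hist_names))  # ['cr2', 'sr2']
```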

With this setup, the histogram shapes are saved and a datacard is created for each signal point and category:

    ggH_MH_516p94_MA_109p30_cr1.dat
    ggH_MH_516p94_MA_109p30_cr2.dat
    ggH_MH_516p94_MA_109p30_sr1.dat
    ggH_MH_516p94_MA_109p30_sr2.dat

and likewise for bbH.
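
The naming pattern above, `<process>_<parametrization>_<category>.dat`, can be sketched as follows. This is only an illustration inferred from the example file names, not the code inside writeDataCard(); `datacard_names` is a hypothetical helper and the config dict mirrors the YAML example.

```python
def datacard_names(config):
    """One datacard name per (signal process, parametrization, category)."""
    categories = [
        cat
        for regions in config.get("analysis_categories", {}).values()
        for cat in (regions or [])
        if cat  # skip empty "-" entries
    ]
    names = []
    for process, points in config.get("signal_categories", {}).items():
        for point in points or []:
            if not point:
                continue
            for cat in categories:
                names.append(f"{process}_{point}_{cat}.dat")
    return names


config = {
    "analysis_categories": {
        "control_regions": ["cr1", "cr2"],
        "signal_regions": ["sr1", "sr2"],
    },
    "signal_categories": {
        "bbH": ["MH_516p94_MA_109p30"],
        "ggH": ["MH_516p94_MA_109p30"],
    },
}
for name in datacard_names(config):
    print(name)  # 8 cards: bbH_..._cr1.dat up to ggH_..._sr2.dat
```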

You can also combine categories by adding, for example, the following to the .yml file:

    combine_categories:
      - subcategories: [cr1, sr1]  # an additional card, ggH_MH_516p94_MA_109p30_cr1_sr1.dat, is created (likewise for bbH)
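
Following the same pattern, the combined card name joins the subcategory names with underscores. A minimal sketch inferred from the single example above; `combined_card_names` is hypothetical, and the real helper may order or name things differently.

```python
def combined_card_names(config):
    """One combined datacard name per (combine_categories entry, signal point)."""
    names = []
    for block in config.get("combine_categories") or []:
        # e.g. subcategories [cr1, sr1] -> suffix "cr1_sr1"
        suffix = "_".join(block["subcategories"])
        for process, points in config.get("signal_categories", {}).items():
            for point in points or []:
                if point:  # skip empty "-" entries
                    names.append(f"{process}_{point}_{suffix}.dat")
    return names


config = {
    "signal_categories": {
        "bbH": ["MH_516p94_MA_109p30"],
        "ggH": ["MH_516p94_MA_109p30"],
    },
    "combine_categories": [{"subcategories": ["cr1", "sr1"]}],
}
print(combined_card_names(config))
# ['bbH_MH_516p94_MA_109p30_cr1_sr1.dat', 'ggH_MH_516p94_MA_109p30_cr1_sr1.dat']
```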

A full example configuration file for my case can be found here. Most settings in that example are optional; only the ones mentioned above are required, because the helper method writeDataCard() reads the systematics directly from the histograms.

The systematics blocks in combine_inputs.yml are useful if you want to apply a systematic only to certain categories, processes, or eras, using the on_category, on_process, or on_era keys. If none of these keys is given, the nuisance parameter is applied to all categories, processes, and eras.
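
The selection rule described above (a missing key means "no restriction") can be sketched like this. The field layout is an assumption: each systematic block is taken to be a dict whose on_category, on_process, and on_era values are a single name or a list of names, and `applies_to` is a hypothetical function, not part of writeDataCard().

```python
def applies_to(syst, category, process, era):
    """True if a systematic block applies to the given (category, process, era)."""
    for key, value in (("on_category", category),
                       ("on_process", process),
                       ("on_era", era)):
        allowed = syst.get(key)
        if allowed is None:
            continue  # key absent: no restriction along this axis
        if isinstance(allowed, str):
            allowed = [allowed]  # accept a single name as well as a list
        if value not in allowed:
            return False
    return True


lumi = {}  # no on_* keys: applied everywhere
btag = {"on_category": ["sr1", "sr2"], "on_era": "2018"}  # hypothetical block

print(applies_to(lumi, "cr1", "ggH", "2017"))  # True
print(applies_to(btag, "sr1", "ggH", "2018"))  # True
print(applies_to(btag, "cr1", "ggH", "2018"))  # False (category not listed)
print(applies_to(btag, "sr1", "ggH", "2017"))  # False (era not listed)
```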

These blocks can also be used to decorrelate uncertainties, with:

    decorrelate_cat: true
    decorrelate_era: true
    decorrelate_process: true
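
In combine, nuisance parameters that share a name are treated as fully correlated, so decorrelating amounts to giving each category, era, or process its own nuisance name. The sketch below only illustrates that idea; the suffix order and the `nuisance_name` helper are assumptions, not the naming actually produced by writeDataCard().

```python
def nuisance_name(base, category, process, era, syst):
    """Name of the nuisance parameter written to a datacard.

    Nuisances sharing a name are correlated in combine, so each enabled
    decorrelate_* flag appends a suffix that splits the parameter.
    """
    name = base
    if syst.get("decorrelate_cat"):
        name += f"_{category}"
    if syst.get("decorrelate_era"):
        name += f"_{era}"
    if syst.get("decorrelate_process"):
        name += f"_{process}"
    return name


jes = {"decorrelate_era": True}  # hypothetical systematic block
print(nuisance_name("jes", "sr1", "ggH", "2017", jes))  # jes_2017
print(nuisance_name("jes", "sr1", "ggH", "2018", jes))  # jes_2018 (independent of 2017)
print(nuisance_name("jes", "sr1", "ggH", "2018", {}))   # jes (one shared parameter)
```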

There are more features, such as scaling a given process by a factor or switching to the CMS naming convention; I will describe them in more detail in the documentation.