plotIt-related code reorganisation (!109) · Merge requests · cms-analysis / General / bamboo

Pieter David requested to merge piedavid/bamboo:plotit_reorg into master May 05, 2020

This splits the generation of the plotIt config from the information available in bamboo into different parts. Typical use cases should only need writePlotIt and runPlotIt; to have the result parsed with the python version of plotIt instead, the loadPlotIt method can be used.

The goal is to make this part more customisable (i.e. change the files, samples, or some options) and the individual parts reusable. Together with the code that's shaping up here it should be easy to get any (stacks/groups of) histograms as plotIt would use them, perform calculations, and make different plots (see below).

This should be mostly transparent for existing code, except in two cases:

if runPlotIt is called directly (it should be straightforward to adjust: replace it by writePlotIt, which takes most of the arguments, and the new runPlotIt, which reads the plots.yml)
if nonstandard keys are used for some samples and one wants to use the python-plotIt, these should not be copied to the config dictionaries (it's more picky about this than the C++ plotIt, which would only read what it knows and ignore the rest - both options can be considered a feature ;-) ). Those that are defined for use in bamboo (["db", "split", "files", "run_range", "certified_lumi_file"]) are in AnalysisModule.CustomSampleAttributes, and keys added to that list in a subclass are picked up by inheritance. It's possible to specify a list of keys that should not be propagated to the plotIt config (on top of the default ones with DAS paths, splitting etc.) as follows (but unknown keys are ignored by the python plotIt as they are by the C++ one):

class MyMod(AnalysisModule):
    CustomSampleAttributes = AnalysisModule.CustomSampleAttributes + ["PU", "subprocess"]
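To illustrate how such a veto list takes effect, here is a minimal standalone sketch. The AnalysisModule class and the filter_sample_config helper below are stand-ins written for this example, not bamboo's implementation; only the attribute name CustomSampleAttributes and its default keys are taken from the description above, and the sample values are made up.

```python
class AnalysisModule:
    # Stand-in for bamboo's base class (assumption: only this class attribute matters here)
    CustomSampleAttributes = ["db", "split", "files", "run_range", "certified_lumi_file"]


class MyMod(AnalysisModule):
    # Extend the inherited list with + so the base class list is not modified
    CustomSampleAttributes = AnalysisModule.CustomSampleAttributes + ["PU", "subprocess"]


def filter_sample_config(sample_cfg, veto_keys):
    """Hypothetical helper: copy a sample's config, dropping the vetoed keys."""
    veto = set(veto_keys)
    return {key: value for key, value in sample_cfg.items() if key not in veto}


# Made-up sample entry mixing bamboo-only keys, a custom key, and plotIt keys
sample_cfg = {
    "db": "das:/some/dataset",
    "split": 10,
    "PU": "pileup.json",
    "type": "mc",
    "cross-section": 1.0,
}
plotit_cfg = filter_sample_config(sample_cfg, MyMod.CustomSampleAttributes)
print(plotit_cfg)
```

Building the subclass list with + rather than appending in place keeps the defaults of the base class untouched for other modules.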

On the positive side, you can have 2D plots (without systematics, but if you have a way to visualise them the histograms are there), data and MC total side by side, with

from bamboo.plots import Plot, DerivedPlot
from bamboo.analysisutils import loadPlotIt
from bamboo.root import gbl
from plotit.plotit import Stack
# keep only the 2D plots
plotList_2D = [ ap for ap in self.plotList
                if isinstance(ap, (Plot, DerivedPlot)) and len(ap.binnings) == 2 ]
p_config, samples, plots_2D, systematics, legend = loadPlotIt(
    config, plotList_2D, eras=self.args.eras, workdir=workdir, resultsdir=resultsdir,
    readCounters=self.readCounters, vetoFileAttributes=self.__class__.CustomSampleAttributes,
    plotDefaults=self.plotDefaults)
for plot in plots_2D:
    obsStack = Stack(smp.getHist(plot) for smp in samples if smp.cfg.type == "DATA")
    expStack = Stack(smp.getHist(plot) for smp in samples if smp.cfg.type == "MC")
    cv = gbl.TCanvas(f"c{plot.name}")
    cv.Divide(2)
    cv.cd(1)
    expStack.total.Draw("COLZ")
    cv.cd(2)
    obsStack.total.Draw("COLZ")
    cv.Update()
    cv.SaveAs(f"{plot.name}.png")

in the module postprocess method (samples is a list of plotIt groups or ungrouped files, already sorted).
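Conceptually, each pad of the canvas shows the bin-by-bin sum of the histograms of one sample type. The sketch below mimics that with plain Python lists; MiniSample, MiniStack, and the bin contents are made up for illustration and do not reflect how plotit.plotit.Stack is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class MiniSample:
    # Hypothetical stand-in for a plotIt group or ungrouped file
    name: str
    type: str    # "DATA" or "MC", playing the role of smp.cfg.type
    hist: list   # 2D bin contents as nested lists: hist[ix][iy]


class MiniStack:
    """Hypothetical stand-in: collects 2D histograms and sums them bin by bin."""

    def __init__(self, hists):
        self.hists = list(hists)

    @property
    def total(self):
        if not self.hists:
            raise ValueError("cannot compute the total of an empty stack")
        nx, ny = len(self.hists[0]), len(self.hists[0][0])
        return [[sum(h[ix][iy] for h in self.hists) for iy in range(ny)]
                for ix in range(nx)]


samples = [
    MiniSample("ttbar", "MC", [[1.0, 2.0], [3.0, 4.0]]),
    MiniSample("DY", "MC", [[0.5, 0.5], [1.0, 1.0]]),
    MiniSample("data", "DATA", [[2.0, 2.0], [4.0, 6.0]]),
]
# Same selection pattern as in the snippet above: one stack per sample type
exp_total = MiniStack(s.hist for s in samples if s.type == "MC").total
obs_total = MiniStack(s.hist for s in samples if s.type == "DATA").total
print(exp_total)
print(obs_total)
```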

Alternatively, there's also the plotit.plotit.loadFromYAML(yamlFileName, histodir=".", eras=None) function, which loads the same objects from a plots.yml file:

import os.path
from plotit.plotit import loadFromYAML
p_config, samples, plots, systematics, legend = loadFromYAML(os.path.join(workdir, "plots.yml"), histodir=workdir)

This is just a first proposal - any feedback and suggestions are welcome (there are quite a few things to be done or improved in the python-plotIt code, and this one needs a bit more testing and documentation).

Edited May 28, 2020 by Pieter David
