
Histogram-specific plotting at visualize.py

Tae Hyoun Park requested to merge histogram-specific-plotting into master

I wanted to address an annoyance that I (and almost certainly others as well) run into when making many plots at the visualize.py step. I think the change ultimately improves the out-of-the-box plotting functionality.

The issue I faced was that I often wanted to plot histograms in very different styles. Fortunately, the existing tags supported by the TQDefaultPlotter class can be fine-tuned through the plotter.<key>: <val> lines in the master visualize config. But even in the simplest plot-tweaking scenario, e.g. a pT spectrum in log-scale and eta in lin-scale, the config ends up looking something like this:

# part of visualize.py master config

makePlots: */leadLep*  # producing leadLepPt & leadLepEta plots
makeLinPlots: true # default, produces hist-lin.pdf
makeLogPlots: true # produces hist-log.pdf

# pT-specific settings
plotter.style.logScaleX: true
plotter.style.logMin: 1.0 # log plots can't go down to 0.0
plotter.style.max.scale: 100.0 # push histogram yield down in log-scale.

# eta-specific settings
# commented out when producing pT plots and vice versa.
#plotter.style.logScaleX: false
#plotter.style.linMin: 0.0 # lin plots can go down to 0.0
#plotter.style.max.scale: 1.5 # push histogram yield down in lin-scale.

Notice the config "blocks" that are meant to be for plotting a specific observable. The issues that come up with this workflow are:

  • Each block is fine-tuned for a specific observable, but one typically produces many other plots in the same visualize.py run, so whichever block of plotter.<key>: <val> settings is enabled in a given run is unsuitable for the other plots.
  • As a corollary, the user must run visualize.py N times to get N plots in their respective styles, commenting and un-commenting config blocks between runs.
  • Once run, the previously styled plots are "lost" (the files are overwritten with the wrong settings by the current run).
  • The master config becomes very crowded, since each plotter.<key>: <val> line can only change one tag at a time. This really hinders the otherwise wonderful "all the plots you ever want!" machinery that CAF provides.

The solution I want to propose in this MR is to apply these tags on a per-histogram basis, specifically for each CutX/HistY listed by makePlots. A working example in this branch uses a separate config file brought in via histogramPlotFiles in the master config:

# example of proposed 'plot-histograms.cfg' file to specify plotter options on a per-cut/hist basis

# pT spectrum
.name="*/leadLepPt", style.logScale=true, style.logScaleX=true, style.logMin=1., style.max.scale=100.

# eta spectrum
.name="*/leadLepEta", style.logScale=false, style.logScaleX=false, style.linMin=0., style.max.scale=1.5

The main benefits of this approach are:

  • The syntax is much more concise than blocks of plotter.<key>: <val> lines, as multiple tags can be applied to a plot in a single line.
  • Only the tags matching the Cut/Hist being processed are applied to the plot, so a single visualize.py run can style each histogram plot separately.
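
To make the matching step concrete, here is a minimal Python sketch of how lines in such a file could be resolved for a given CutX/HistY. It is an illustration only, not the code in this branch: parse_tag_line, load_rules and tags_for are hypothetical helpers, and both the wildcard matching via fnmatch and the "later lines override earlier ones" rule are assumptions. It also splits naively on commas, so patterns or values containing commas are not handled.

```python
from fnmatch import fnmatchcase

# Contents of the proposed 'plot-histograms.cfg' example, inlined for the sketch.
EXAMPLE = """
# pT spectrum
.name="*/leadLepPt", style.logScale=true, style.logScaleX=true, style.logMin=1., style.max.scale=100.

# eta spectrum
.name="*/leadLepEta", style.logScale=false, style.logScaleX=false, style.linMin=0., style.max.scale=1.5
"""


def _convert(value):
    """Turn 'true'/'false' into bool and numeric strings into float; keep other values as strings."""
    if value in ("true", "false"):
        return value == "true"
    try:
        return float(value)
    except ValueError:
        return value


def parse_tag_line(line):
    """Parse one '.name="<pattern>", key=value, ...' line into (pattern, tags)."""
    pattern, tags = None, {}
    for field in line.split(","):  # naive split: assumes no commas inside values
        key, _, value = field.strip().partition("=")
        value = value.strip().strip('"')
        if key == ".name":
            pattern = value
        else:
            tags[key] = _convert(value)
    return pattern, tags


def load_rules(text):
    """Return a list of (pattern, tags) pairs, skipping blank lines and '#' comment lines."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        rules.append(parse_tag_line(line))
    return rules


def tags_for(rules, path):
    """Merge the tags of every rule whose pattern matches 'CutX/HistY' (assumed: later rules win)."""
    merged = {}
    for pattern, tags in rules:
        if pattern is not None and fnmatchcase(path, pattern):
            merged.update(tags)
    return merged


rules = load_rules(EXAMPLE)
print(tags_for(rules, "CutX/leadLepPt"))
print(tags_for(rules, "CutX/leadLepEta"))
```

With this scheme, a histogram that matches no line receives no extra tags and would simply fall back to whatever the master config sets.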

Here is a README.md that I wrote to explain in more detail how this config file works; it should probably be added to CAFExample if this MR is merged. I would love to receive feedback on this, as I think this is a promising way to improve plotting in CAF!

Edited by Tae Hyoun Park
