Histogram-specific plotting at visualize.py (!623) · Merge requests · atlas-caf / CAFCore

Tae Hyoun Park requested to merge histogram-specific-plotting into master Aug 05, 2020

I wanted to address an annoyance that I have run into (and that others almost certainly share) when trying to make many plots at the visualize.py step. I think fixing it ultimately improves the out-of-the-box plotting functionality.

The issue I faced was that I often wanted to plot histograms in very different styles. Fortunately, the existing tags supported by the TQDefaultPlotter class can be fine-tuned through the plotter.<key>: <val> lines in the master visualize config. But even in the simplest plot-tweaking scenario, e.g. a pT spectrum in log-scale and eta in lin-scale, the config eventually ends up looking something like this:

# part of visualize.py master config

makePlots: */leadLep*  # producing leadLepPt & leadLepEta plots
makeLinPlots: true # default, produces hist-lin.pdf
makeLogPlots: true # produces hist-log.pdf

# pT-specific settings
plotter.style.logScaleX: true
plotter.style.logMin: 1.0 # log plots can't go down to 0.0
plotter.style.max.scale: 100.0 # push histogram yield down in log-scale.

# eta-specific settings
# commented out when producing pT plots and vice versa.
#plotter.style.logScaleX: false
#plotter.style.linMin: 0.0 # lin plots can go down to 0.0
#plotter.style.max.scale: 1.5 # push histogram yield down in lin-scale.

Notice the config "blocks", each of which is meant for plotting one specific observable. The issues that come up with this workflow are:

Each block is fine-tuned for a specific observable, but one typically produces many plots at once in a single visualize.py run, so whichever block of plotter.<key>: <val> configs is enabled in a given run is not suitable for the other plots.
As a corollary, the user must run visualize.py N times to get N plots in their respectively desired styles, all the while commenting & un-commenting one config block after the next.
After each run, the previously styled plots are "lost" (their files are overwritten with the wrong settings by the current run).
The master config becomes very crowded when one plotter.<key>: <val> line can only change one tag at a time. This really hinders the otherwise wonderful "all the plots you ever want!" machinery that CAF provides.

The solution I want to propose in this MR is to apply these tags on a per-histogram basis, specifically for each CutX/HistY listed by makePlots. A working example in this branch uses a separate config file brought in via histogramPlotFiles in the master config:

# example of proposed 'plot-histograms.cfg' file to specify plotter options on a per-cut/hist basis

# pT spectrum
.name="*/leadLepPt", style.logScale=true, style.logScaleX=true, style.logMin=1., style.max.scale=100.

# eta spectrum
.name="*/leadLepEta", style.logScale=false, style.logScaleX=false, style.linMin=0., style.max.scale=1.5
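
To make the one-line-per-histogram syntax concrete, here is a rough Python sketch of how such a line could be read into a set of tags. This is not the parser that CAF uses; the function name parse_tag_line is hypothetical, and the sketch assumes that values never contain commas and are either quoted strings, true/false, or numbers.

```python
def parse_tag_line(line):
    """Parse one 'key=value, key=value' line into a dict.

    Simplified illustration only: assumes no commas inside values.
    """
    tags = {}
    for item in line.split(","):
        key, _, raw = item.strip().partition("=")
        raw = raw.strip()
        if raw.startswith('"') and raw.endswith('"'):
            value = raw[1:-1]          # quoted string, e.g. the .name pattern
        elif raw in ("true", "false"):
            value = raw == "true"      # boolean tag
        else:
            value = float(raw)         # numeric tag such as 1. or 100.
        tags[key.strip()] = value
    return tags


# The pT line from the example config above.
line = ('.name="*/leadLepPt", style.logScale=true, style.logScaleX=true, '
        'style.logMin=1., style.max.scale=100.')
tags = parse_tag_line(line)
```

Each line thus carries its own pattern (.name) together with every style tag that should apply to the matching plots.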

The main benefits of this approach are:

The syntax structure is much more concise than having blocks of plotter.<key>: <val> lines, as multiple tags can be applied at once to a plot in a single line.
Only the tags matching the Cut/Hist being processed are applied to the plot, i.e. run visualize.py once, and fine-tune each histogram plot separately.
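
The per-histogram selection described above can be sketched in Python as follows. This only illustrates the idea and is not the code in this branch: it assumes shell-style wildcard matching of .name against the Cut/Hist path and that later entries override earlier ones. The helper tags_for and the cut name CutA are hypothetical.

```python
from fnmatch import fnmatchcase

# Entries as they might look after reading the example 'plot-histograms.cfg'.
entries = [
    {".name": "*/leadLepPt", "style.logScale": True, "style.logScaleX": True,
     "style.logMin": 1.0, "style.max.scale": 100.0},
    {".name": "*/leadLepEta", "style.logScale": False, "style.logScaleX": False,
     "style.linMin": 0.0, "style.max.scale": 1.5},
]


def tags_for(path, entries):
    """Collect the style tags of every entry whose .name pattern matches path."""
    merged = {}
    for entry in entries:
        if fnmatchcase(path, entry[".name"]):
            # Assumption: later matching entries override earlier ones.
            merged.update({k: v for k, v in entry.items() if k != ".name"})
    return merged


pt_tags = tags_for("CutA/leadLepPt", entries)    # only the pT entry matches
eta_tags = tags_for("CutA/leadLepEta", entries)  # only the eta entry matches
other_tags = tags_for("CutA/mll", entries)       # nothing matches: empty dict
```

A histogram that matches no entry simply gets no extra tags, so it would fall back to whatever the master config sets.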

Here is a README.md that I wrote to explain how this config file works in more detail; it should probably be added to CAFExample if this MR is merged. I would love to receive feedback on this, as I think this is a promising way to improve plotting in CAF!

Edited Aug 07, 2020 by Tae Hyoun Park
