TIMBER  beta
Tree Interface for Making Binned Events with RDataFrame
analyzer Class Reference

Main class for TIMBER.


Public Member Functions

def __init__ (self, fileName, eventsTreeName="Events", runTreeName="Runs", createAllCollections=False)
 Constructor.

def Close (self)
 Safely deletes analyzer instance.

def __str__ (self)
 Call with print(<analyzer>) to print a nicely formatted description of the analyzer object for debugging.

def DataFrame (self)
 DataFrame of the ActiveNode.

def Snapshot (self, columns, outfilename, treename, lazy=False, openOption='RECREATE')

def SaveRunChain (self, filename, merge=True)
 Save the Run tree (chain of all input files) to filename.

def Range (self, argv)

def GetCollectionNames (self)
 Return a list of all collections that currently exist (including those that have been added).

def SetActiveNode (self, node)
 Sets the active node.

def GetActiveNode (self)
 Get the active node.

def GetBaseNode (self)
 Get the base node.

def TrackNode (self, node)
 Add a node to track.

def GetTrackedNodeNames (self)
 Gets the names of the nodes currently being tracked.

def GetCorrectionNames (self)
 Get names of all corrections being tracked.

def FilterColumnNames (self, columns, node=None)
 Takes a list of possible columns and returns only those that exist in the RDataFrame of the supplied node.

def GetTriggerString (self, trigList)
 Checks input list for missing triggers and drops those missing (FilterColumnNames) and then concatenates those remaining into an OR (||) string.

def GetFlagString (self, flagList=GetStandardFlags())
 Checks input list for missing flags and drops those missing (FilterColumnNames) and then concatenates those remaining into an AND (&&) string.

def GetFileName (self)
 Get input file name.

def Cut (self, name, cuts, node=None, nodetype=None)
 Apply a cut/filter to a provided node or the ActiveNode by default.

def Define (self, name, variables, node=None, nodetype=None)
 Defines a variable/column on top of a provided node or the ActiveNode by default.

def Apply (self, actionGroupList, node=None, trackEach=True)
 Applies a single CutGroup/VarGroup or an ordered list of Groups to the provided node or the ActiveNode by default.

def Discriminate (self, name, discriminator, node=None, passAsActiveNode=None)
 Forks a node based upon a discriminator being True or False (ActiveNode by default).

def SubCollection (self, name, basecoll, condition, skip=[])
 Creates a new collection from an existing one (in a NanoAOD-like format) in which the array-type branches are slimmed to the objects that pass a selection.

def ReorderCollection (self, name, basecoll, newOrderCol, skip=[])
 Reorders a collection (from a NanoAOD-like format) where the new order is another column of vectors with the new indices specified.

def ObjectFromCollection (self, name, basecoll, index, skip=[])
 Similar to creating a SubCollection except the newly defined columns are single values (not vectors/arrays) for the object at the provided index.

def MergeCollections (self, name, collectionNames)
 Merge collections (provided by list of names in collectionNames) into one called name.

def CommonVars (self, collections)
 Find the common variables between collections.

def AddCorrection (self, correction, evalArgs={}, node=None)
 Add a Correction to track.

def AddCorrections (self, correctionList, node=None)
 Add multiple Corrections to track.

def MakeWeightCols (self, name='', node=None, correctionNames=None, dropList=[])
 Makes columns/variables to store total weights based on the Corrections that have been added.

def GetWeightName (self, corr, variation, name="")
 Return the branch/column name of the requested weight.

def MakeTemplateHistos (self, templateHist, variables, node=None)
 Generates the uncertainty template histograms based on the weights created by MakeWeightCols().

def DrawTemplates (self, hGroup, saveLocation, projection='X', projectionArgs=(), fileType='pdf')
 Draw the template uncertainty histograms created by MakeTemplateHistos().

def CalibrateVars (self, varCalibDict, evalArgs, newCollectionName, variationsFlag=True, node=None)
 Calibrate variables (all of the same collection, e.g. "FatJet") with the Calibrations provided in varCalibDict and the arguments provided in evalArgs.

def Nminus1 (self, cutgroup, node=None)
 Create an N-1 tree structure of nodes building off of node with the N cuts from cutgroup.

def PrintNodeTree (self, outfilename, verbose=False, toSkip=[])
 Print a PDF image of the node structure of the analysis.

def MakeHistsWithBinning (self, histDict, name='', weight=None)
 Batch creates histograms at the current ActiveNode based on the input histDict, formatted as {[<column name>]: <binning tuple>}, where [<column name>] is a list of column names to plot against each other in [x,y,z] order and <binning tuple> is the set of arguments that would normally be passed to TH1.
 

Public Attributes

 fileName
 Path of the input file.
 
 silent
 bool

 RunChain
 ROOT.TChain.

 BaseNode
 Node.

 AllNodes
 {str:Node}

 Corrections
 dict

 isData
 bool

 preV6
 bool

 genEventCount
 int

 lhaid
 int

 ActiveNode
 Node.
 

Detailed Description

Main class for TIMBER.

Implements an interface with ROOT's RDataFrame (RDF). The default values assume the data is in NanoAOD format, but any TTree can be used. The class works on the basis of nodes and actions: each node holds an RDF instance, and an action (or series of actions) transforms that RDF to create one or more new nodes.

When class methods are used to perform actions, the active node is always tracked: the next action builds on the ActiveNode, and its output node becomes the new ActiveNode.
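The node/ActiveNode bookkeeping described above can be illustrated with a small toy model in plain Python. The `ToyNode` and `ToyAnalyzer` classes are hypothetical stand-ins, not TIMBER's `Node` or `analyzer`: actions are only recorded as tuples rather than passed to RDataFrame, so this is a sketch of the tracking idea under that simplification.

```python
class ToyNode:
    """Hypothetical stand-in for a node: records one action and its parent."""

    def __init__(self, name, action=None, parent=None):
        self.name = name
        self.action = action  # e.g. ("Filter", "ht > 500"); None for the base node
        self.parent = parent  # node this one was built on top of

    def history(self):
        """Return the actions applied from the base node down to this node."""
        actions = []
        node = self
        while node is not None:
            if node.action is not None:
                actions.append(node.action)
            node = node.parent
        return list(reversed(actions))


class ToyAnalyzer:
    """Hypothetical stand-in that mimics ActiveNode tracking only."""

    def __init__(self):
        self.BaseNode = ToyNode("base")
        self.ActiveNode = self.BaseNode
        self.AllNodes = {"base": self.BaseNode}

    def _add(self, name, action, node=None):
        parent = node if node is not None else self.ActiveNode  # ActiveNode by default
        if name in self.AllNodes:
            raise NameError("A node named %r is already tracked" % name)
        new_node = ToyNode(name, action, parent)
        self.AllNodes[name] = new_node
        self.ActiveNode = new_node  # the output node becomes the new ActiveNode
        return new_node

    def cut(self, name, expr, node=None):
        return self._add(name, ("Filter", expr), node)

    def define(self, name, expr, node=None):
        return self._add(name, ("Define", name, expr), node)


a = ToyAnalyzer()
a.define("ht", "Sum(Jet_pt)")
a.cut("htCut", "ht > 500")
print(a.ActiveNode.history())
# → [('Define', 'ht', 'Sum(Jet_pt)'), ('Filter', 'ht > 500')]
```

Passing `node=a.BaseNode` explicitly starts a new branch from the unmodified input, and that branch then becomes the ActiveNode.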

Constructor & Destructor Documentation

◆ __init__()

def __init__ (   self,
  fileName,
  eventsTreeName = "Events",
  runTreeName = "Runs",
  createAllCollections = False 
)

Constructor.

Sets up the tracking of actions on an RDataFrame as nodes. Also looks up and stores common information in NanoAOD such as the number of generated events in a file (genEventCount), the LHA ID of the PDF set in the LHEPdfWeight branch (lhaid), if the file is data (isData), and if the file is before NanoAOD version 6 (preV6).

Parameters
fileName(str): A ROOT file path, a path to a txt file which contains several ROOT file paths separated by new line characters, or a list of either .root and/or .txt files.
eventsTreeName(str, optional): Name of TTree in fileName where events are stored. Defaults to "Events" (for NanoAOD)
runTreeName(str, optional): Name of TTree in fileName where run information is stored (for generated event info in simulation). Defaults to "Runs" (for NanoAOD)
createAllCollections(bool, optional): Create all of the collection structs immediately. This always consumes memory, and reading values through the collections increases processing time compared to accessing column values directly. Defaults to False.
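The fileName parameter accepts three input shapes. The sketch below shows one way such an argument could be flattened into a list of ROOT file paths; `expand_inputs` is a hypothetical helper, not part of TIMBER, and it assumes each .txt file lists one ROOT path per line. How TIMBER itself builds its chains is not shown here.

```python
def expand_inputs(fileName):
    """Flatten a fileName argument into a list of ROOT file paths (hypothetical helper).

    Accepts a single path or a list of paths; .txt entries are read as
    newline-separated lists of ROOT file paths.
    """
    items = [fileName] if isinstance(fileName, str) else list(fileName)
    paths = []
    for item in items:
        if item.endswith(".txt"):
            with open(item) as f:
                # One ROOT file path per line; blank lines are skipped.
                paths.extend(line.strip() for line in f if line.strip())
        elif item.endswith(".root"):
            paths.append(item)
        else:
            raise ValueError("Unsupported input: %s" % item)
    return paths


print(expand_inputs(["ttbar_1.root", "ttbar_2.root"]))
# → ['ttbar_1.root', 'ttbar_2.root']
```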

Member Function Documentation

◆ __str__()

def __str__ (   self)

Call with print(<analyzer>) to print a nicely formatted description of the analyzer object for debugging.

Returns
str

◆ AddCorrection()

def AddCorrection (   self,
  correction,
  evalArgs = {},
  node = None 
)

Add a Correction to track.

Sets new active node with all correction variations calculated as new columns.

Parameters
correction(Correction): Correction object to add.
evalArgs(dict, optional): Dict with C++ method argument names as keys and the actual arguments to provide (branch/column names) as values, for per-event evaluation. For any argument name without a key, the method will attempt to find an existing branch/column whose name matches.
node(Node, optional): Node to add correction on top of. Defaults to ActiveNode.
Exceptions
TypeError: If argument types are not Node and Correction.
ValueError: If Correction type is not a weight or uncertainty.
Returns
Node New ActiveNode.
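The evalArgs fallback rule (explicit mapping first, otherwise a column with a matching name) can be sketched as below. `resolve_eval_args` is hypothetical, and exact name matching plus the NameError on failure are assumptions about behavior the text describes only loosely.

```python
def resolve_eval_args(cpp_arg_names, evalArgs, columns):
    """Map each C++ argument name to the column that should feed it.

    Explicit evalArgs entries take priority; otherwise an existing column with
    exactly the same name as the argument is used (assumed matching rule).
    """
    resolved = {}
    for arg in cpp_arg_names:
        if arg in evalArgs:
            resolved[arg] = evalArgs[arg]
        elif arg in columns:
            resolved[arg] = arg
        else:
            raise NameError("No branch/column found for argument '%s'" % arg)
    return resolved


columns = ["run", "luminosityBlock", "FatJet_pt"]
print(resolve_eval_args(["run", "jets"], {"jets": "FatJets"}, columns))
# → {'run': 'run', 'jets': 'FatJets'}
```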

◆ AddCorrections()

def AddCorrections (   self,
  correctionList,
  node = None 
)

Add multiple Corrections to track.

Sets new ActiveNode with all correction variations calculated as new columns.

Parameters
correctionList([Correction]): List of Correction objects to add.
node(Node, optional): Node to add the corrections on top of. Defaults to None, in which case the ActiveNode is used.
Returns
Node New ActiveNode.

◆ Apply()

def Apply (   self,
  actionGroupList,
  node = None,
  trackEach = True 
)

Applies a single CutGroup/VarGroup or an ordered list of Groups to the provided node or the ActiveNode by default.

Parameters
actionGroupList(Group, list(Group)): The CutGroup or VarGroup to act on node or a list of CutGroups or VarGroups to act (in order) on node.
node(Node, optional): Node to create the new variable/column on top of. Must be of type Node (not RDataFrame). Defaults to ActiveNode.
trackEach(bool, optional): [description]. Defaults to True.
Exceptions
TypeError: If argument type is not Node.
Returns
Node New ActiveNode.

◆ CalibrateVars()

def CalibrateVars (   self,
  varCalibDict,
  evalArgs,
  newCollectionName,
  variationsFlag = True,
  node = None 
)

Calibrate variables (all of the same collection, e.g. "FatJet") with the Calibrations provided in varCalibDict and the arguments provided in evalArgs. The method creates a new collection with the calibrations applied and with any resulting re-ordering of the collection applied. For example:

a = analyzer(...)
jes = Calibration("JES","TIMBER/Framework/include/JES_weight.h",
    [GetJMETag("JES",str(year),"MC"),"AK8PFPuppi","","true"], corrtype="Calibration")
jer = Calibration("JER","TIMBER/Framework/include/JER_weight.h",
    [GetJMETag("JER",str(year),"MC"),"AK8PFPuppi"], corrtype="Calibration")
...
varCalibDict = {
    "FatJet_pt" : [jes, jer],
    "FatJet_mass" : [jes, jer]
}
evalArgs = {
    jes: {"jets":"FatJets"},
    jer: {"jets":"FatJets","genJets":"GenJets"}
}
a.CalibrateVars(varCalibDict,evalArgs,"CorrectedFatJets")

This will apply the JES and JER calibrations and their four variations (up,down pair for each) to FatJet_pt and FatJet_mass branches and create a new collection called "CorrectedFatJets" which will be ordered by the new pt values. Note that if you want to correct a different collection (ex. AK4 based Jet collection), you need a separate payload and separate call to CalibrateVars because only one collection can be generated at a time. Also note that in this example, jes and jer are initialized with the AK8PFPuppi jets in mind. So if you'd like to apply the JES or JER calibrations to AK4 jets, you would also need to define objects like jesAK4 and jerAK4.

The calibrations are always calculated as a separate column, which stores a vector named <CalibName>__vec ordered {nominal, up, down}, where "up" and "down" are the absolute weights (i.e. not relative to "nominal"). If you only want the weights and do not want them applied to any variable, provide an empty dictionary ({}) for the varCalibDict argument.

This method will set the new active node to the one with the new collection defined.

Parameters
varCalibDict(dict): Dictionary mapping variable to calibrate to calibrations to apply. Best understood through the example above.
evalArgs(dict): Dictionary mapping calibrations to input evaluation arguments that map the C++ method definition argument names to the desired input.
newCollectionName(str): Output collection name.
variationsFlag(bool, optional): If True, calculate systematic variations. If False, do not calculate variations. Defaults to True.
node(Node, optional): Node to add correction on top of. Defaults to ActiveNode.
Exceptions
TypeError: If argument types are not Node and Correction.
ValueError: If Correction type is not a weight or uncertainty.
Returns
Node New ActiveNode.
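To make the {nominal, up, down} convention concrete, the sketch below applies one calibration's per-object factors to a list of jet pT values and then computes the new collection order. It assumes the factors are multiplicative and that the new order follows the nominal values; `apply_calibration` and `descending_order` are hypothetical helpers, not TIMBER functions, and the numbers are illustrative only.

```python
def apply_calibration(values, calib_vecs):
    """Apply per-object {nominal, up, down} factors to a list of values.

    calib_vecs[i] plays the role of the <CalibName>__vec entry for object i.
    Assumes multiplicative factors; "up" and "down" are absolute, not
    relative to nominal.
    """
    out = {"nom": [], "up": [], "down": []}
    for value, (nom, up, down) in zip(values, calib_vecs):
        out["nom"].append(value * nom)
        out["up"].append(value * up)
        out["down"].append(value * down)
    return out


def descending_order(values):
    """Indices that sort values from largest to smallest (new collection order)."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


fatjet_pt = [100.0, 120.0]
jes_vec = [(1.5, 1.75, 1.25), (1.0, 1.125, 0.875)]
calibrated = apply_calibration(fatjet_pt, jes_vec)
print(calibrated["nom"], descending_order(calibrated["nom"]))  # → [150.0, 120.0] [0, 1]
```

Before calibration the second jet leads; after it the first one does, which is why the calibrated collection may need re-ordering.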

◆ Close()

def Close (   self)

Safely deletes analyzer instance.

Returns
None

◆ CommonVars()

def CommonVars (   self,
  collections 
)

Find the common variables between collections.

Parameters
collections([str]): List of collections names (not branch names).
Returns
[str]: List of variables shared among the collections.
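Assuming NanoAOD-style branch names of the form <Collection>_<variable>, finding the common variables reduces to a set intersection over branch-name suffixes. The function below is a hypothetical sketch of that idea, not TIMBER's implementation.

```python
def common_vars(branches, collections):
    """Return the variables (branch-name suffixes) shared by every collection.

    Assumes NanoAOD-style naming, <Collection>_<variable>. Counter branches
    such as nElectron do not start with "Electron_" and are therefore ignored.
    """
    per_collection = []
    for coll in collections:
        prefix = coll + "_"
        per_collection.append({b[len(prefix):] for b in branches if b.startswith(prefix)})
    if not per_collection:
        return []
    return sorted(set.intersection(*per_collection))


branches = ["nElectron", "Electron_pt", "Electron_eta", "Electron_deltaEtaSC",
            "nMuon", "Muon_pt", "Muon_eta", "Muon_isGlobal"]
print(common_vars(branches, ["Electron", "Muon"]))  # → ['eta', 'pt']
```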

◆ Cut()

def Cut (   self,
  name,
  cuts,
  node = None,
  nodetype = None 
)

Apply a cut/filter to a provided node or the ActiveNode by default.

Will add the resulting node to tracking and set it as the ActiveNode.

Parameters
name(str): Name for the cut for internal tracking and later reference.
cuts(str, CutGroup): A one-line C++ string that evaluates as a bool or a CutGroup object which contains multiple actions that evaluate as bools.
node(Node, optional): Node on which to apply the cut/filter. Defaults to ActiveNode.
nodetype(str, optional): Defaults to None, in which case the new Node will be type "Cut".
Exceptions
TypeError: If argument type is not Node.
Returns
Node New ActiveNode.

◆ DataFrame()

def DataFrame (   self)

DataFrame of the ActiveNode.

Returns
RDataFrame Dataframe for the active node.

◆ Define()

def Define (   self,
  name,
  variables,
  node = None,
  nodetype = None 
)

Defines a variable/column on top of a provided node or the ActiveNode by default.

Will add the resulting node to tracking and set it as the ActiveNode.

Parameters
name(str): Name for the column for internal tracking and later reference.
variables(str, VarGroup): A one-line C++ string that evaluates to desired value to store or a VarGroup object which contains multiple actions that evaluate to the desired values.
node(Node, optional): Node to create the new variable/column on top of. Defaults to ActiveNode.
nodetype(str, optional): Defaults to None in which case the new Node will be type "Define".
Exceptions
TypeError: If argument type is not Node.
Returns
Node New ActiveNode.

◆ Discriminate()

def Discriminate (   self,
  name,
  discriminator,
  node = None,
  passAsActiveNode = None 
)

Forks a node based upon a discriminator being True or False (ActiveNode by default).

Parameters
name(str): Name for the discrimination for internal tracking and later reference.
discriminator(str): A one-line C++ string that evaluates as a bool to discriminate for the forking of the node.
node(Node, optional): Node to discriminate. Must be of type Node (not RDataFrame). Defaults to ActiveNode.
passAsActiveNode(bool, optional): True if the ActiveNode should be set to the node that passes the discriminator. False if the ActiveNode should be set to the node that fails the discriminator. Defaults to None in which case the ActiveNode does not change.
Returns
dict Dictionary with keys "pass" and "fail" corresponding to the passing and failing Nodes stored as values.
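Forking on a discriminator amounts to applying a condition and its negation, so every event lands in exactly one of "pass" or "fail". The sketch below shows both the complementary C++ filter strings and the resulting partition on toy Python events; both helpers are hypothetical, and the exact strings TIMBER builds may differ.

```python
def discriminate_filters(discriminator):
    """Complementary filter strings for the "pass" and "fail" branches."""
    return {"pass": "(%s)" % discriminator, "fail": "!(%s)" % discriminator}


def split_events(events, predicate):
    """Partition events into disjoint "pass" and "fail" lists."""
    out = {"pass": [], "fail": []}
    for event in events:
        out["pass" if predicate(event) else "fail"].append(event)
    return out


print(discriminate_filters("FatJet_pt[0] > 400"))
# → {'pass': '(FatJet_pt[0] > 400)', 'fail': '!(FatJet_pt[0] > 400)'}
events = [{"lead_pt": 450.0}, {"lead_pt": 300.0}, {"lead_pt": 500.0}]
forks = split_events(events, lambda ev: ev["lead_pt"] > 400)
```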

◆ DrawTemplates()

def DrawTemplates (   self,
  hGroup,
  saveLocation,
  projection = 'X',
  projectionArgs = (),
  fileType = 'pdf' 
)

Draw the template uncertainty histograms created by MakeTemplateHistos().

Parameters
hGroup(HistGroup): Uncertainty template histograms.
saveLocation(str): Path to folder to save histograms.
projection(str, optional): "X" (Default), "Y", or "Z". Axis to project onto if templates are not 1D.
projectionArgs(tuple, optional): A tuple of arguments provided to the ProjectionX, ProjectionY, or ProjectionZ method of the ROOT TH2/TH3.
fileType(str, optional): File type - "pdf", "png", etc (must be supported by TCanvas.Print()).
Returns
None

◆ FilterColumnNames()

def FilterColumnNames (   self,
  columns,
  node = None 
)

Takes a list of possible columns and returns only those that exist in the RDataFrame of the supplied node.

Parameters
columns([str]): List of column names (str)
node(Node, optional): Node to compare against. Defaults to BaseNode.
Returns
[str]: List of the supplied column names that also exist in the RDataFrame (the intersection).

◆ GetActiveNode()

def GetActiveNode (   self)

Get the active node.

Returns
Node Value of ActiveNode.

◆ GetBaseNode()

def GetBaseNode (   self)

Get the base node.

Returns
Node Value of BaseNode.

◆ GetCollectionNames()

def GetCollectionNames (   self)

Return a list of all collections that currently exist (including those that have been added).

Returns
list Collection names.

◆ GetCorrectionNames()

def GetCorrectionNames (   self)

Get names of all corrections being tracked.

Returns
[str]: List of Correction keys/names.

◆ GetFileName()

def GetFileName (   self)

Get input file name.

Returns
str File name

◆ GetFlagString()

def GetFlagString (   self,
  flagList = GetStandardFlags() 
)

Checks input list for missing flags and drops those missing (FilterColumnNames) and then concatenates those remaining into an AND (&&) string.

Parameters
flagList([str], optional): List of flag names. Defaults to GetStandardFlags().
Returns
str Statement to evaluate as the set of flags.

◆ GetTrackedNodeNames()

def GetTrackedNodeNames (   self)

Gets the names of the nodes currently being tracked.

Returns
[str]: List of names of nodes being tracked.

◆ GetTriggerString()

def GetTriggerString (   self,
  trigList 
)

Checks input list for missing triggers and drops those missing (FilterColumnNames) and then concatenates those remaining into an OR (||) string.

Parameters
trigList([str]): List of trigger names.
Returns
str Statement to evaluate as the set of triggers.
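GetTriggerString and GetFlagString follow the same pattern: drop names that are not columns, then join the rest with || (triggers) or && (flags). Below is a plain-Python sketch of that pattern; the helper names are hypothetical, and the exact spacing or parenthesization of TIMBER's output strings may differ. Note that if every name is missing, the joined string is empty, which would not be a valid filter expression.

```python
def filter_column_names(columns, available):
    """Keep only the names present in `available`, preserving input order."""
    available = set(available)
    return [c for c in columns if c in available]


def trigger_string(trigList, available):
    """OR (||) together the triggers that exist; missing ones are dropped."""
    return " || ".join(filter_column_names(trigList, available))


def flag_string(flagList, available):
    """AND (&&) together the flags that exist; missing ones are dropped."""
    return " && ".join(filter_column_names(flagList, available))


columns = ["HLT_PFHT1050", "HLT_AK8PFJet500", "Flag_goodVertices"]
print(trigger_string(["HLT_PFHT1050", "HLT_NotInThisFile", "HLT_AK8PFJet500"], columns))
# → HLT_PFHT1050 || HLT_AK8PFJet500
print(flag_string(["Flag_goodVertices", "Flag_NotInThisFile"], columns))
# → Flag_goodVertices
```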

◆ GetWeightName()

def GetWeightName (   self,
  corr,
  variation,
  name = "" 
)

Return the branch/column name of the requested weight.

Parameters
corr(str,Correction): Either the correction object or the name of the correction.
variation(str): "up" or "down".
name(str, optional): Name given to MakeWeightCols() to denote the group of weight columns. Defaults to "".
Exceptions
NameError: If weight name does not exist in the columns.
Returns
str Name of the requested weight branch/column.

◆ MakeHistsWithBinning()

def MakeHistsWithBinning (   self,
  histDict,
  name = '',
  weight = None 
)

Batch creates histograms at the current ActiveNode based on the input histDict, formatted as {[<column name>]: <binning tuple>}, where [<column name>] is a list of column names to plot against each other in [x,y,z] order and <binning tuple> is the set of arguments that would normally be passed to TH1.

The dimensions of the returned histograms are determined based on the size of [<column name>].

Parameters
histDict({str:tuple}): Formatted as {<column name>: <binning tuple>}, where <binning tuple> holds the arguments that would normally be passed to TH1. Size determines dimension of histogram.
name(str, optional): Name for the output HistGroup. Defaults to '' in which case the name of the ActiveNode will be used.
weight(str, optional): Weight (as a string) to apply to all histograms. Defaults to None.
Returns
dict Dictionary with same structure as the input (column names for keys) with new histograms evaluated on the ActiveNode as the values.

◆ MakeTemplateHistos()

def MakeTemplateHistos (   self,
  templateHist,
  variables,
  node = None 
)

Generates the uncertainty template histograms based on the weights created by MakeWeightCols().

Parameters
templateHist(TH1,TH2,TH3): A TH1, TH2, or TH3 used as a template to create the histograms.
variables([str]): A list of the columns/variables to plot (ex. ["x","y","z"]).
node(Node, optional): Node to plot histograms from. Defaults to ActiveNode.
Returns
HistGroup Uncertainty template histograms.

◆ MakeWeightCols()

def MakeWeightCols (   self,
  name = '',
  node = None,
  correctionNames = None,
  dropList = [] 
)

Makes columns/variables to store total weights based on the Corrections that have been added.

This function automates the calculation of the columns that store the nominal weight and the variation of weights based on the corrections in consideration. The nominal weight will be the product of all "weight" type Correction objects. The variations on the nominal weight correspond to the variations/uncertainties in each Correction object (both "weight" and "uncert" types).

For example, if there are five different Correction objects considered, there will be 11 weights calculated (nominal + 5 up + 5 down).

A list of correction names can be provided if only a subset of the corrections being tracked are desired. A drop list can also be supplied to remove a subset of corrections.

Parameters
name(str, optional): Name for the group of weights, so that weight columns are not duplicated if the method is run multiple times. Output columns will have the prefix weight_<name>__. Defaults to '', which gives the prefix weight__.
node(Node, optional): Node to calculate weights on top of. Must be of type Node (not RDataFrame). Defaults to ActiveNode.
correctionNames(list(str), optional): List of correction names (strings) to consider. Defaults to None, in which case all corrections being tracked are considered.
dropList(list(str), optional): List of correction names (strings) to exclude. Defaults to an empty list, in which case no corrections are dropped.
Returns
Node New ActiveNode.
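The counting rule above (one nominal weight plus an up and a down variation per Correction, so 11 weights for five Corrections) can be sketched in plain Python. The treatment of "uncert" corrections as factors on the full nominal weight is an assumption, and the output key names are hypothetical; TIMBER's real column names come from GetWeightName().

```python
from math import prod


def make_weights(corrections):
    """Toy version of the nominal/up/down weight bookkeeping.

    corrections maps a name to (ctype, nominal, up, down), where ctype is
    "weight" or "uncert". Assumptions: for "weight" entries all three values
    are absolute weights; for "uncert" entries nominal is ignored and up/down
    are factors that multiply the full nominal weight.
    """
    def product_of_weights(exclude=None):
        # Product of the nominal values of all "weight" corrections (except one).
        return prod(v[1] for k, v in corrections.items()
                    if v[0] == "weight" and k != exclude)

    nominal = product_of_weights()
    weights = {"nominal": nominal}
    for name, (ctype, _nom, up, down) in corrections.items():
        if ctype == "weight":
            # Swap this correction's nominal value for its variation.
            others = product_of_weights(exclude=name)
            weights[name + "_up"] = others * up
            weights[name + "_down"] = others * down
        elif ctype == "uncert":
            weights[name + "_up"] = nominal * up
            weights[name + "_down"] = nominal * down
        else:
            raise ValueError("Unknown correction type: %s" % ctype)
    return weights


corrections = {
    "pileup":  ("weight", 0.5, 0.75, 0.25),
    "btagSF":  ("weight", 2.0, 2.5, 1.5),
    "trigger": ("weight", 1.0, 1.5, 0.5),
    "pdf":     ("uncert", 1.0, 1.25, 0.75),
    "scale":   ("uncert", 1.0, 1.5, 0.5),
}
weights = make_weights(corrections)
print(len(weights))  # → 11 (nominal + 5 up + 5 down)
```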

◆ MergeCollections()

def MergeCollections (   self,
  name,
  collectionNames 
)

Merge collections (provided by list of names in collectionNames) into one called name.

Only common variables are taken and stored in the new collection.

Parameters
name(str): Name of new collection
collectionNames([str]): List of names of collections to merge.

Examples

a = analyzer(<...>)
a.MergeCollections("Lepton",["Electron","Muon"])
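For a single event, merging keeps only the shared variables and concatenates their per-object arrays. The sketch below uses plain Python dicts in place of RDataFrame columns; `merge_collections` is hypothetical, and keeping the input order (no re-sorting) is an assumption.

```python
def merge_collections(name, collections, variables, event):
    """Concatenate the shared variables of several collections for one event.

    event maps NanoAOD-style branch names (<Collection>_<var>) to per-event
    lists. Objects keep their input order: all of the first collection, then
    the second, and so on (no re-sorting is assumed).
    """
    merged = {}
    for var in variables:
        merged["%s_%s" % (name, var)] = [
            value for coll in collections for value in event["%s_%s" % (coll, var)]
        ]
    return merged


event = {"Electron_pt": [50.0], "Electron_eta": [0.5], "Electron_deltaEtaSC": [0.01],
         "Muon_pt": [30.0, 20.0], "Muon_eta": [-1.0, 2.0]}
lep = merge_collections("Lepton", ["Electron", "Muon"], ["pt", "eta"], event)
print(lep)  # → {'Lepton_pt': [50.0, 30.0, 20.0], 'Lepton_eta': [0.5, -1.0, 2.0]}
```

Electron_deltaEtaSC has no Muon counterpart, so it is not part of the shared variables and does not appear in the merged collection.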

◆ Nminus1()

def Nminus1 (   self,
  cutgroup,
  node = None 
)

Create an N-1 tree structure of nodes building off of node with the N cuts from cutgroup.

The structure shares as many actions as possible among the N resulting nodes. Use PrintNodeTree() to visualize it.

Parameters
cutgroup(CutGroup): Group of N cuts to apply.
node(Node, optional): Node to build on. Defaults to ActiveNode.
Returns
dict N nodes in dictionary with keys indicating the cut that was not applied.
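The selection behind each N-1 node is the AND of every cut except one. The naive sketch below builds those N expressions independently and does not reproduce the action sharing described above; `nminus1_cuts` and its dictionary keys are hypothetical, not TIMBER's return format.

```python
def nminus1_cuts(cuts):
    """For each named cut, build the AND of all *other* cuts.

    cuts: ordered dict-like {name: C++ bool expression}.
    Returns {name_of_removed_cut: combined expression of the rest}.
    """
    result = {}
    for removed in cuts:
        rest = ["(%s)" % expr for name, expr in cuts.items() if name != removed]
        result[removed] = " && ".join(rest)
    return result


cuts = {"pt": "pt > 400", "eta": "abs(eta) < 2.4", "mass": "m > 50"}
for removed, expr in nminus1_cuts(cuts).items():
    print(removed, "->", expr)
# → pt -> (abs(eta) < 2.4) && (m > 50)
# → eta -> (pt > 400) && (m > 50)
# → mass -> (pt > 400) && (abs(eta) < 2.4)
```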

◆ ObjectFromCollection()

def ObjectFromCollection (   self,
  name,
  basecoll,
  index,
  skip = [] 
)

Similar to creating a SubCollection except the newly defined columns are single values (not vectors/arrays) for the object at the provided index.

Parameters
name(str): Name of new collection.
basecoll(str): Name of the base collection from which the new collection is derived.
index(str): Index of the collection item to extract.
skip([str]): List of variable names in the collection to skip.
Returns
None. New nodes created with the sub collection.

Examples

a.ObjectFromCollection('LeadJet','FatJet','0')
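Per event, extracting one object turns each <basecoll>_<var> array into a scalar <name>_<var>. In the sketch below the index is a Python int, whereas in TIMBER it is passed as a string; `object_from_collection` is a hypothetical helper operating on a dict that stands in for one event.

```python
def object_from_collection(name, basecoll, index, event, skip=()):
    """Pick the object at `index` from every <basecoll>_<var> branch.

    Returns scalar columns named <name>_<var>; variables listed in skip are left out.
    """
    prefix = basecoll + "_"
    out = {}
    for branch, values in event.items():
        if not branch.startswith(prefix):
            continue
        var = branch[len(prefix):]
        if var not in skip:
            out[name + "_" + var] = values[index]
    return out


event = {"nFatJet": 2, "FatJet_pt": [500.0, 300.0], "FatJet_msoftdrop": [170.0, 90.0]}
print(object_from_collection("LeadJet", "FatJet", 0, event))
# → {'LeadJet_pt': 500.0, 'LeadJet_msoftdrop': 170.0}
```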

◆ PrintNodeTree()

def PrintNodeTree (   self,
  outfilename,
  verbose = False,
  toSkip = [] 
)

Print a PDF image of the node structure of the analysis.

Requires the Python graphviz package, which should already be installed as a dependency.

Parameters
outfilename(str): Name of output PDF file.
verbose(bool, optional): Turns on verbose node labels. Defaults to False.
toSkip([str], optional): Skip list of types of nodes (with sub-string matching, so providing "Define" will cut out all definitions). Possible options are "Define", "Cut", "Correction", "MergeDefine", and "SubCollDefine". Defaults to empty list.
Returns
None

◆ Range()

def Range (   self,
  argv 
)
See also
Node::Range

◆ ReorderCollection()

def ReorderCollection (   self,
  name,
  basecoll,
  newOrderCol,
  skip = [] 
)

Reorders a collection (from a NanoAOD-like format) where the new order is another column of vectors with the new indices specified.

Parameters
name(str): Name of new collection.
basecoll(str): Name of the base collection from which the new collection is derived.
newOrderCol(str): Order for the new collection (stored as column in RDataFrame).
skip([str]): List of variable names in the collection to skip.
Returns
Node New ActiveNode.

Examples

a = analyzer(...)

a.Define('NewFatJetIdxs','ReorderJets(...)')
a.ReorderCollection('ReorderedFatJets','FatJet','NewFatJetIdxs')
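Reordering applies one vector of indices to every array in the collection, so all variables of an object move together. In the sketch below, `new_idxs` is a hypothetical stand-in for a column such as NewFatJetIdxs (here: sort by soft-drop mass, highest first), and `reorder_collection` is a hypothetical helper, not TIMBER code.

```python
def reorder_collection(name, basecoll, new_order, event, skip=()):
    """Rebuild every <basecoll>_<var> array in the order given by new_order."""
    prefix = basecoll + "_"
    out = {}
    for branch, values in event.items():
        if not branch.startswith(prefix):
            continue
        var = branch[len(prefix):]
        if var not in skip:
            out[name + "_" + var] = [values[i] for i in new_order]
    return out


event = {"nFatJet": 3, "FatJet_pt": [500.0, 300.0, 450.0],
         "FatJet_msoftdrop": [90.0, 170.0, 120.0]}
msd = event["FatJet_msoftdrop"]
new_idxs = sorted(range(len(msd)), key=lambda i: msd[i], reverse=True)  # → [1, 2, 0]
print(reorder_collection("ReorderedFatJets", "FatJet", new_idxs, event))
# → {'ReorderedFatJets_pt': [300.0, 450.0, 500.0], 'ReorderedFatJets_msoftdrop': [170.0, 120.0, 90.0]}
```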

◆ SaveRunChain()

def SaveRunChain (   self,
  filename,
  merge = True 
)

Save the Run tree (chain of all input files) to filename.

If filename already exists, some staging will occur to properly merge the files via hadd.

Parameters
filename(str): Output file name.
merge(bool, optional): Whether to merge with a file that already exists. Defaults to True.
Returns
None

◆ SetActiveNode()

def SetActiveNode (   self,
  node 
)

Sets the active node.

Parameters
node(Node): Node to set as ActiveNode.
Exceptions
ValueError: If argument type is not Node.
Returns
Node New ActiveNode.

◆ Snapshot()

def Snapshot (   self,
  columns,
  outfilename,
  treename,
  lazy = False,
  openOption = 'RECREATE' 
)
See also
Node::Snapshot

◆ SubCollection()

def SubCollection (   self,
  name,
  basecoll,
  condition,
  skip = [] 
)

Creates a new collection from an existing one (in a NanoAOD-like format) in which the array-type branches are slimmed to the objects that pass a selection.

Parameters
name(str): Name of new collection.
basecoll(str): Name of the base collection from which the new collection is derived.
condition(str): C++ condition that determines which items to keep.
skip([str]): List of variable names in the collection to skip.
Returns
Node New ActiveNode.

Examples

a.SubCollection('TopJets','FatJet','FatJet_msoftdrop > 105 && FatJet_msoftdrop < 220')
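Per event, slimming a collection means evaluating the condition for each object and keeping the matching entries of every array. The sketch mirrors the soft-drop mass window from the example with a Python list comprehension; `sub_collection` is hypothetical, and the NanoAOD-style n<name> counter it adds is an assumption about naming, not documented TIMBER behavior.

```python
def sub_collection(name, basecoll, keep, event, skip=()):
    """Keep the objects whose `keep` flag is True in every <basecoll>_<var> array.

    keep: per-object booleans, i.e. the evaluated selection condition.
    Also adds an assumed NanoAOD-style counter named n<name>.
    """
    prefix = basecoll + "_"
    out = {}
    for branch, values in event.items():
        if not branch.startswith(prefix):
            continue
        var = branch[len(prefix):]
        if var not in skip:
            out[name + "_" + var] = [v for v, k in zip(values, keep) if k]
    out["n" + name] = sum(1 for k in keep if k)
    return out


event = {"nFatJet": 3, "FatJet_pt": [500.0, 300.0, 450.0],
         "FatJet_msoftdrop": [170.0, 90.0, 210.0]}
# Python equivalent of: FatJet_msoftdrop > 105 && FatJet_msoftdrop < 220
keep = [105 < m < 220 for m in event["FatJet_msoftdrop"]]
print(sub_collection("TopJets", "FatJet", keep, event))
# → {'TopJets_pt': [500.0, 450.0], 'TopJets_msoftdrop': [170.0, 210.0], 'nTopJets': 2}
```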

◆ TrackNode()

def TrackNode (   self,
  node 
)

Add a node to track.

Will add the node to AllNodes dictionary with key node.name.

Parameters
node(Node): Node to start tracking.
Exceptions
NameError: If attempting to track nodes of the same name.
TypeError: If argument type is not Node.
Returns
None

Member Data Documentation

◆ ActiveNode

ActiveNode

Node.

Active node. Access via GetActiveNode(). Set via SetActiveNode().

◆ AllNodes

AllNodes

{str:Node}

Dictionary of all nodes being tracked, keyed by node name.

◆ BaseNode

BaseNode

Node.

Initial Node - no modifications.

◆ Corrections

Corrections

dict

All corrections added to track.

◆ genEventCount

genEventCount

int

Number of generated events in imported simulation files. Zero if not found or data.

◆ isData

isData

bool

Is data (true) or simulation (false) based on existence of _genEventCount branch.

◆ lhaid

lhaid

int

LHA ID of the PDF weight set in the NanoAOD derived from the LHEPdfWeight branch title. -1 if not found or data.

◆ preV6

preV6

bool

Is pre-NanoAODv6 (true) or not (false) based on existence of _genEventCount branch.

◆ RunChain

RunChain

ROOT.TChain.

The TChain of the <runTreeName> TTree.

◆ silent

silent

bool

Silences the verbose output of each Cut and Define. Defaults to False but can be modified by the user at will.

