TIMBER beta
Tree Interface for Making Binned Events with RDataFrame
Main class for TIMBER.


Public Member Functions | |
| def | __init__ (self, fileName, eventsTreeName="Events", runTreeName="Runs", createAllCollections=False) |
| Constructor. | |
| def | Close (self) |
| Safely deletes analyzer instance. | |
| def | __str__ (self) |
| Call with print(<analyzer>) to print a nicely formatted description of the analyzer object for debugging. | |
| def | DataFrame (self) |
| DataFrame of the ActiveNode. | |
| def | Snapshot (self, columns, outfilename, treename, lazy=False, openOption='RECREATE') |
| def | SaveRunChain (self, filename, merge=True) |
| Save the Run tree (chain of all input files) to filename. | |
| def | Range (self, argv) |
| def | GetCollectionNames (self) |
| Return a list of all collections that currently exist (including those that have been added). | |
| def | SetActiveNode (self, node) |
| Sets the active node. | |
| def | GetActiveNode (self) |
| Get the active node. | |
| def | GetBaseNode (self) |
| Get the base node. | |
| def | TrackNode (self, node) |
| Add a node to track. | |
| def | GetTrackedNodeNames (self) |
| Gets the names of the nodes currently being tracked. | |
| def | GetCorrectionNames (self) |
| Get names of all corrections being tracked. | |
| def | FilterColumnNames (self, columns, node=None) |
| Takes a list of possible columns and returns only those that exist in the RDataFrame of the supplied node. | |
| def | GetTriggerString (self, trigList) |
| Checks input list for missing triggers and drops those missing (FilterColumnNames) and then concatenates those remaining into an OR (||) string. | |
| def | GetFlagString (self, flagList=GetStandardFlags()) |
| Checks input list for missing flags and drops those missing (FilterColumnNames) and then concatenates those remaining into an AND string. | |
| def | GetFileName (self) |
| Get input file name. | |
| def | Cut (self, name, cuts, node=None, nodetype=None) |
| Apply a cut/filter to a provided node or the ActiveNode by default. | |
| def | Define (self, name, variables, node=None, nodetype=None) |
| Defines a variable/column on top of a provided node or the ActiveNode by default. | |
| def | Apply (self, actionGroupList, node=None, trackEach=True) |
| Applies a single CutGroup/VarGroup or an ordered list of Groups to the provided node or the ActiveNode by default. | |
| def | Discriminate (self, name, discriminator, node=None, passAsActiveNode=None) |
| Forks a node based upon a discriminator being True or False (ActiveNode by default). | |
| def | SubCollection (self, name, basecoll, condition, skip=[]) |
| Creates a collection of a current collection (from a NanoAOD-like format) where the array-type branch is slimmed based on some selection. | |
| def | ReorderCollection (self, name, basecoll, newOrderCol, skip=[]) |
| Reorders a collection (from a NanoAOD-like format) where the new order is another column of vectors with the new indices specified. | |
| def | ObjectFromCollection (self, name, basecoll, index, skip=[]) |
| Similar to creating a SubCollection except the newly defined columns are single values (not vectors/arrays) for the object at the provided index. | |
| def | MergeCollections (self, name, collectionNames) |
| Merge collections (provided by list of names in collectionNames) into one called name. | |
| def | CommonVars (self, collections) |
| Find the common variables between collections. | |
| def | AddCorrection (self, correction, evalArgs={}, node=None) |
| Add a Correction to track. | |
| def | AddCorrections (self, correctionList, node=None) |
| Add multiple Corrections to track. | |
| def | MakeWeightCols (self, name='', node=None, correctionNames=None, dropList=[]) |
| Makes columns/variables to store total weights based on the Corrections that have been added. | |
| def | GetWeightName (self, corr, variation, name="") |
| Return the branch/column name of the requested weight. | |
| def | MakeTemplateHistos (self, templateHist, variables, node=None) |
| Generates the uncertainty template histograms based on the weights created by MakeWeightCols(). | |
| def | DrawTemplates (self, hGroup, saveLocation, projection='X', projectionArgs=(), fileType='pdf') |
| Draw the template uncertainty histograms created by MakeTemplateHistos(). | |
| def | CalibrateVars (self, varCalibDict, evalArgs, newCollectionName, variationsFlag=True, node=None) |
| Calibrate variables (all of the same collection, e.g. "FatJet") with the Calibrations provided in varCalibDict and arguments provided in evalArgs. | |
| def | Nminus1 (self, cutgroup, node=None) |
| Create an N-1 tree structure of nodes building off of node with the N cuts from cutgroup. | |
| def | PrintNodeTree (self, outfilename, verbose=False, toSkip=[]) |
| Print a PDF image of the node structure of the analysis. | |
| def | MakeHistsWithBinning (self, histDict, name='', weight=None) |
| Batch creates histograms at the current ActiveNode based on the input histDict which is formatted as {[<column name>]: <binning tuple>} where [<column name>] is a list of column names that you'd like to plot against each other in [x,y,z] order and <binning tuple> is the set of arguments that would normally be passed to TH1. | |
Public Attributes | |
| fileName | |
| Path of the input file. | |
| silent | |
| bool | |
| RunChain | |
| ROOT.TChain. | |
| BaseNode | |
| Node. | |
| AllNodes | |
| {str:Node} | |
| Corrections | |
| dict | |
| isData | |
| bool | |
| preV6 | |
| bool | |
| genEventCount | |
| int | |
| lhaid | |
| int | |
| ActiveNode | |
| Node. | |
Main class for TIMBER.
Implements an interface with ROOT's RDataFrame (RDF). The default values assume the data is in NanoAOD format, but any TTree can be used. The class works on the basis of nodes and actions: each node is an RDF instance, and an action (or series of actions) transforms the RDF to create one or more new nodes.
When class functions are used to perform actions, an active node is always tracked so that the next action uses the active node and assigns the output node as the new ActiveNode.
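The node-and-action bookkeeping can be pictured with a small stand-alone sketch. It does not use ROOT or TIMBER: `SketchNode` and `SketchAnalyzer` are hypothetical names, and a plain list of rows stands in for an RDataFrame.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for a node: a name, its rows, and its parent."""
    name: str
    rows: List[dict]
    parent: Optional["SketchNode"] = None


class SketchAnalyzer:
    """Toy analyzer: every action acts on the active node and replaces it."""

    def __init__(self, rows):
        self.base_node = SketchNode("base", rows)
        self.active_node = self.base_node
        self.all_nodes = [self.base_node]

    def _track(self, node):
        # Track the new node and make it the active one.
        self.all_nodes.append(node)
        self.active_node = node
        return node

    def cut(self, name, predicate):
        parent = self.active_node
        kept = [row for row in parent.rows if predicate(row)]
        return self._track(SketchNode(name, kept, parent))

    def define(self, name, column, func):
        parent = self.active_node
        rows = [dict(row, **{column: func(row)}) for row in parent.rows]
        return self._track(SketchNode(name, rows, parent))


a = SketchAnalyzer([{"pt": 10.0}, {"pt": 50.0}, {"pt": 80.0}])
a.cut("pt_cut", lambda row: row["pt"] > 20.0)
a.define("pt_doubled", "pt2", lambda row: 2 * row["pt"])
print(a.active_node.name, [row["pt2"] for row in a.active_node.rows])
```

Each call builds on whatever node was active before it, so the two actions chain without the node being passed explicitly.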
def __init__(self, fileName, eventsTreeName="Events", runTreeName="Runs", createAllCollections=False)
Constructor.
Sets up the tracking of actions on an RDataFrame as nodes. Also looks up and stores common information in NanoAOD such as the number of generated events in a file (genEventCount), the LHA ID of the PDF set in the LHEPdfWeights branch (lhaid), if the file is data (isData), and if the file is before NanoAOD version 6 (preV6).
| fileName | (str): A ROOT file path, a path to a txt file which contains several ROOT file paths separated by new line characters, or a list of either .root and/or .txt files. |
| eventsTreeName | (str, optional): Name of TTree in fileName where events are stored. Defaults to "Events" (for NanoAOD) |
| runTreeName | (str, optional): Name of TTree in fileName where run information is stored (for generated event info in simulation). Defaults to "Runs" (for NanoAOD) |
| createAllCollections | (bool, optional): Create all of the collection structs immediately. This always consumes memory, and the collections increase processing times compared to accessing column values directly. Defaults to False. |
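The three accepted forms of fileName can be flattened into a single list of ROOT file paths. The sketch below is one possible way to do that in plain Python; `expand_inputs` is a hypothetical helper rather than a TIMBER function, and the file names are placeholders.

```python
import os
import tempfile


def expand_inputs(file_name):
    """Return a flat list of ROOT file paths from a str or a list of str."""
    entries = [file_name] if isinstance(file_name, str) else list(file_name)
    root_files = []
    for entry in entries:
        if entry.endswith(".txt"):
            with open(entry) as handle:
                # One ROOT file path per line; blank lines are ignored.
                root_files.extend(line.strip() for line in handle if line.strip())
        else:
            root_files.append(entry)
    return root_files


with tempfile.TemporaryDirectory() as tmp:
    listing = os.path.join(tmp, "inputs.txt")
    with open(listing, "w") as handle:
        handle.write("a.root\nb.root\n")
    expanded = expand_inputs([listing, "c.root"])

print(expanded)
```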
def __str__(self)
Call with print(<analyzer>) to print a nicely formatted description of the analyzer object for debugging.
def AddCorrection(self, correction, evalArgs={}, node=None)
Add a Correction to track.
Sets new active node with all correction variations calculated as new columns.
| correction | (Correction): Correction object to add. |
| evalArgs | (dict, optional): Dict with keys as C++ method argument names and values as the actual argument to provide (branch/column names) for per-event evaluation. For any argument names where a key is not provided, will attempt to find branch/column that already matches based on name. |
| node | (Node, optional): Node to add correction on top of. Defaults to ActiveNode. |
| TypeError | If argument types are not Node and Correction. |
| ValueError | If Correction type is not a weight or uncertainty. |
def AddCorrections(self, correctionList, node=None)
Add multiple Corrections to track.
Sets new ActiveNode with all correction variations calculated as new columns.
| correctionList | ([Correction]): List of Correction objects to add. |
| node | (Node, optional): Node to add the corrections on top of. Defaults to ActiveNode. |
def Apply(self, actionGroupList, node=None, trackEach=True)
Applies a single CutGroup/VarGroup or an ordered list of Groups to the provided node or the ActiveNode by default.
| actionGroupList | (Group, list(Group)): The CutGroup or VarGroup to act on node or a list of CutGroups or VarGroups to act (in order) on node. |
| node | (Node, optional): Node to apply the groups on top of. Must be of type Node (not RDataFrame). Defaults to ActiveNode. |
| trackEach | (bool, optional): Defaults to True. |
| TypeError | If argument type is not Node. |
def CalibrateVars(self, varCalibDict, evalArgs, newCollectionName, variationsFlag=True, node=None)
Calibrate variables (all of the same collection, e.g. "FatJet") with the Calibrations provided in varCalibDict and arguments provided in evalArgs. Create a new collection with the calibrations applied and any re-ordering of the collection applied. As an example...
This will apply the JES and JER calibrations and their four variations (an up/down pair for each) to the FatJet_pt and FatJet_mass branches and create a new collection called "CorrectedFatJets" which will be ordered by the new pt values. Note that if you want to correct a different collection (e.g. the AK4-based Jet collection), you need a separate payload and a separate call to CalibrateVars because only one collection can be generated at a time. Also note that in this example, jes and jer are initialized with the AK8PFPuppi jets in mind. So if you'd like to apply the JES or JER calibrations to AK4 jets, you would also need to define objects like jesAK4 and jerAK4.
The calibrations will always be calculated as a separate column which stores a vector named <CalibName>__vec and ordered {nominal, up, down} where "up" and "down" are the absolute weights (i.e. not relative to "nominal"). If you'd just like the weights and do not want them applied to any variable, you can provide an empty dictionary ({}) for the varCalibDict argument.
This method will set the new active node to the one with the new collection defined.
| varCalibDict | (dict): Dictionary mapping variable to calibrate to calibrations to apply. Best understood through the example above. |
| evalArgs | (dict): Dictionary mapping calibrations to input evaluation arguments that map the C++ method definition argument names to the desired input. |
| newCollectionName | (str): Output collection name. |
| variationsFlag | (bool): If True, calculate systematic variations. If False, do not calculate variations. Defaults to True. |
| node | (Node, optional): Node to add correction on top of. Defaults to ActiveNode. |
| TypeError | If argument types are not Node and Correction. |
| ValueError | If Correction type is not a weight or uncertainty. |
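The {nominal, up, down} ordering and the re-ordering by the new pt values can be illustrated without ROOT. The sketch below assumes that calibration factors multiply and that one calibration is varied at a time while the others stay nominal; that combination rule, the `calibrate` helper, and all numbers are illustrative assumptions, not TIMBER's implementation.

```python
def calibrate(values, calib_vecs, variation=None):
    """Scale each object's value by the product of its calibration factors.

    calib_vecs maps a calibration name to one [nominal, up, down] triple per
    object. variation is None for nominal, or (name, "up" or "down") to vary a
    single calibration while the others stay at their nominal factor.
    """
    index = {"up": 1, "down": 2}
    out = []
    for i, value in enumerate(values):
        factor = 1.0
        for name, vecs in calib_vecs.items():
            pick = index[variation[1]] if variation and variation[0] == name else 0
            factor *= vecs[i][pick]
        out.append(value * factor)
    return out


# Made-up inputs for one event with two jets.
fatjet_pt = [400.0, 390.0]
calibs = {
    "JES": [[1.00, 1.02, 0.98], [1.10, 1.12, 1.08]],
    "JER": [[1.00, 1.05, 0.95], [1.00, 1.05, 0.95]],
}

nominal = calibrate(fatjet_pt, calibs)
jes_up = calibrate(fatjet_pt, calibs, ("JES", "up"))
# Indices that sort the corrected collection by descending nominal pt.
new_order = sorted(range(len(nominal)), key=lambda i: nominal[i], reverse=True)
print(nominal, jes_up, new_order)
```

Because "up" and "down" are absolute factors, the varied value is obtained by swapping the nominal factor for the varied one rather than multiplying on top of it.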
def Close(self)
Safely deletes analyzer instance.
def CommonVars(self, collections)
Find the common variables between collections.
| collections | ([str]): List of collection names (not branch names). |
def Cut(self, name, cuts, node=None, nodetype=None)
Apply a cut/filter to a provided node or the ActiveNode by default.
Will add the resulting node to tracking and set it as the ActiveNode.
| name | (str): Name for the cut for internal tracking and later reference. |
| cuts | (str, CutGroup): A one-line C++ string that evaluates as a bool or a CutGroup object which contains multiple actions that evaluate as bools. |
| node | (Node, optional): Node on which to apply the cut/filter. Defaults to ActiveNode. |
| nodetype | (str, optional): Defaults to None in which case the new Node will be type "Cut". |
| TypeError | If argument type is not Node. |
def DataFrame(self)
DataFrame of the ActiveNode.
def Define(self, name, variables, node=None, nodetype=None)
Defines a variable/column on top of a provided node or the ActiveNode by default.
Will add the resulting node to tracking and set it as the ActiveNode.
| name | (str): Name for the column for internal tracking and later reference. |
| variables | (str, VarGroup): A one-line C++ string that evaluates to desired value to store or a VarGroup object which contains multiple actions that evaluate to the desired values. |
| node | (Node, optional): Node to create the new variable/column on top of. Defaults to ActiveNode. |
| nodetype | (str, optional): Defaults to None in which case the new Node will be type "Define". |
| TypeError | If argument type is not Node. |
def Discriminate(self, name, discriminator, node=None, passAsActiveNode=None)
Forks a node based upon a discriminator being True or False (ActiveNode by default).
| name | (str): Name for the discrimination for internal tracking and later reference. |
| discriminator | (str): A one-line C++ string that evaluates as a bool to discriminate for the forking of the node. |
| node | (Node, optional): Node to discriminate. Must be of type Node (not RDataFrame). Defaults to ActiveNode. |
| passAsActiveNode | (bool, optional): True if the ActiveNode should be set to the node that passes the discriminator. False if the ActiveNode should be set to the node that fails the discriminator. Defaults to None in which case the ActiveNode does not change. |
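Forking on a discriminator amounts to splitting the events into a passing branch and a failing branch. A minimal plain-Python picture of that split follows; the `discriminate` helper, the "pass"/"fail" keys, and the sample variable are assumptions made for illustration.

```python
def discriminate(rows, discriminator):
    """Fork rows into those that pass and those that fail the discriminator."""
    forked = {"pass": [], "fail": []}
    for row in rows:
        forked["pass" if discriminator(row) else "fail"].append(row)
    return forked


events = [{"tau21": 0.2}, {"tau21": 0.7}, {"tau21": 0.35}]
forked = discriminate(events, lambda event: event["tau21"] < 0.4)
print(len(forked["pass"]), len(forked["fail"]))
```

Every event lands in exactly one of the two branches, which is what distinguishes a fork from two independent cuts.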
def DrawTemplates(self, hGroup, saveLocation, projection='X', projectionArgs=(), fileType='pdf')
Draw the template uncertainty histograms created by MakeTemplateHistos().
| hGroup | (HistGroup): Uncertainty template histograms. |
| saveLocation | (str): Path to folder to save histograms. |
| projection | (str, optional): "X" (Default), "Y", or "Z". Axis to project onto if templates are not 1D. |
| projectionArgs | (tuple, optional): A tuple of arguments provided to ROOT TH1 ProjectionX(Y)(Z). |
| fileType | (str, optional): File type - "pdf", "png", etc (must be supported by TCanvas.Print()). |
def FilterColumnNames(self, columns, node=None)
Takes a list of possible columns and returns only those that exist in the RDataFrame of the supplied node.
| columns | ([str]): List of column names (str) |
| node | (Node, optional): Node to compare against. Defaults to BaseNode. |
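The filtering step is a membership test against the columns the node actually has. In ROOT those names would come from RDataFrame's GetColumnNames(); in the sketch below a plain list stands in for them, and `filter_column_names` is a hypothetical helper.

```python
def filter_column_names(columns, available):
    """Keep only the requested columns that exist, preserving request order."""
    available = set(available)
    return [column for column in columns if column in available]


# Sample names, chosen only for illustration.
existing = ["FatJet_pt", "FatJet_eta", "nFatJet"]
kept = filter_column_names(["FatJet_pt", "FatJet_tau5", "nFatJet"], existing)
print(kept)
```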
def GetActiveNode(self)
Get the active node.
def GetCollectionNames(self)
Return a list of all collections that currently exist (including those that have been added).
def GetCorrectionNames(self)
Get names of all corrections being tracked.
def GetFileName(self)
Get input file name.
def GetFlagString(self, flagList=GetStandardFlags())
Checks input list for missing flags and drops those missing (FilterColumnNames) and then concatenates those remaining into an AND string.
| flagList | ([str]): List of flag names. |
def GetTrackedNodeNames(self)
Gets the names of the nodes currently being tracked.
def GetTriggerString(self, trigList)
Checks input list for missing triggers and drops those missing (FilterColumnNames) and then concatenates those remaining into an OR (||) string.
| trigList | ([str]): List of trigger names. |
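Both string builders follow the same two steps: drop the names that are missing, then join what is left with a logical operator. The sketch below shows that pattern in plain Python. `combine_existing` is a hypothetical helper, the column names are samples, the use of `&&` for the AND case is an assumption, and the exact formatting of the strings TIMBER returns may differ.

```python
def combine_existing(names, available, operator):
    """Drop names that are not available, then join the rest with operator."""
    available = set(available)
    present = [name for name in names if name in available]
    return f" {operator} ".join(present)


# Sample column names, chosen only for illustration.
columns_in_file = ["HLT_PFHT1050", "HLT_PFJet500", "Flag_goodVertices"]

trigger_string = combine_existing(
    ["HLT_PFHT1050", "HLT_AK8PFJet400", "HLT_PFJet500"], columns_in_file, "||"
)
flag_string = combine_existing(
    ["Flag_goodVertices", "Flag_HBHENoiseFilter"], columns_in_file, "&&"
)
print(trigger_string)
print(flag_string)
```

Dropping missing names first matters because a filter string that references a nonexistent column would fail when the expression is compiled.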
def GetWeightName(self, corr, variation, name="")
Return the branch/column name of the requested weight.
| corr | (str,Correction): Either the correction object or the name of the correction. |
| variation | (str): "up" or "down". |
| name | (str, optional): Name given to MakeWeightCols() to denote the group of weight columns. Defaults to "". |
| NameError | If weight name does not exist in the columns. |
def MakeHistsWithBinning(self, histDict, name='', weight=None)
Batch creates histograms at the current ActiveNode based on the input histDict, which is formatted as {[<column name>]: <binning tuple>}. Here [<column name>] is a list of column names to plot against each other in [x,y,z] order, and <binning tuple> is the set of arguments that would normally be passed to TH1.
The dimensions of the returned histograms are determined based on the size of [<column name>].
| histDict | ({std:tuple}): Formatted as {[<column name>]: <binning tuple>}, where <binning tuple> holds the arguments that would normally be passed to TH1. The size of [<column name>] determines the dimension of the histogram. |
| name | (str, optional): Name for the output HistGroup. Defaults to '' in which case the name of the ActiveNode will be used. |
| weight | (str, optional): Weight (as a string) to apply to all histograms. Defaults to None. |
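The relation between the number of column names and the histogram dimension can be sketched in plain Python. Python lists cannot be dictionary keys, so this sketch uses tuples of column names; which key form TIMBER itself expects is not stated here. The binning tuples follow the usual ROOT constructor arguments (name, title, then bins and range per axis), and `hist_dimension` and the column names are hypothetical.

```python
def hist_dimension(columns):
    """Return 1, 2, or 3 depending on how many columns are plotted together."""
    cols = (columns,) if isinstance(columns, str) else tuple(columns)
    if not 1 <= len(cols) <= 3:
        raise ValueError("expected between one and three column names")
    return len(cols)


hist_dict = {
    # x only: name, title, nbinsx, xlow, xup
    ("lead_pt",): ("h_pt", "Lead pt", 50, 0.0, 1000.0),
    # x and y: the same arguments followed by nbinsy, ylow, yup
    ("lead_pt", "lead_eta"): ("h_pt_eta", "pt vs eta", 50, 0.0, 1000.0, 24, -2.4, 2.4),
}

dims = {binning[0]: hist_dimension(cols) for cols, binning in hist_dict.items()}
print(dims)
```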
def MakeTemplateHistos(self, templateHist, variables, node=None)
Generates the uncertainty template histograms based on the weights created by MakeWeightCols().
| templateHist | (TH1,TH2,TH3): A TH1, TH2, or TH3 used as a template to create the histograms. |
| variables | ([str]): A list of the columns/variables to plot (ex. ["x","y","z"]). |
| node | (Node, optional): Node to plot histograms from. Defaults to ActiveNode. |
def MakeWeightCols(self, name='', node=None, correctionNames=None, dropList=[])
Makes columns/variables to store total weights based on the Corrections that have been added.
This function automates the calculation of the columns that store the nominal weight and the variation of weights based on the corrections in consideration. The nominal weight will be the product of all "weight" type Correction objects. The variations on the nominal weight correspond to the variations/uncertainties in each Correction object (both "weight" and "uncert" types).
For example, if there are five different Correction objects considered, there will be 11 weights calculated (nominal + 5 up + 5 down).
A list of correction names can be provided if only a subset of the corrections being tracked are desired. A drop list can also be supplied to remove a subset of corrections.
| name | (str, optional): Name for the group of weights, so that weight columns are not duplicated if the method is run multiple times. Output columns will have the prefix weight_<name>__. Defaults to '' with the prefix weight__. |
| node | (Node, optional): Node to calculate weights on top of. Must be of type Node (not RDataFrame). Defaults to ActiveNode. |
| correctionNames | ([str], optional): List of correction names to consider. Defaults to None, in which case all corrections being tracked are considered. |
| dropList | ([str], optional): List of correction names to not consider. Defaults to an empty list, in which case no corrections are dropped from consideration. |
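The counting rule (nominal plus one up and one down per correction) can be checked with a small per-event sketch. It assumes that a "weight" variation swaps that correction's nominal factor for the varied one, and that an "uncert" variation multiplies the nominal weight; this rule, the `total_weights` helper, the labels, and the numbers are illustrative assumptions rather than TIMBER's code.

```python
import math


def total_weights(corrections):
    """Return the nominal weight and one up/down weight per correction."""
    weight_noms = {
        name: corr["nominal"]
        for name, corr in corrections.items()
        if corr["type"] == "weight"
    }
    weights = {"nominal": math.prod(weight_noms.values())}
    for name, corr in corrections.items():
        # Nominal factors of every other "weight"-type correction.
        others = [value for other, value in weight_noms.items() if other != name]
        for variation in ("up", "down"):
            weights[f"{name}_{variation}"] = math.prod(others) * corr[variation]
    return weights


# Made-up corrections for a single event.
corrections = {
    "pileup": {"type": "weight", "nominal": 1.1, "up": 1.2, "down": 1.0},
    "trigger": {"type": "weight", "nominal": 0.9, "up": 0.95, "down": 0.85},
    "pdf": {"type": "uncert", "up": 1.05, "down": 0.95},
}

weights = total_weights(corrections)
print(len(weights), sorted(weights))
```

With three corrections the sketch yields seven weights, in line with the 2N + 1 count described for five corrections.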
def MergeCollections(self, name, collectionNames)
Merge collections (provided by list of names in collectionNames) into one called name.
Only common variables are taken and stored in the new collection.
| name | (str): Name of new collection |
| collectionNames | ([str]): List of names of collections to merge. |
Examples
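A plain-Python picture of the operation, with no ROOT or TIMBER involved: only the variables shared by every collection survive, and their per-object values are concatenated. The helper names and the sample event are hypothetical.

```python
def common_vars(collections):
    """Variables present in every collection, in the order of the first one."""
    names = list(collections)
    shared = set(collections[names[0]])
    for name in names[1:]:
        shared &= set(collections[name])
    return [var for var in collections[names[0]] if var in shared]


def merge_collections(collections):
    """Concatenate the common variables of all collections."""
    return {
        var: [value for coll in collections.values() for value in coll[var]]
        for var in common_vars(collections)
    }


# One made-up event with two lepton collections.
event = {
    "Electron": {"pt": [30.0], "eta": [0.1], "cutBased": [4]},
    "Muon": {"pt": [45.0, 22.0], "eta": [-1.2, 0.8], "tightId": [True, False]},
}
leptons = merge_collections(event)
print(leptons)
```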
def Nminus1(self, cutgroup, node=None)
Create an N-1 tree structure of nodes building off of node with the N cuts from cutgroup.
The structure is optimized so that the N different nodes share as many actions as possible. Use PrintNodeTree() to visualize.
| cutgroup | (CutGroup): Group of N cuts to apply. |
| node | (Node, optional): Node to build on. Defaults to ActiveNode. |
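What each of the N nodes contains can be listed with a short sketch: for every cut, the node named after it applies all the other cuts. This sketch only enumerates the selections and does not reproduce the shared-action optimization; `nminus1_selections` and the cut strings are hypothetical.

```python
def nminus1_selections(cuts):
    """Map each cut name to the N-1 other cuts that its node applies."""
    names = list(cuts)
    return {skipped: [name for name in names if name != skipped] for skipped in names}


# Made-up cuts, name -> C++ expression string.
cuts = {
    "pt": "lead_pt > 400",
    "eta": "abs(lead_eta) < 2.4",
    "mass": "lead_msoftdrop > 105",
}
selections = nminus1_selections(cuts)
print(selections)
```

Plotting the skipped variable at each of these nodes shows its distribution with every other cut already applied.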
def ObjectFromCollection(self, name, basecoll, index, skip=[])
Similar to creating a SubCollection except the newly defined columns are single values (not vectors/arrays) for the object at the provided index.
| name | (str): Name of new collection. |
| basecoll | (str): Name of the existing collection that the new one is derived from. |
| index | (str): Index of the collection item to extract. |
| skip | ([str]): List of variable names in the collection to skip. |
Examples
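A plain-Python picture of the operation, with no ROOT or TIMBER involved: each array-type variable is reduced to the single value at the chosen index. The helper and sample values are hypothetical, and the sketch takes an integer index where the method itself documents a string.

```python
def object_from_collection(collection, index, skip=()):
    """Pick the value at index from every variable that is not skipped."""
    return {
        var: values[index]
        for var, values in collection.items()
        if var not in skip
    }


# One made-up event with two jets.
fatjet = {"pt": [520.0, 410.0], "msoftdrop": [172.0, 85.0], "eta": [0.3, -1.1]}
lead_jet = object_from_collection(fatjet, 0, skip=["eta"])
print(lead_jet)
```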
def PrintNodeTree(self, outfilename, verbose=False, toSkip=[])
Print a PDF image of the node structure of the analysis.
Requires python graphviz package which should be an installed dependency.
| outfilename | (str): Name of output PDF file. |
| verbose | (bool, optional): Turns on verbose node labels. Defaults to False. |
| toSkip | ([], optional): Skip list of types of nodes (with sub-string matching so providing "Define" will cut out all definitions). Possible options are "Define", "Cut", "Correction", "MergeDefine", and "SubCollDefine". Defaults to empty list. |
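Sub-string matching means that a short entry in toSkip can remove several node types at once. The sketch below illustrates that effect in plain Python; `nodes_to_draw` and the node names are hypothetical.

```python
def nodes_to_draw(node_types, to_skip):
    """Keep nodes whose type contains none of the to_skip sub-strings."""
    return {
        name: node_type
        for name, node_type in node_types.items()
        if not any(pattern in node_type for pattern in to_skip)
    }


node_types = {
    "lead_pt": "Define",
    "pt_cut": "Cut",
    "Leptons": "MergeDefine",
    "TopJets": "SubCollDefine",
    "Pileup": "Correction",
}
drawn = nodes_to_draw(node_types, ["Define"])
print(drawn)
```

Here "Define" also removes the "MergeDefine" and "SubCollDefine" nodes, since both type names contain it.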
def Range(self, argv)
def ReorderCollection(self, name, basecoll, newOrderCol, skip=[])
Reorders a collection (from a NanoAOD-like format) where the new order is another column of vectors with the new indices specified.
| name | (str): Name of new collection. |
| basecoll | (str): Name of the existing collection that the new one is derived from. |
| newOrderCol | (str): Order for the new collection (stored as column in RDataFrame). |
| skip | ([str]): List of variable names in the collection to skip. |
Examples
a = analyzer(...)
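Independently of the TIMBER call itself, the reordering can be pictured in plain Python: a vector of indices is applied to every variable in the collection. The helper, the sample values, and the choice to sort by a corrected pt are illustrative assumptions.

```python
def reorder_collection(collection, new_order, skip=()):
    """Apply the same index order to every variable that is not skipped."""
    return {
        var: [values[i] for i in new_order]
        for var, values in collection.items()
        if var not in skip
    }


# One made-up event where a correction changes the pt ordering.
fatjet = {"pt": [400.0, 390.0], "eta": [0.3, -1.1]}
corrected_pt = [400.0, 429.0]
order = sorted(range(len(corrected_pt)), key=lambda i: corrected_pt[i], reverse=True)
reordered = reorder_collection(fatjet, order)
print(order, reordered)
```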
def SaveRunChain(self, filename, merge=True)
Save the Run tree (chain of all input files) to filename.
If filename already exists, some staging will occur to properly merge the files via hadd.
| filename | (str): Output file name. |
| merge | (bool, optional): Whether to merge with a file that already exists. Defaults to True. |
def SetActiveNode(self, node)
Sets the active node.
| node | (Node): Node to set as ActiveNode. |
| ValueError | If argument type is not Node. |
def Snapshot(self, columns, outfilename, treename, lazy=False, openOption='RECREATE')
def SubCollection(self, name, basecoll, condition, skip=[])
Creates a sub-collection of an existing collection (from a NanoAOD-like format) where the array-type branches are slimmed based on some selection.
| name | (str): Name of new collection. |
| basecoll | (str): Name of the existing collection that the new one is derived from. |
| condition | (str): C++ condition that determines which items to keep. |
| skip | ([str]): List of variable names in the collection to skip. |
Examples
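A plain-Python picture of the operation, with no ROOT or TIMBER involved: the condition is evaluated per object, and every variable keeps only the entries that pass. The helper, the sample values, and the mass window are hypothetical, and a Python callable stands in for the C++ condition string.

```python
def sub_collection(collection, condition, skip=()):
    """Keep, for every variable, only the objects that satisfy condition."""
    size = len(next(iter(collection.values())))
    keep = [
        i for i in range(size)
        if condition({var: values[i] for var, values in collection.items()})
    ]
    return {
        var: [values[i] for i in keep]
        for var, values in collection.items()
        if var not in skip
    }


# One made-up event with three jets.
fatjet = {"pt": [520.0, 410.0, 300.0], "msoftdrop": [172.0, 85.0, 150.0]}
top_jets = sub_collection(fatjet, lambda jet: 105 < jet["msoftdrop"] < 220)
print(top_jets)
```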
def TrackNode(self, node)
| ActiveNode |
Node.
Active node. Access via GetActiveNode(). Set via SetActiveNode().
| AllNodes |
{str:Node}
List of all nodes being tracked.
| Corrections |
dict
All corrections added to track.
| genEventCount |
int
Number of generated events in imported simulation files. Zero if not found or data.
| isData |
bool
Is data (true) or simulation (false) based on existence of _genEventCount branch.
| lhaid |
int
LHA ID of the PDF weight set in the NanoAOD derived from the LHEPdfWeight branch title. -1 if not found or data.
| preV6 |
bool
Is pre-NanoAODv6 (true) or not (false) based on existence of _genEventCount branch.
| RunChain |
ROOT.TChain.
The TChain of the <runTreeName> TTree.
| silent |
bool
Silences the verbose output of each Cut and Define. Defaults to False but can be modified by the user at will.