
Draft: Extended friend tree support to include friend trees with vector branches.

Henry Day-Hall requested to merge hdayhall/athena:clean_RDFxAODV2 into main

This is progress on the developments here

To summarise: the xAOD RDataFrame helpers allow RDataFrame to be used on a tree containing an xAOD. The above MR extends these tools so that additional non-xAOD friend trees can be attached and read with RDataFrame at the same time as the primary xAOD tree. This MR adds correct handling of vector branches in friend trees.

While using this branch in my own analysis I discovered that it did not handle ROOT::VecOps::RVec branches. The memory backing a vector branch cannot be guaranteed to be contiguous, so it needs careful treatment. This limitation is present in RRootDS too. After some discussion on the ROOT forum, I was advised that the behaviour I needed to emulate was that of RTreeColumnReader. This class checks whether the vector on the branch is contiguous; if so, it makes a direct reference to it, and otherwise it copies it into a contiguous RVec.
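The "reference if contiguous, copy otherwise" behaviour described above can be sketched without ROOT. This is a minimal illustration, not the code in this MR or in RTreeColumnReader: the class name ContiguousOrCopy and the element-by-element address check are assumptions made for the example, and standard containers stand in for the branch data.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical helper: exposes a contiguous array of T for any container.
// If the source is already contiguous it points at the source directly;
// otherwise it owns a contiguous copy.
template <typename T>
class ContiguousOrCopy {
public:
  template <typename Container>
  explicit ContiguousOrCopy(Container &source) : fSize(source.size()) {
    if (fSize == 0) return;
    if (IsContiguous(source)) {
      fData = &*source.begin();  // direct reference, no copy
    } else {
      fCopy.assign(source.begin(), source.end());  // contiguous copy
      fData = fCopy.data();
      fCopied = true;
    }
  }
  // fData may point into fCopy, so copying this object would be unsafe.
  ContiguousOrCopy(const ContiguousOrCopy &) = delete;
  ContiguousOrCopy &operator=(const ContiguousOrCopy &) = delete;

  const T *data() const { return fData; }
  std::size_t size() const { return fSize; }
  bool copied() const { return fCopied; }
  const T &operator[](std::size_t i) const { return fData[i]; }

private:
  // Assumed check: every element sits exactly sizeof(T) bytes after the
  // previous one. Only called on non-empty containers.
  template <typename Container>
  static bool IsContiguous(Container &source) {
    auto prev = source.begin();
    for (auto it = std::next(prev); it != source.end(); ++it, ++prev) {
      const auto a = reinterpret_cast<std::uintptr_t>(&*prev);
      const auto b = reinterpret_cast<std::uintptr_t>(&*it);
      if (b - a != sizeof(T)) return false;
    }
    return true;
  }

  const T *fData = nullptr;
  std::size_t fSize = 0;
  std::vector<T> fCopy;
  bool fCopied = false;
};
```

Constructed from a std::vector<int>, the helper points straight at the vector's storage; constructed from a std::list<int>, it falls back to the copy. The copy is owned by the helper, so it stays valid for as long as the helper does.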

Mechanics

Getting vectors from the TTreeReader, checking whether they are contiguous, and managing the memory for any copies are all dealt with by TRVecReader. This is a templated subclass of the pure virtual class TRVecReaderBase. The correct TRVecReader is created by a factory function and returned as a unique_ptr<TRVecReaderBase>, so that inside RDataSource there is no need to differentiate between branch types. Only the factory function considers branch type explicitly.
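The shape of that design, a templated reader behind a type-erased base with a single factory that knows about element types, might look like the sketch below. The names VecReaderBase, VecReader and MakeVecReader, the string-based type lookup and the two virtual methods are all hypothetical; the real TRVecReader interface is not reproduced here.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical type-erased interface; callers never see the element type.
class VecReaderBase {
public:
  virtual ~VecReaderBase() = default;
  virtual std::size_t ElementSize() const = 0;
  virtual void *Buffer() = 0;
};

// Hypothetical templated implementation, one instantiation per element type.
template <typename T>
class VecReader final : public VecReaderBase {
public:
  std::size_t ElementSize() const override { return sizeof(T); }
  void *Buffer() override { return &fBuffer; }

private:
  std::vector<T> fBuffer;  // stand-in for the per-entry contiguous data
};

// The only place where the element type is inspected explicitly.
inline std::unique_ptr<VecReaderBase> MakeVecReader(const std::string &elementType) {
  if (elementType == "int") return std::make_unique<VecReader<int>>();
  if (elementType == "float") return std::make_unique<VecReader<float>>();
  if (elementType == "double") return std::make_unique<VecReader<double>>();
  throw std::runtime_error("Unsupported vector element type: " + elementType);
}
```

Code holding the returned std::unique_ptr<VecReaderBase> can store readers for different branch types in one container and call the virtual methods uniformly.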

One special case is bool vectors: these always need copying, because std::vector<bool> is a specialisation that may pack its elements into bits and so exposes no contiguous array of bool to reference.
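To illustrate why the copy is unavoidable: std::vector<bool> has no data() member and its operator[] returns a proxy object rather than a bool&, so the only way to obtain a contiguous bool array is to copy element by element. The BoolBuffer struct and CopyToContiguous function below are hypothetical names for this sketch, not part of the MR.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical owner of a contiguous array of real bools.
struct BoolBuffer {
  std::unique_ptr<bool[]> data;
  std::size_t size = 0;
};

// Copies each element of the (possibly bit-packed) vector into a plain
// bool array, which can then be addressed like any other contiguous column.
inline BoolBuffer CopyToContiguous(const std::vector<bool> &packed) {
  BoolBuffer out;
  out.size = packed.size();
  out.data = std::make_unique<bool[]>(out.size);
  for (std::size_t i = 0; i < out.size; ++i) {
    out.data[i] = packed[i];  // reads through the proxy, writes a real bool
  }
  return out;
}
```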

We try to copy in a lazy way: only columns for which GetColumnReadersImpl has been called are updated. However, as the code that calls GetColumnReadersImpl is external, we cannot guarantee that it will only request the readers it uses. There may be workarounds; see here, but for the time being the easiest solution is to not put more vectors in the friend tree than are actually needed.
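The lazy scheme can be pictured as a "requested" flag per column that gates the per-entry work. The sketch below is an assumption about the general pattern only; LazyColumns, Request and SetEntry are hypothetical names, and filling a buffer with the entry number stands in for reading and copying a branch.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical column store that only refreshes columns someone asked for.
class LazyColumns {
public:
  void AddColumn(const std::string &name) { fColumns[name]; }

  // Plays the role of a "get reader" call: marks the column as in use and
  // hands back a stable pointer to its buffer.
  const std::vector<int> *Request(const std::string &name) {
    Column &column = fColumns.at(name);
    column.requested = true;
    return &column.buffer;
  }

  // Called once per entry; columns that were never requested are skipped.
  void SetEntry(int entry) {
    for (auto &item : fColumns) {
      Column &column = item.second;
      if (!column.requested) continue;
      column.buffer.assign(3, entry);  // stand-in for read + copy
      ++column.updates;
    }
  }

  int Updates(const std::string &name) const { return fColumns.at(name).updates; }

private:
  struct Column {
    bool requested = false;
    int updates = 0;
    std::vector<int> buffer;
  };
  std::map<std::string, Column> fColumns;
};
```

The limitation noted above shows up directly in this picture: if the caller requests a column it never reads, the flag is still set and the copy is still paid on every entry.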

Interface

Unchanged from previous MR.

import ROOT; ROOT.xAOD.Init(); ROOT.xAOD.JetContainer_v1()
from xAODDataSource import Helpers
# Primary tree contains kinematics
primary_glob = "/home/dayhahen/jetydaod/example_data/rucio/2022_datasets/data22_13p6TeV.00432180.physics_Main.deriv.DAOD_PHYS.f1264_m2124_p5334_tid30924306_00/DAOD_PHYS.30924306._*.pool.root.1"
# As per the old API:
simple_xAOD_df = Helpers.MakexAODDataFrame(primary_glob)
# Or with friend trees:
primary_tree = "CollectionTree"
friend_glob = "/home/dayhahen/jetydaod/example_data/rucio/2022_datasets/data22_13p6TeV.00432180.physics_Main.deriv.DAOD_PHYS.f1264_m2124_p5334_tid30924306_00_friends/DAOD_PHYS.30924306._*.pool_friend.root.1"
friend_tree = "triggers"
# Make the df with both of them (could have more than one friend if needed)
friended_df = Helpers.MakexAODDataFrame(primary_glob, primary_tree, [friend_glob], [friend_tree])

Tests

Two tests have been expanded to include vector branches: test/dataFrameFriends_test.cxx and test/dataFrameFriends_test.py. These run in the CI and check the behaviour of the new TRVecReader.
