Draft: Augmented file reading with TEvent and EventLoop
Associated JIRA ticket: ATLASG-2539
Related presentations:
- Postponed presentation: https://indico.cern.ch/event/1307783/#3-postponed-reading-friend-tre
- An upcoming Core Software meeting presentation
For reading augmented (DAOD) files there are 2 possible use cases:
- Case (1): the user is only interested in the content of the main tree (i.e. the usual CollectionTree) and does not want to retrieve information from the friend trees.
  --> In that case all events from the main tree should be processed.
- Case (2): the user wants to retrieve information from one or several friend tree(s) in addition to the information from the main tree.
  Examples of friend trees are CollectionTree_DAOD_FTAG2 and CollectionTree_DAOD_LLP1.
  --> In that case only the events that are shared between the main tree (CollectionTree) and the requested friend trees should be processed.
  An event that is missing in one of the requested trees is NOT a common event.
  - One could be interested only in the CollectionTree_DAOD_FTAG2 additional information.
    We then need to loop only over the events which are common to the 2 trees: CollectionTree and CollectionTree_DAOD_FTAG2.
  - One could be interested in the CollectionTree_DAOD_FTAG2 and CollectionTree_DAOD_LLP1 additional information.
    We then need to loop only over the events which are common to the 3 trees: CollectionTree, CollectionTree_DAOD_FTAG2 and CollectionTree_DAOD_LLP1.
    Any event not present in one of those trees is not considered.
This MR mainly changes things for case (2), as previously it was not possible to read information from such friend trees.
For case (1), despite the newly introduced classes, almost nothing changes.
To request the reading of friend trees, the user has to provide a list of friend tree extensions separated by semicolons, e.g.
friendTreesExtensions = "DAOD_LLP1;DAOD_FTAG2"
to read information from the CollectionTree_DAOD_FTAG2 and CollectionTree_DAOD_LLP1 trees.
If that string is empty (default value) then only the main tree will be processed.
I checked that if the user tries to retrieve information from a friend tree that has not been requested, the TEvent class raises an error and stops, complaining that it cannot find the corresponding branch.
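As an illustration of how such an extension string maps onto friend tree names, here is a minimal sketch in plain C++. The helper name friendTreeNames and its default argument are hypothetical and are not taken from the MR; the only thing assumed from the text is that a friend tree is named after the main tree followed by "_" and the extension.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the MR): turn the semicolon-separated
// extension string into the list of friend tree names to request.
std::vector<std::string> friendTreeNames(const std::string& extensions,
                                         const std::string& mainTree = "CollectionTree") {
  std::vector<std::string> names;
  std::istringstream stream(extensions);
  std::string extension;
  while (std::getline(stream, extension, ';')) {
    if (extension.empty()) continue;  // tolerate "A;;B" or a trailing ';'
    names.push_back(mainTree + "_" + extension);
  }
  return names;  // empty when no friend tree is requested, i.e. case (1)
}
```

With "DAOD_LLP1;DAOD_FTAG2" the sketch yields CollectionTree_DAOD_LLP1 and CollectionTree_DAOD_FTAG2, and an empty string yields an empty list, which corresponds to processing only the main tree.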
Setting of entry lists
Case (2) requires looping over a subset of the events in the trees.
Moreover, the common events do not necessarily correspond to the same entry numbers in the different trees.
As an example, the 2nd common event to be processed can correspond to:
- the entry number 10 in the main tree (CollectionTree)
- the entry number 3 in the CollectionTree_DAOD_FTAG2 friend tree
- the entry number 14 in the CollectionTree_DAOD_LLP1 friend tree
So we need to have, for each tree, a list of the entry numbers corresponding to the common events.
The nice thing is that ROOT allows defining and setting such lists for trees via the TEntryList class.
The entry lists have to be computed by us, it is NOT done by ROOT, e.g. for the main tree the entry list can contain the entry numbers 2,10,15,16,21...
Then, for each tree, we can set its list by calling:
tree->SetEntryList(entryList);
After having set them, with the example above, if someone calls
Long64_t entry = tree->GetEntryNumber(entryIndex);
for entryIndex=1, i.e. the 2nd common event to be processed (the entry index starts at 0), then
- if tree = CollectionTree then entry=10
- if tree = CollectionTree_DAOD_FTAG2 then entry=3
- if tree = CollectionTree_DAOD_LLP1 then entry=14
On the other hand, for case (1), where no entry list is set, entry and entryIndex are the same number (see the TTree::GetEntryNumber() function).
In the MR, entry is called the entry number while entryIndex is called the entry index.
The entry index is the index running from 0 to N-1, with N being the number of events to be processed. Hence for:
- case (1): N = usual number of events in the main tree
- case (2): N = number of common events between the main tree and the requested friend trees
NB: TTree::GetEntryNumber() only returns the entry number, it does not load any information from the tree.
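The relation between entry index and entry number can be sketched without ROOT. The class below is a simplified stand-in, not a ROOT class: TreeModel, setEntryList, nToProcess and entryNumber are hypothetical names, and throwing on an out-of-range index is a choice of this sketch rather than a statement about what TTree::GetEntryNumber() does in that situation.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Simplified stand-in for a tree with an optional entry list (NOT a ROOT class).
class TreeModel {
public:
  explicit TreeModel(long long nEntries) : m_nEntries(nEntries), m_hasList(false) {}

  // Plays the role of tree->SetEntryList(entryList) in case (2).
  void setEntryList(const std::vector<long long>& entries) {
    m_entryList = entries;
    m_hasList = true;
  }

  // N in the text: the number of events to be processed.
  long long nToProcess() const {
    return m_hasList ? static_cast<long long>(m_entryList.size()) : m_nEntries;
  }

  // Entry index -> entry number. Without an entry list both are identical.
  long long entryNumber(long long entryIndex) const {
    if (entryIndex < 0 || entryIndex >= nToProcess()) {
      throw std::out_of_range("entry index outside [0, N-1]");  // sketch-only choice
    }
    return m_hasList ? m_entryList[static_cast<std::size_t>(entryIndex)] : entryIndex;
  }

private:
  long long m_nEntries;               // total number of entries in the tree
  bool m_hasList;                     // true only once an entry list was set
  std::vector<long long> m_entryList; // entry numbers of the common events
};
```

For a main tree whose entry list is 2,10,15,16,21, entry index 1 (the 2nd common event, counting from 0) maps to entry number 10 and N is 5. For a tree without an entry list, entry index 7 maps to entry number 7.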
In the TEvent class and the respective manager classes, the information from the tree(s) is read separately for each branch (ROOT TBranch class).
That is why in all those classes you will see modifications similar to
Long64_t entry = m_branch->GetTree()->GetEntryNumber(entryIndex);
m_branch->GetEntry(entry);
where m_branch->GetTree() returns the pointer to the tree the branch belongs to, i.e. either the main tree or one of the friend trees.
Setting the entry lists, combined with that change, allows reading the correct information for both case (1) and case (2): the entry number corresponding to the entry index is retrieved first and then the branch content is read for that entry number.
Finding the common events
For case (2), the events shared between the main tree and the requested friend trees have to be found.
This matching of events is performed by the newly introduced TEntryListTTreeHandler class, which fills a TEntryList object for each tree.
That class takes the main tree pointer and the list of requested friend tree names.
The tree index that was built when creating the augmented (DAOD) files is used.
For augmented DAOD files the index is built with the index_ref branch as major variable (and no minor variable).
Each event has a unique index value assigned.
Hence, if a given index_ref value is found in the main tree and in all the requested friend trees, that event is common to all trees and should be processed.
In that case the entry list of each tree is filled with the respective entry number.
The TEntryListTTreeHandler is implemented in a generic way, i.e. it does not explicitly use index_ref but uses the names of the major and minor variables retrieved from the main tree. Once computed, the entry lists are set for the respective trees.
For case (1) no entry list is computed, as all events should be processed.
NB: for case (2), if any requested friend tree is missing then there is no common event, hence the entry lists are set to be empty and the number of events to be processed is 0. No error is raised (see Metadata check concerning friend trees).
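The matching logic can be illustrated with a ROOT-free sketch. It is not the TEntryListTTreeHandler implementation: each tree is modelled as a plain vector holding the index value (e.g. index_ref) of every entry, a missing friend tree is modelled as an empty vector, and the names IndexColumn, EntryList and commonEntryLists are hypothetical.

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// column[e] is the index value (e.g. index_ref) stored at entry number e.
using IndexColumn = std::vector<long long>;
using EntryList = std::vector<long long>;

// Returns one entry list per tree: element 0 is for the main tree,
// element f+1 is for friends[f]. Lists are aligned by entry index.
std::vector<EntryList> commonEntryLists(const IndexColumn& mainTree,
                                        const std::vector<IndexColumn>& friends) {
  // For each friend tree, build a lookup: index value -> entry number.
  std::vector<std::unordered_map<long long, long long>> lookups;
  for (const IndexColumn& column : friends) {
    std::unordered_map<long long, long long> lookup;
    for (std::size_t e = 0; e < column.size(); ++e) {
      lookup[column[e]] = static_cast<long long>(e);
    }
    lookups.push_back(std::move(lookup));
  }

  std::vector<EntryList> lists(friends.size() + 1);
  std::vector<long long> matched(friends.size());
  for (std::size_t e = 0; e < mainTree.size(); ++e) {
    // An event is common only if its index value is found in every friend tree.
    bool common = true;
    for (std::size_t f = 0; f < lookups.size() && common; ++f) {
      auto found = lookups[f].find(mainTree[e]);
      if (found == lookups[f].end()) {
        common = false;
      } else {
        matched[f] = found->second;
      }
    }
    if (!common) continue;
    lists[0].push_back(static_cast<long long>(e));
    for (std::size_t f = 0; f < friends.size(); ++f) {
      lists[f + 1].push_back(matched[f]);
    }
  }
  return lists;
}
```

If one of the friend columns is empty, no index value can match and every returned list is empty, which mirrors the behaviour described in the NB above: zero events to process and no error. Called with no friends at all, the sketch would list every main tree entry; per the text, the MR computes no entry list at all in case (1).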
TEvent class and branch manager classes changes
Because a subset of events is processed, the entry index (m_entryIndex) and the entry number (m_entry) had to be introduced in the TEvent class and/or the branch manager classes.
Most of the time the entry index is used, for the reasons explained above: it allows retrieving the correct information from the branches by computing the corresponding entry number.
Sometimes, though, m_entry is needed in the TEvent class to retrieve some information from the main tree.
TEvent class
Tree and chain handlers
The TEvent class reads information from either a TTree or a TChain via the TEvent::readFrom() functions.
In the TEvent class, to avoid having to deal with the entry lists of the input tree or chain directly, the tree and chain private members have been replaced by tree and chain handlers. Hence:
- TTree *m_inTree was replaced by std::shared_ptr<TEventTTreeHandler> m_inTreeHdr
- TChain *m_inChain was replaced by std::unique_ptr<TEventTChainHandler> m_inChainHdr
The TEventTTreeHandler and TEventTChainHandler classes also provide other functions, such as getting the number of events to be processed for cases (1) and (2), getting the entry number from the entry index, or getting the list of branches, including the branches of the requested friend trees.
TEventTTreeHandler
For case (2), the entry list of a tree is computed and set in that class. If any requested friend tree is missing then the number of events to be processed is 0 (see Metadata check concerning friend trees).
NB: For case (1) no entry list is computed nor set.
TEventTChainHandler
Before this MR, the TChain implementation relied on setting the current tree of the chain as the private TTree member of the TEvent class and then using the TTree implementation to read that current tree.
As in the previous implementation, the TChain implementation simply sets the current tree handler of the chain handler in the TEvent class, and the current tree handler is then used for reading. The entry lists are only set for the tree currently being processed and are not set for the chain.
When initializing the chain managed by the handler, the files without CollectionTree or with no event to be processed are removed.
Hence the input chain and the chain managed by the handler can differ in number of files (this only concerns low-statistics samples).
NB: For case (1) no entry list is computed nor set.
Metadata and EventFormatStream information
When creating augmented (DAOD) files, if the main CollectionTree or a friend tree is empty then that tree is not written to the output files by Athena. Hence a missing friend tree should not be treated as a problem, in the sense that no error should be raised.
However, we need to make sure that the user is not requesting friend trees that never existed in the first place. This is done thanks to the MetaData trees, which are always saved to the augmented (DAOD) files, e.g. even if CollectionTree_DAOD_FTAG2 is missing, the MetaDataTree_DAOD_FTAG2 should be found in the file.
Otherwise it means that the FTAG2 information was never requested to be added.
Hence an error is raised in the TEvent class if such a MetaData tree is not found.
Concerning the MetaData trees, the EventFormatStream information needs to be retrieved for each object, hence the MetaData trees associated with the requested friend trees are also read. The EventFormatStream information was added to the augmented (DAOD) files by MR !64183 (merged) by @maszyman.
The "issue" is discussed in more detail in the JIRA ticket ATLASG-2539.
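The decision described above can be summarised in a small ROOT-free sketch. It is only a model of the rule, not the TEvent code: the file content is represented by a set of tree names, and FriendStatus and checkFriendTree are hypothetical names. The tree name patterns follow the ones quoted in the text.

```cpp
#include <set>
#include <stdexcept>
#include <string>

// Outcome for a requested friend tree that is legitimate (hypothetical enum).
enum class FriendStatus { Readable, NoEvents };

// treesInFile: names of the trees found in the input file (modelled input).
// extension: a requested friend tree extension such as "DAOD_FTAG2".
FriendStatus checkFriendTree(const std::set<std::string>& treesInFile,
                             const std::string& extension) {
  // The MetaData tree is always written, so its absence means the
  // augmentation was never produced: this is a user error.
  if (treesInFile.count("MetaDataTree_" + extension) == 0) {
    throw std::runtime_error("Friend tree extension was never written: " + extension);
  }
  // An empty friend tree is not written to the file, so a missing friend tree
  // is legitimate and simply leads to zero events to process.
  if (treesInFile.count("CollectionTree_" + extension) == 0) {
    return FriendStatus::NoEvents;
  }
  return FriendStatus::Readable;
}
```

In this model a file holding MetaDataTree_DAOD_FTAG2 but no CollectionTree_DAOD_FTAG2 gives NoEvents without any error, while requesting an extension with no MetaData tree at all raises an exception.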
Getting branches from friend trees using the main tree
When doing
TObjArray* branches = tree->GetListOfBranches();
TObject* brObject = branches->FindObject( branchName.c_str() );
TBranch* br = static_cast< TBranch* >( brObject );
the returned list of branches only covers the tree itself and does not include the branches of its friend trees.
Instead, to also be able to get the friend tree branches, one can do
TBranch* br = tree->GetBranch( branchName.c_str() );
which returns a null pointer if the branch does not exist. This explains that change in some parts of the code.