
Draft: Augmented file reading with TEvent and EventLoop

Associated JIRA ticket: ATLASG-2539

Related presentations:

When reading augmented (DAOD) files, there are 2 possible use cases:

  • Case (1): the user is only interested in the content of the main tree (i.e. the usual CollectionTree) and does not want to retrieve information from the friend trees.
    --> In that case all events from the main tree should be processed.

  • Case (2): the user wants to retrieve information from one or several friend trees in addition to the information from the main tree.
    Examples of friend trees are CollectionTree_DAOD_FTAG2 and CollectionTree_DAOD_LLP1.
    --> In that case only the events that are shared between the main tree (CollectionTree) and the requested friend trees should be processed.
    An event that is missing from any one of the requested trees is NOT a common event.

    • One could be interested only in the additional information from CollectionTree_DAOD_FTAG2.
      In that case we need to loop only over the events common to the 2 trees: CollectionTree and CollectionTree_DAOD_FTAG2.

    • One could be interested in the additional information from both CollectionTree_DAOD_FTAG2 and CollectionTree_DAOD_LLP1.
      In that case we need to loop only over the events common to the 3 trees: CollectionTree, CollectionTree_DAOD_FTAG2 and CollectionTree_DAOD_LLP1.
      Any event not present in one of those trees is not considered.

This MR mainly changes things for case (2), as previously it was not possible to read information from such friend trees.
For case (1), despite the new classes, almost nothing changes.

To request reading friend trees, the user has to provide a list of friend tree extensions separated by semicolons, e.g. friendTreesExtensions = "DAOD_LLP1;DAOD_FTAG2" to read information from the CollectionTree_DAOD_FTAG2 and CollectionTree_DAOD_LLP1 trees. If that string is empty (the default value), only the main tree is processed.
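A minimal sketch of turning the friendTreesExtensions string into friend tree names, assuming the names are built as "CollectionTree_" followed by the extension (as in the examples above). The function friendTreeNames is hypothetical and is not the MR's code:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split e.g. "DAOD_LLP1;DAOD_FTAG2" on ';' and build the
// friend tree names "CollectionTree_DAOD_LLP1", "CollectionTree_DAOD_FTAG2".
// An empty input gives an empty list, i.e. only the main tree is read (case 1).
std::vector<std::string> friendTreeNames(const std::string& friendTreesExtensions,
                                         const std::string& mainTreeName = "CollectionTree") {
  std::vector<std::string> names;
  std::istringstream stream(friendTreesExtensions);
  std::string ext;
  while (std::getline(stream, ext, ';')) {
    if (ext.empty()) continue;  // tolerate stray or trailing separators
    names.push_back(mainTreeName + "_" + ext);
  }
  return names;
}
```

For example, friendTreeNames("DAOD_LLP1;DAOD_FTAG2") yields CollectionTree_DAOD_LLP1 and CollectionTree_DAOD_FTAG2, while friendTreeNames("") yields an empty list.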

I checked that if the user tries to retrieve information from a friend tree that has not been requested, the TEvent class raises an error and stops, complaining that it cannot find the corresponding branch.

Setting of entry lists

Case (2) requires looping over a subset of the events in the trees.
Moreover, a common event does not necessarily have the same entry number in each tree.

As an example, the 2nd common event to be processed could correspond to:

  • the entry number 10 in the main tree (CollectionTree)
  • the entry number 3 in the CollectionTree_DAOD_FTAG2 friend tree
  • the entry number 14 in the CollectionTree_DAOD_LLP1 friend tree

So we need, for each tree, a list of the entry numbers corresponding to the common events.
Conveniently, ROOT allows defining such lists and setting them on trees through the TEntryList class.

The entry lists must be computed by us (this is NOT done by ROOT). E.g. for the main tree the entry list could contain the entry numbers 2,10,15,16,21,... Each list is then set on its tree by calling:

tree->SetEntryList(entryList);

Once the lists are set, with the example above, calling

Long64_t entry = tree->GetEntryNumber(entryIndex);

with entryIndex=1, i.e. the 2nd common event to be processed (the entry index starts at 0), gives:

  • if tree is CollectionTree then entry=10
  • if tree is CollectionTree_DAOD_FTAG2 then entry=3
  • if tree is CollectionTree_DAOD_LLP1 then entry=14

On the other hand, in case (1), where no entry list is set, entry and entryIndex are the same number
(see the TTree::GetEntryNumber() function).
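To make the entry index / entry number distinction concrete, here is a toy model in plain C++ (no ROOT) of a tree with an optional entry list. ToyTree and its members are hypothetical, and returning -1 for an out-of-range index is a choice made for this sketch, not a statement about ROOT:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Toy model (not ROOT) of a tree that may carry an entry list.
struct ToyTree {
  std::int64_t nEntries = 0;                            // entries stored in the tree
  std::optional<std::vector<std::int64_t>> entryList;   // entry numbers of the common events

  // N: all entries (case 1, no list) or the number of common events (case 2).
  std::int64_t numberOfEventsToProcess() const {
    return entryList ? static_cast<std::int64_t>(entryList->size()) : nEntries;
  }

  // Entry index in [0, N) -> entry number; identity when no list is set.
  // Returns -1 for an out-of-range index (a convention of this sketch).
  std::int64_t getEntryNumber(std::int64_t entryIndex) const {
    if (entryIndex < 0 || entryIndex >= numberOfEventsToProcess()) return -1;
    return entryList ? (*entryList)[static_cast<std::size_t>(entryIndex)] : entryIndex;
  }
};
```

With a main tree whose entry list is {2, 10, 15, 16, 21}, getEntryNumber(1) returns 10, whereas a tree without an entry list returns the index unchanged.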

In this MR, entry is called the entry number while entryIndex is called the entry index.
The entry index runs from 0 to N-1, N being the number of events to be processed. Hence for:

  • case (1): N = usual number of events in the main tree
  • case (2): N = number of common events between the main tree and requested friend trees

NB: TTree::GetEntryNumber() only returns the entry number but does not load any information from the tree.

In the TEvent class and the respective branch manager classes, the information from the tree(s) is read separately for each branch (ROOT's TBranch class). That is why in all those classes you will see modifications similar to

Long64_t entry = m_branch->GetTree()->GetEntryNumber(entryIndex);
m_branch->GetEntry(entry);

as m_branch->GetTree() returns a pointer to the tree the branch belongs to, i.e. either the main tree or one of the friend trees.

Setting the entry lists, combined with that change, allows reading the correct information in both case (1) and case (2): the entry number corresponding to the entry index is retrieved first, and then the branch content is read.
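The per-branch pattern can be illustrated with another toy model (plain C++, no ROOT; ToyTreeData, ToyBranch and branchesStayAligned are hypothetical names). Each branch resolves the entry number through its own tree, so for a given entry index all branches read the same event, checked here through an index_ref-like value stored in every tree:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a tree (not ROOT): the index_ref value stored at each entry
// number, plus the entry list of the common events.
struct ToyTreeData {
  std::vector<std::int64_t> indexRef;   // indexRef[entry] = index_ref of that entry
  std::vector<std::int64_t> entryList;  // entryList[entryIndex] = entry number

  std::int64_t getEntryNumber(std::int64_t entryIndex) const {
    return entryList.at(static_cast<std::size_t>(entryIndex));
  }
};

// Toy stand-in for a branch: it only knows the tree it belongs to,
// similar to what TBranch::GetTree() gives access to.
struct ToyBranch {
  const ToyTreeData* tree;

  // Two-step pattern: entry index -> entry number via the branch's own tree,
  // then read that entry.
  std::int64_t readIndexRef(std::int64_t entryIndex) const {
    const std::int64_t entry = tree->getEntryNumber(entryIndex);
    return tree->indexRef.at(static_cast<std::size_t>(entry));
  }
};

// Returns true if, for every entry index, all branches read the same event.
bool branchesStayAligned(const std::vector<ToyBranch>& branches, std::int64_t nEvents) {
  if (branches.empty()) return true;
  for (std::int64_t i = 0; i < nEvents; ++i) {
    const std::int64_t ref = branches.front().readIndexRef(i);
    for (const ToyBranch& br : branches)
      if (br.readIndexRef(i) != ref) return false;
  }
  return true;
}
```

Reading entry numbers directly with the entry index (skipping the getEntryNumber step) would make the main tree and friend tree branches point at different events as soon as their entry numbers differ.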

Finding the common events

For case (2), the events shared between the main tree and the requested friend trees must be found.

This matching of events is performed by the newly introduced TEntryListTTreeHandler class, and a TEntryList object is filled for each tree.

That class takes the main tree pointer and the list of requested friend tree names.
It uses the tree index that was built when the augmented (DAOD) files were created.
For DAOD augmented files, the index is built with the index_ref branch as the major variable (and no minor variable), so each event is assigned a unique index value. Hence, if a given index_ref value is found in the main tree and in all the requested friend trees, that event is common to all trees and should be processed.

In that case, the entry list of each tree is filled with the respective entry number of that event.

The TEntryListTTreeHandler is implemented generically, i.e. it does not explicitly use index_ref but uses the names of the major and minor variables retrieved from the main tree. Once computed, the entry lists are set for the respective trees.

For case (1) no entry list is computed, as all events should be processed.

NB: for case (2), if any requested friend tree is missing then there is no common event, hence the entry lists are set to be empty and the number of events to be processed is 0. No error is raised (see Metadata and EventFormatStream information).
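The matching can be sketched as follows. This is an illustration under stated assumptions, not the TEntryListTTreeHandler implementation: each tree is reduced to its (major, minor) index value per entry, a std::map stands in for the tree index lookup, the main tree is scanned in entry order, and a missing friend tree is represented by std::nullopt. IndexKey, TreeIndex and findCommonEntries are hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// (major, minor) index value of an entry; for DAOD augmented files the major
// variable is index_ref and there is no minor variable (0 is used here).
using IndexKey = std::pair<std::int64_t, std::int64_t>;
// Index values of a tree, one per entry number; std::nullopt = tree missing from the file.
using TreeIndex = std::optional<std::vector<IndexKey>>;

// Entry lists: position 0 for the main tree, then one per requested friend tree.
std::vector<std::vector<std::int64_t>> findCommonEntries(
    const std::vector<IndexKey>& mainTree, const std::vector<TreeIndex>& friendTrees) {
  std::vector<std::vector<std::int64_t>> lists(friendTrees.size() + 1);
  // A missing requested friend tree means no common event: all lists stay empty.
  for (const TreeIndex& t : friendTrees)
    if (!t) return lists;

  // Index value -> entry number, one lookup table per friend tree.
  std::vector<std::map<IndexKey, std::int64_t>> lookups;
  for (const TreeIndex& t : friendTrees) {
    std::map<IndexKey, std::int64_t> lookup;
    for (std::size_t entry = 0; entry < t->size(); ++entry)
      lookup.emplace((*t)[entry], static_cast<std::int64_t>(entry));
    lookups.push_back(std::move(lookup));
  }

  for (std::size_t entry = 0; entry < mainTree.size(); ++entry) {
    std::vector<std::int64_t> friendEntries;
    for (const auto& lookup : lookups) {
      auto it = lookup.find(mainTree[entry]);
      if (it == lookup.end()) break;  // missing in one friend tree: not a common event
      friendEntries.push_back(it->second);
    }
    if (friendEntries.size() != lookups.size()) continue;
    lists[0].push_back(static_cast<std::int64_t>(entry));
    for (std::size_t f = 0; f < friendEntries.size(); ++f)
      lists[f + 1].push_back(friendEntries[f]);
  }
  return lists;
}
```

For instance, with index_ref values 5,6,7,8 in the main tree, 7,5 in the FTAG2 tree and 8,7,5 in the LLP1 tree, the common events are 5 and 7, giving the entry lists {0, 2}, {1, 0} and {2, 1}.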

TEvent class and branch manager classes changes

Because only a subset of events is processed, the entry index (m_entryIndex) and the entry number (m_entry) had to be introduced in the TEvent class and/or the branch manager classes. Most of the time the entry index is used, for the reasons explained above: it allows retrieving the correct information from branches by computing the corresponding entry number. In some places in the TEvent class, though, m_entry is needed to retrieve information from the main tree.

Tree and chain handlers for the TEvent class

The TEvent class reads information either from a TTree or from a TChain via the TEvent::readFrom() functions. To avoid having to deal with the entry lists of the input tree or chain directly in the TEvent class,
the tree and chain private members have been replaced by tree and chain handlers. Hence:

  • TTree *m_inTree was replaced by std::shared_ptr<TEventTTreeHandler> m_inTreeHdr
  • TChain *m_inChain was replaced by std::unique_ptr<TEventTChainHandler> m_inChainHdr

The TEventTTreeHandler and TEventTChainHandler classes also provide other functions, such as getting the number of events to be processed for cases (1) and (2), getting the entry number from the entry index, or getting the list of branches, including the branches of the requested friend trees.

TEventTTreeHandler

For case (2), the entry list of a tree is computed and set in that class. If any requested friend tree is missing, the number of events to be processed is 0 (see Metadata and EventFormatStream information).

NB: For case (1) no entry list is computed or set.

TEventTChainHandler

Before the MR modifications, the TChain implementation relied on setting the current tree of the chain as the private TTree member of the TEvent class and then using the TTree implementation to read that current tree.

As in the previous implementation, the TChain implementation simply sets the current tree handler of the chain handler in the TEvent class, and that current tree handler is then used for reading. The entry lists are only set for the currently processed tree, not for the chain.

When the chain managed by the handler is initialized, the files without a CollectionTree or with no event to be processed are removed. Hence the input chain and the chain managed by the handler can differ in number of files (this only concerns low-statistics samples).

NB: For case (1) no entry list is computed or set.
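A sketch of that filtering step, assuming each input file can be summarized by its name and its number of events to process (std::nullopt when the file has no CollectionTree). ChainFile, filesToProcess and totalEvents are hypothetical names, not the TEventTChainHandler interface:

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical summary of one input file of the chain.
struct ChainFile {
  std::string name;
  std::optional<std::int64_t> nEventsToProcess;  // std::nullopt: no CollectionTree
};

// Keep only files that have a CollectionTree and at least one event to process.
std::vector<ChainFile> filesToProcess(const std::vector<ChainFile>& input) {
  std::vector<ChainFile> kept;
  for (const ChainFile& f : input)
    if (f.nEventsToProcess && *f.nEventsToProcess > 0) kept.push_back(f);
  return kept;
}

// Total number of events to process over the filtered chain.
std::int64_t totalEvents(const std::vector<ChainFile>& files) {
  std::int64_t total = 0;
  for (const ChainFile& f : files) total += f.nEventsToProcess.value_or(0);
  return total;
}
```

Dropping such files up front means the handler never has to switch to a tree that contributes no events, at the cost of the file count differing from the input chain.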

Metadata and EventFormatStream information

When augmented (DAOD) files are created, if the main CollectionTree or a friend tree is empty, Athena does not write that tree to the output files. Hence a missing friend tree should not be treated as an issue, in the sense that no error should be raised.

However, we need to make sure that the user is not requesting friend trees that never existed in the first place. This is done using the MetaData trees, which are always saved to the augmented (DAOD) files: e.g. even if CollectionTree_DAOD_FTAG2 is missing, MetaDataTree_DAOD_FTAG2 should be found in the file. Otherwise it means that the FTAG2 information was never requested to be added. Hence an error is raised in the TEvent class if such a MetaData tree is not found.

Concerning the MetaData trees, the EventFormatStream information needs to be retrieved for each object, hence the MetaData trees associated with the requested friend trees are also read. The EventFormatStream info was added to the augmented (DAOD) files by MR !64183 (merged) by @maszyman.

The "issue" was discussed in more detail in the JIRA ticket ATLASG-2539.
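The check can be sketched as follows, assuming the tree names found in a file are available as a set of strings and that a missing MetaData tree is reported with an exception (the MR raises an error in TEvent; its exact mechanism is not reproduced here). checkRequestedFriendTrees is a hypothetical name:

```cpp
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical check for one input file.
// - throws if MetaDataTree_<ext> is missing: that friend was never added to the file;
// - returns false if some CollectionTree_<ext> is missing: it was empty and not
//   written, which is not an error but means zero common events;
// - returns true if all requested friend trees are present.
bool checkRequestedFriendTrees(const std::set<std::string>& treeNames,
                               const std::vector<std::string>& extensions) {
  bool allFriendTreesPresent = true;
  for (const std::string& ext : extensions) {
    if (treeNames.count("MetaDataTree_" + ext) == 0)
      throw std::runtime_error("Requested friend tree extension " + ext +
                               " was never added to this file (no MetaDataTree_" + ext + ")");
    if (treeNames.count("CollectionTree_" + ext) == 0)
      allFriendTreesPresent = false;  // empty friend tree not written: not an error
  }
  return allFriendTreesPresent;
}
```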

Getting branches from friend trees using the main tree

When doing

TObjArray* branches = tree->GetListOfBranches();
TObject* brObject = branches->FindObject( branchName.c_str() );
TBranch* br = static_cast< TBranch* >( brObject );

the list of branches returned only covers the tree itself; it does not include the branches of its friend trees.

Instead, to also get friend tree branches, one can do

TBranch* br = tree->GetBranch( branchName.c_str() );

This returns a null pointer if the branch does not exist, hence that change in some parts of the code.

Edited by Romain Bouquet
