provide fixed schemas for CP Algorithm n-tuples, including for empty files (!84868) · Merge requests · atlas / athena

The primary goal of this MR is to ensure that all n-tuples produced by the CP Algorithms have exactly the same list of branches (for the same configuration). That is useful for RDataFrame-based frameworks, which can stumble if the variables they need are not present in all input files. Guaranteeing this means that the branches still need to be created even if no events were processed, which in turn means that the existing logic of scanning the event store on the first accepted event can no longer be used. And since it is not guaranteed that execute will ever be called, I moved the tree setup to initialize.
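The difference between the two setups can be illustrated with a toy sketch. `LazyWriter` and `EagerWriter` are hypothetical stand-ins, not the actual algorithm, and the branch names are made up; they only mimic the point in time at which the branch list gets fixed.

```python
class LazyWriter:
    """Toy model of the old behaviour: branches are discovered on the first event."""

    def __init__(self):
        self.branches = []
        self._tree_ready = False

    def initialize(self):
        pass  # nothing is known about the branches yet

    def execute(self, event):
        if not self._tree_ready:
            # scan the event content to decide which branches to create
            self.branches = sorted(event)
            self._tree_ready = True


class EagerWriter:
    """Toy model of the new behaviour: the schema is fixed in initialize."""

    def __init__(self, schema):
        self._schema = schema
        self.branches = []

    def initialize(self):
        self.branches = sorted(self._schema)

    def execute(self, event):
        pass  # filling only; the branch list no longer changes


def run(writer, events):
    writer.initialize()
    for event in events:
        writer.execute(event)
    return writer.branches


schema = {"el_pt": "float", "el_eta": "float"}  # hypothetical branches
events = [{"el_pt": 25.0, "el_eta": 0.3}]

lazy_full = run(LazyWriter(), events)
lazy_empty = run(LazyWriter(), [])            # no events: no branches at all
eager_full = run(EagerWriter(schema), events)
eager_empty = run(EagerWriter(schema), [])    # no events: same branches as above
```

In this sketch only the eager variant produces the same branch list for an empty and a non-empty input.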

The main challenge here is to know the types of all decorations upfront during initialize. The types are known upfront if the decorations are used with any of the following:

columnar accessors
systematics decoration handles
SG::ConstAccessor or SG::Decorator dynamically created in initialize
static SG::ConstAccessor or SG::Decorator at file level

The types will not be known automatically for the following:

SG::ReadDecorHandleKey or SG::WriteDecorHandleKey
static SG::ConstAccessor or SG::Decorator at place of use
decorations read from the input file (in Athena/StoreGate and for empty input files)

There are a couple of workarounds:

move local SG::ConstAccessor or SG::Decorator to the file level or make them (dynamically allocated) class members
create an accessor temporary in initialize, e.g. SG::ConstAccessor<float> {"myDeco"};
specify the type when registering the output: config.addOutput(self.containerName, 'myDeco', 'myDeco', type='float')
specify the type when adding an output rule: myContainer.myDeco -> myDeco type=float
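Why the place of creation matters can be sketched with a toy registry. This is a Python analogy under stated assumptions, not the actual xAOD code: `TYPE_REGISTRY`, `Accessor`, `ToyAlg` and the decoration names are all hypothetical. It mimics the C++ behaviour that a function-local static is only constructed when execution first reaches it, while a file-level object exists before initialize runs.

```python
TYPE_REGISTRY = {}  # decoration name -> type name (toy stand-in for a type registry)


class Accessor:
    """Toy accessor: constructing it registers the decoration type."""

    def __init__(self, name, type_name):
        self.name = name
        TYPE_REGISTRY[name] = type_name


# "file level": created when the module is loaded, i.e. before initialize
FILE_LEVEL = Accessor("ptCorrected", "float")


class ToyAlg:
    def initialize(self):
        # workaround: a temporary accessor in initialize registers the type
        Accessor("isSelected", "char")

    def execute(self):
        # "place of use": only registered once execute actually runs
        Accessor("lateDeco", "int")


def missing_types(requested):
    """Return the requested decorations whose type is not yet registered."""
    return [name for name in requested if name not in TYPE_REGISTRY]


requested = ["ptCorrected", "isSelected", "lateDeco"]

alg = ToyAlg()
alg.initialize()
missing_after_init = missing_types(requested)     # only "lateDeco" is unknown

alg.execute()
missing_after_execute = missing_types(requested)  # everything is known now
```

For an empty input file execute never runs, so in this model `lateDeco` would stay unknown unless its type is supplied in one of the ways listed above.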

There was some discussion in Mattermost whether the last two workarounds should be used, or should even exist, but particularly for variables coming from the input file or upstream algorithms that seems like a fairly straightforward and robust solution. I did go through and fix all issues I encountered in the tests I ran.

A second problem is that there is no way of knowing which of the containers are "proper" DataVector containers, and which are simply an SG::AuxElement. The list of all those special "containers" has to be passed in. However, in practice it is only EventInfo and I set the default of the property and option to that. Should more of them show up in testing we can probably just add them to the default list.
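A minimal sketch of how such a list could be used, assuming that variables from proper containers end up in vector branches while variables from single objects end up in scalar branches. The function `branch_type` and its `non_containers` argument are hypothetical names for illustration; only the default entry EventInfo comes from the description above.

```python
def branch_type(container, element_type, non_containers=("EventInfo",)):
    """Pick a scalar branch for single objects and a vector branch otherwise."""
    if container in non_containers:
        return element_type
    return f"std::vector<{element_type}>"


event_info_branch = branch_type("EventInfo", "unsigned int")
electron_branch = branch_type("Electrons", "float")  # hypothetical container name
```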

I also got rid of AsgxAODMetNTupleMakerAlg and merged all functionality into the AsgxAODNTupleMakerAlg. To maintain the existing functionality there is now an added option metTermName=Final for the variable rule to write out a single term instead of the entire MET container. The existing MET options are maintained and get translated accordingly.

In order to facilitate this, I first did a major restructuring of the code in AsgxAODNTupleMakerAlg and AsgxAODMetNTupleMakerAlg. I moved all the private member classes and related code into a separate file and namespace TreeBranchHelpers. That allowed me to eliminate the duplicate code and merge related code, which eventually allowed me to get rid of AsgxAODMetNTupleMakerAlg completely.

I did change a little bit how objects are held, changing from std::list<...> to std::vector<std::unique_ptr<...>>. That allowed me to introduce an IObjectProcessor interface class that has three separate implementations. I also thought of making interfaces for ElementBranchProcessor and ContainerBranchProcessor, but they can't have a common interface and only have a single implementation each. However, there could still be multiple implementations in the future, in which case I'd introduce abstract interfaces.

I did introduce a BranchConfig structure that contains all the information for a single variable rule, and an OutputBranchData structure that contains all the information for a single output branch. That makes it easier to pass information around, to break the processing up in stages, and to cache information between stages as needed.

I broke the processing up into multiple stages:

parse all the rules into BranchConfig objects
determine the aux-data types for each BranchConfig object. If types are missing, print out a list of all missing decorations. Special care is taken that you indeed get a complete list, not just one missing decoration at a time, which saves a lot of time when fixing issues.
make OutputBranchData objects for all rules and systematics, dropping duplicate branches and also flagging branch name collisions
reorder the branches to output one systematic at a time
connect all the branches to the tree
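The stages above can be sketched roughly as follows. This is a simplified Python model, not the C++ implementation: the fields of `BranchConfig` and `OutputBranchData`, the helper functions, the rule strings and the `%SYS%` substitution shown here are assumptions for illustration, and the final stage (connecting branches to the tree) is left out.

```python
from dataclasses import dataclass


@dataclass
class BranchConfig:        # one variable rule (hypothetical fields)
    container: str
    aux_name: str
    branch_pattern: str    # may contain the placeholder %SYS%
    type_name: str = ""


@dataclass
class OutputBranchData:    # one output branch (hypothetical fields)
    branch_name: str
    systematic: str
    config: BranchConfig


def parse_rules(rules):
    """Stage 1: turn 'container.aux -> branch' strings into BranchConfig objects."""
    configs = []
    for rule in rules:
        source, branch = (part.strip() for part in rule.split("->"))
        container, aux_name = source.split(".", 1)
        configs.append(BranchConfig(container, aux_name, branch))
    return configs


def resolve_types(configs, known_types):
    """Stage 2: fill in the types and report every missing decoration at once."""
    missing = []
    for cfg in configs:
        cfg.type_name = known_types.get(cfg.aux_name, "")
        if not cfg.type_name:
            missing.append(f"{cfg.container}.{cfg.aux_name}")
    if missing:
        raise RuntimeError("missing decoration types: " + ", ".join(missing))


def make_branches(configs, systematics):
    """Stage 3: expand over systematics, drop duplicates, flag name collisions."""
    branches = {}
    for cfg in configs:
        for syst in systematics:
            name = cfg.branch_pattern.replace("%SYS%", syst)
            if name in branches:
                old = branches[name].config
                if (old.container, old.aux_name) != (cfg.container, cfg.aux_name):
                    raise RuntimeError(f"branch name collision: {name}")
                continue  # same source and name: a duplicate, keep the first one
            branches[name] = OutputBranchData(name, syst, cfg)
    return list(branches.values())


def order_by_systematic(branches, systematics):
    """Stage 4: group the branches so that one systematic follows another."""
    rank = {syst: i for i, syst in enumerate(systematics)}
    return sorted(branches, key=lambda b: rank[b.systematic])


rules = ["Electrons.pt -> el_pt_%SYS%", "Electrons.eta -> el_eta"]
systematics = ["NOSYS", "EG_SCALE__1up"]
configs = parse_rules(rules)
resolve_types(configs, {"pt": "float", "eta": "float"})
ordered = order_by_systematic(make_branches(configs, systematics), systematics)
names = [b.branch_name for b in ordered]
```

Collecting all missing entries before raising is what gives a complete list in one pass instead of one failure per run.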

We had some discussion on how best to pass extra parameters into branch rules, but in the end I went with what was easiest and just kept extending the regular expression. The main issue will be that there is a fixed order of extra parameters now. A minor issue is that this format is fairly custom, which could create problems when merging. It could be replaced with a dictionary, but at this point I'd rather leave that to a separate MR.
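The fixed-order limitation can be seen in a small sketch. The pattern below is hypothetical and not the regular expression used in the MR; in particular the relative order of `type=` and `metTermName=` and the container name `MyMet` are assumptions made for illustration.

```python
import re

# Hypothetical pattern: the optional parameters are matched positionally,
# so "type=" has to come before "metTermName=".
RULE = re.compile(
    r"^\s*(?P<container>[\w%]+)\.(?P<aux>[\w%]+)"
    r"\s*->\s*(?P<branch>[\w%]+)"
    r"(?:\s+type=(?P<type>\w+))?"
    r"(?:\s+metTermName=(?P<met_term>\w+))?"
    r"\s*$"
)


def parse_rule(rule):
    """Split a rule string into its named parts; unset options come back as None."""
    match = RULE.match(rule)
    if match is None:
        raise ValueError(f"cannot parse rule: {rule!r}")
    return match.groupdict()


parsed = parse_rule("myContainer.myDeco -> myDeco type=float")
met = parse_rule("MyMet.met -> met_met type=float metTermName=Final")

# The same options in the opposite order are rejected by this pattern.
try:
    parse_rule("MyMet.met -> met_met metTermName=Final type=float")
    swapped_ok = True
except ValueError:
    swapped_ok = False
```

A dictionary of options would not have this ordering constraint, which is the trade-off mentioned above.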

I still create several algorithm instances for writing the output. At this point that could potentially be consolidated into a single algorithm instance again, but again I'd rather leave that to a separate MR.

I left existing helper functions as is, though maybe they should be moved to the same namespace in a future MR.

Edited Dec 11, 2025 by Nils Erik Krumnack
