Improve overlap checking for ThOr::Combiner<T>

Original idea

Currently the overlap checking in ThOr::Combiner<T> is implemented in a very simple way:

template <typename T, typename... Ts>
auto have_overlap( T const& p0, Ts const&... ps ) {
  return ( ( p0.pt() == ps.pt() ) && ... ) && ( ( p0.p() == ps.p() ) && ... );
}

This "works" when we only consider composites formed from clone-killed tracks that are not refitted (in that case collisions are very unlikely), but it is fragile and will not give the correct results for more complex decay trees. For example, if the decay tree is $D^0 \rightarrow (K_{\text{S}}^0 \rightarrow \pi^+\pi^-)\pi^+\pi^-$, this check would allow the same charged pion to be used as both a child and a grandchild of the $D^0$. Carefully chosen selection cuts might exclude this, but that is an unacceptably fragile solution.

A proposal for an improved, but still performant, scheme was included in a code comment and discussed in slide 7 of this talk. The proposal is to introduce a new integer identifier that reflects whether or not two physics objects should be considered "the same" from the point of view of the selection algorithms. For charged basic particles, it would be natural to allocate these identifiers after the track fit and clone killing (so the identifiers would be roughly equivalent to indices into Rec/Track/Best, to use Run 1/2 language). Composite particles would have to store multiple identifiers; the proposal is to store these "by value" in the composite, rather than requiring the overlap checking code to traverse the decay tree.

Taking the example of $D^0 \rightarrow (K_{\text{S}}^0 \rightarrow \pi^+\pi^-)\pi^+\pi^-$, the container of $D^0$ composites would contain:

4 columns of these unique identifiers for the four basic physics objects contained in the tree
Child relations for the 3 children of the $D^0$; one child relation consists of two columns, a "zip family ID", and an index valid within that zip family.

It should be considered whether it is useful/feasible to organise these columns such that it is known which of the 4 identifiers correspond to the 3 children, or whether it is more useful to have the unique identifiers sorted for efficient set comparisons.

Then the combiner algorithm can be modified to access the unique identifiers and require there is no overlap between the identifiers of the particles it has been asked to combine.
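As a minimal sketch of how such a check could look, assume each particle exposes its identifiers as a sorted, fixed-size array. This is only one of the two layouts discussed above, and the names `UniqueID`, `share_id` and `merge_ids` are hypothetical, not existing code. `std::array` stands in for the identifier columns of the real containers.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical identifier type; the proposal only asks for an integer.
using UniqueID = std::uint32_t;

// True if two sorted identifier sets share at least one element.
// Linear merge-style scan, no allocation and no decay-tree traversal.
template <std::size_t N, std::size_t M>
bool share_id( std::array<UniqueID, N> const& a, std::array<UniqueID, M> const& b ) {
  assert( std::is_sorted( a.begin(), a.end() ) && std::is_sorted( b.begin(), b.end() ) );
  std::size_t i = 0, j = 0;
  while ( i < N && j < M ) {
    if ( a[i] == b[j] ) return true;
    if ( a[i] < b[j] ) {
      ++i;
    } else {
      ++j;
    }
  }
  return false;
}

// A single particle cannot overlap with anything.
template <typename T>
bool have_overlap( T const& ) {
  return false;
}

// Check every pair among the particles the combiner was asked to combine.
template <typename T, typename... Ts>
bool have_overlap( T const& p0, Ts const&... ps ) {
  return ( share_id( p0, ps ) || ... ) || have_overlap( ps... );
}

// Identifier set stored "by value" in a new composite: the sorted merge
// of the identifier sets of two children.
template <std::size_t N, std::size_t M>
std::array<UniqueID, N + M> merge_ids( std::array<UniqueID, N> const& a, std::array<UniqueID, M> const& b ) {
  std::array<UniqueID, N + M> out{};
  std::merge( a.begin(), a.end(), b.begin(), b.end(), out.begin() );
  return out;
}
```

With this layout, a $K_{\text{S}}^0$ carrying identifiers {3, 7} overlaps with a pion carrying identifier 7 but not with one carrying identifier 5, without either the combiner or the check having to look at the children of the $K_{\text{S}}^0$.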

It should also be considered how bremsstrahlung photons should be incorporated in this scheme. @olupton had assumed that some bremsstrahlung recovery algorithm will associate bremsstrahlung photons to charged particles at some point late in the reconstruction sequence. It seems likely that the simplest solution is to ignore bremsstrahlung photons as part of the overlap checking (we would not want to reject an $e^+e^-$ combination simply because the same photon was added to both electrons), but without special treatment this could lead to double counting of bremsstrahlung photons in the momentum of the parent particle. In Run 1/2 there was a dedicated algorithm, DiElectronMaker, that ensured photons were only counted once in this case.
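One conceivable way to avoid the double counting, sketched here purely as an illustration, is to give photons unique identifiers as well and to add to the parent only the union of the photons attached to its children. The function `photons_to_add` and the idea that each electron carries a list of photon identifiers are assumptions of this sketch; it does not describe how DiElectronMaker works.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical identifier type for bremsstrahlung photons.
using UniqueID = std::uint32_t;

// Identifiers of the photons whose momentum should be added to the parent:
// each photon appears once, even if it was attached to both electrons.
// The inputs are taken by value so that they can be sorted locally.
std::vector<UniqueID> photons_to_add( std::vector<UniqueID> first, std::vector<UniqueID> second ) {
  std::sort( first.begin(), first.end() );
  std::sort( second.begin(), second.end() );
  std::vector<UniqueID> out;
  std::set_union( first.begin(), first.end(), second.begin(), second.end(), std::back_inserter( out ) );
  return out;
}
```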

Current implementation

Unique IDs are provided by instances of the class UniqueIDGenerator, which are located in the TES and are different for each event. Each instance stores an internal mutable atomic counter and a unique identifier (tag). On request, the instance provides an ID composed of the current value of its internal counter and its tag. The counter is then incremented by one, which ensures that all IDs are unique within an event.
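The behaviour described above can be sketched as follows. This is a simplified illustration and not the actual class: the member names, the integer widths, the `generate` method and the way the tag is passed to the constructor are all assumptions.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical ID type: the tag of the issuing generator plus a counter value.
struct UniqueID {
  std::uint64_t tag;     // identifies the generator that issued the ID
  std::uint32_t counter; // value of the generator counter when the ID was issued

  friend bool operator==( UniqueID const& a, UniqueID const& b ) {
    return a.tag == b.tag && a.counter == b.counter;
  }
};

class UniqueIDGenerator {
public:
  // How the tag is chosen is not specified in the text; here the caller provides it.
  explicit UniqueIDGenerator( std::uint64_t tag ) : m_tag{ tag } {}

  // Declared const so that it can be called on a read-only object; the
  // counter is mutable and atomic, so concurrent calls get distinct values.
  // fetch_add returns the value before the increment.
  UniqueID generate() const { return { m_tag, m_counter.fetch_add( 1, std::memory_order_relaxed ) }; }

  std::uint64_t tag() const { return m_tag; }

private:
  std::uint64_t                      m_tag;
  mutable std::atomic<std::uint32_t> m_counter{ 0 };
};
```

Two consecutive calls to `generate()` on the same instance return IDs with the same tag and counters that differ by one, so they never compare equal.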

Given the current implementation of the SOA collections, storing the UniqueIDGenerator tag in every entry would be inefficient and memory-consuming. The tag is therefore stored once as a member of the collection, and the contained objects store only the counter value provided by the UniqueIDGenerator. When the IDs are accessed, they are rebuilt from the tag and the counter, which ensures that IDs are compared correctly.
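A minimal sketch of this storage layout is shown below, with a plain `std::vector` standing in for the counter column of an SOA collection. The class `ParticleIDs` and its interface are hypothetical and only illustrate the idea of storing the tag once per collection.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical ID type: generator tag plus counter value.
struct UniqueID {
  std::uint64_t tag;
  std::uint32_t counter;
};

// Stand-in for an SOA collection: the tag is stored once, the entries
// only hold the counter values.
class ParticleIDs {
public:
  explicit ParticleIDs( std::uint64_t generator_tag ) : m_tag{ generator_tag } {}

  // Store only the counter part of an ID issued by the generator.
  void push_back( std::uint32_t counter ) { m_counters.push_back( counter ); }

  std::size_t size() const { return m_counters.size(); }

  // Rebuild the full ID of entry i from the collection tag and the stored counter.
  UniqueID unique_id( std::size_t i ) const { return { m_tag, m_counters[i] }; }

private:
  std::uint64_t              m_tag;      // one tag for the whole collection
  std::vector<std::uint32_t> m_counters; // one counter per entry
};
```

In this sketch the per-entry cost is a single 32-bit counter, whatever the size of the tag.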

When attempting to compare two IDs from different generators, a std::runtime_error is thrown. This case should be very rare, since in principle we only need one instance of UniqueIDGenerator per event.
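The guarded comparison could look roughly like the sketch below. The `UniqueID` layout, the helper `require_same_generator` and the error message are assumptions made for illustration; only the use of `std::runtime_error` is taken from the description above.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical ID type: generator tag plus counter value.
struct UniqueID {
  std::uint64_t tag;
  std::uint32_t counter;
};

// IDs issued by different generators are not comparable.
inline void require_same_generator( UniqueID const& a, UniqueID const& b ) {
  if ( a.tag != b.tag ) throw std::runtime_error( "comparing unique IDs from different generators" );
}

// Once the tags are known to match, only the counters need to be compared.
inline bool operator==( UniqueID const& a, UniqueID const& b ) {
  require_same_generator( a, b );
  return a.counter == b.counter;
}

// Ordering, e.g. to keep the identifiers sorted for set comparisons.
inline bool operator<( UniqueID const& a, UniqueID const& b ) {
  require_same_generator( a, b );
  return a.counter < b.counter;
}
```

Throwing here, rather than returning false, makes an accidental mix of generators visible instead of silently reporting "no overlap".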

Edited Aug 04, 2021 by Miguel Ramos Pernas