We need a relations table between v2 particles and the related vertices, whether that is the associated primary vertex or the particle's own vertex. Aside from the name of the table, I'm not sure I see why the implementation of the two needs to differ.
Requiring the same number of children seems unhelpfully rigid; there are many cases I can think of in B2OpenCharm for example where you take various D decays, add a track, and make a B. Forcing those D decays to always have to be to the same number of final state tracks seems like it would impose a structure on how lines are written which I'm not sure we want. But I may be missing the point here.
It was also pointed out that having a fixed number of children is too much of a limitation, for the reasons that @gligorov mentioned above. However, we will probably be merging only a limited number of containers, and the design could perhaps exploit this for optimisation?
The original idea was to use relations tables to accomplish this, but this is probably not a good idea, since the child-parent relations need to be constructed by a combiner algorithm, not at a later stage. Thus it is logical to have the links tied to the composite object.
Apologies for not including everyone relevant. I created this issue mostly for myself to keep track of things that came to my mind while writing a packer for v2 particles. But feel free to use it for a general discussion.
Just to be a bit more explicit - @amathad's comment related to what he needs to start supporting Particle v2 in FunTuple is "... currently in FunTuple we select decay branches using LoKi decay finder syntax and associate ThOr functor to that branch (see the options file here and using either Decays::IDecay::Finder or Decays::IMCDecay::Finder, we find them like here). I am looking for similar sort of interface for v2::Composites.".
Trying to summarize from what I remember from last Monday's (Oct 4, 2021) WP2/3 meeting:
Given that composites are "made" out of the daughters, they should contain the structure that allows navigating them (in contrast to relying on an external table).
A conceptually simple approach is:
Each particle i can have n_i daughters in n_i different containers.
These containers can be navigated by storing, for particle i, n_i indices into the different containers and n_i pointers to these containers.
In practice I assume one can limit n_i to a (reasonable) number, e.g. I doubt one ever needs more than 8-body combinations.
Possibly one does not need n_i different pointers, as one will never create a composite out of n_i different containers (?). But maybe this does not matter too much.
Persistency: How is this persisted? The pointers will be lost, so one has to "re-attach" them after unpacking? Something similar is already done now, can this infrastructure be copied?
Currently (IIRC) the daughters of a particle are accessed via the products of an end-vertex. I assume this could be done in the same way by either zipping a vertex object to the particle, or having the particle store the vertex in its own class.
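A minimal sketch of the per-particle storage described above, with the 8-child cap mentioned as an example. All names here (`Composite`, `DummyContainer`, `childPayload`) are hypothetical, not actual LHCb classes:

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an external daughter container.
struct DummyContainer { std::vector<int> payload; };

// Cap on the number of daughters, per the "8-body" remark above.
constexpr std::size_t MaxChildren = 8;

struct Composite {
  std::size_t nChildren = 0;
  // For each daughter: its index in, and a pointer to, its container.
  std::array<std::size_t, MaxChildren> childIndex{};
  std::array<const DummyContainer*, MaxChildren> childContainer{};

  int childPayload(std::size_t j) const {
    return childContainer[j]->payload[childIndex[j]];
  }
};
```

Note that the raw `childContainer` pointers are exactly what gets lost on persistency, so on unpacking they would have to be re-attached from some resolvable identifier, as per the persistency question above.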
And to summarize (my biased view of) the list of questions/discussion items from today's meeting:
can we ‘disentangle’ SOA-ness and SIMD register usage — specifically, does the zipping machinery work without SIMD types? (@decianm, @mramospe, @agilman, @nnolte)
ThOr functors define the API — i.e. the assumption is that most interaction with Particle will be through ThOr functors. Corollary: ThOr functors have to be able to deal with non-SIMD-but-still-SOA type objects, with ‘members’ which are non-arithmetic types. Does that work with the proxies? (@decianm, @mramospe, @agilman, @nnolte) (also: #104)
How to persist references to other containers (transient: pointer to container, as the input container is known to be on the TES, or a location). This is the same problem that needs to be solved for Relations — so can the Relations code act as a 'canary' for this purpose? (@mveghel) This defines what meta-information must be available, and when.
meta-information: in principle everything is known at configure time on the writing side. How to distill the required information (see previous point!) and then make it available to the reading side? Need a way to persist it, and query it. What queries are required? How to integrate it with the reading? How do we implement this system? TCK? Run header / FSR-like? (@rosen, @sponce, @clemenci)
references to children: any container is made from a number of other containers known at runtime. Store a table in the particle-container with an entry for each external container, identifying it sufficiently (RegEntry const*? string with TES location? — again, whatever is needed for (3)). The container also holds (by value) a list of indices, which are pairs of 'which container' + offset in that container (the 'which container' part is similar to what the link manager in DataObject does), partitioned per composite (i.e. each composite entry has a ('first child offset', N children) entry).
'particle' is defined by the ThOr functors operating on it -- i.e. the ThOr functors are the main (and stable) API and constrain the implementation, ie. if the implementation can satisfy ThOr, then it is 'done' ;-)
Hope I didn't forget something, or forgot to tag the relevant persons... feel free to tag more people if I did... Also, if there are additional points, or I've misrepresented any point, don't hesitate to mention that.
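A sketch of the child-reference layout from the 'references to children' point above. Names are hypothetical, and the container identification is reduced to an opaque integer standing in for whatever point (3) ends up requiring (RegEntry const*, TES location string, ...):

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// (which container, offset in that container)
struct ChildRef {
  std::uint16_t container;
  std::uint32_t offset;
};

struct CompositeContainer {
  // One entry per external input container; an opaque id stands in for
  // whatever identification scheme is eventually chosen.
  std::vector<std::uint32_t> inputContainers;
  // Flat list of child references for all composites...
  std::vector<ChildRef> childRefs;
  // ...partitioned per composite as ('first child offset', N children).
  std::vector<std::pair<std::uint32_t, std::uint32_t>> children;

  std::uint32_t nChildren(std::size_t i) const { return children[i].second; }
  ChildRef child(std::size_t i, std::uint32_t j) const {
    return childRefs[children[i].first + j];
  }
};
```

The flat, partitioned layout keeps the per-composite data fixed-size (one offset + one count), which fits an SoA container naturally while still allowing a variable number of children per composite.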
no, I really mean SOA without explicit SIMD types. And that also includes SOAOS. To give an example: let's say I have some geometry problem which can be split into 2+1 dimensions, and I want to store my points as: ((X1,Y1),(X2,Y2),(X3,Y3),..... ), (Z1,Z2,Z3,....), because I know that I always use (X,Y) together, and only occasionally use Z. So while my (X,Y) data maps to a SIMD type, and can be trivially copied into them, they're not necessarily explicit SIMD types. Basically, I don't want to be constrained in the data structure by only being allowed to use the types defined in SIMDWrapper.h.
Or, to put it even more explicitly (thanks to @chasse), I would like to be able to have struct Point2D { float x; float y; } and then define my data layout as ((Point2D,Point2D,Point2D,...),(float,float,float,...)).
ThOr functors define the API — i.e. the assumption is that most interaction with Particle will be through ThOr functors. Corollary: ThOr functors have to be able to deal with non-SIMD-but-still-SOA type objects, with ‘members’ which are non-arithmetic types. Does that work with the proxies? (@decianm, @mramospe, @agilman, @nnolte) (also: #104)
Just to add my 2 cents: we will continue to use particle refitting frequently. So I'm not so sure I agree that ThOr is the defining API here: we will have to be able to navigate the decay tree easily also in our C++ algorithms.
Well, there will be some ThOr functor that will have to do navigation internally, so this is something that will be possible. So perhaps it is good to be more precise about the definition of 'functors define the interface', as I actually have something quite specific in mind, namely the importance of the role of ADL to customize behaviour — basically, see the arguments in Rec#223. So I would hope that at the 'bottom' layer ThOr functors actually turn into 'very thin wrappers' calling plain functions instead of complicated if-constexpr cascades (and then, built on top of that, there are more complicated things such as binding, i.e. functors that take functors and create other types of functors with a reduced list of arguments). And this implies that such functions are then also usable elsewhere, not just inside ThOr. So I don't think you have too much to worry about.
But to make the original point: I would expect the navigation to be generalized, and preferentially expressed as functions: e.g. given this composite, give me all the final-state particles which satisfy some set of criteria (such as: their mother is a D0, their PID is kaon, their momentum is larger than X, and their probNNK is large). And that also includes: give me the D0 in this B0->D*-pi+ decay. And give me all children of the D0 in this B0->D*-pi+ decay. So navigation can be phrased in terms of functions, and we should make it easy to use those functions (i.e. any complicated indexing is done behind the scenes, as I very much want to write and debug that only once). But you will have to start thinking more along the lines of how RDataFrame works than of 'I explicitly loop over something, interrogate every leaf object I am given explicitly, and then continue walking the decay tree' (even if I expect that, yes, that will still be possible), i.e. stop thinking about loops, and instead think about the information you want to extract from sets of data.
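A toy sketch of such a navigation function. The tree representation and all names here are purely illustrative; the real thing would hide the container/index machinery of the composite layout behind the same kind of interface:

```cpp
#include <vector>

// Illustrative decay-tree node; a real composite would reference children
// by (container, offset) rather than owning them.
struct Node {
  int pid = 0;      // illustrative particle id
  double pt = 0.0;  // illustrative kinematic quantity
  std::vector<Node> children;
  bool isFinalState() const { return children.empty(); }
};

// "Give me all final-state particles satisfying some criteria": the tree
// walk (i.e. all the indexing) is written and debugged exactly once, here.
template <typename Pred>
void finalStates(const Node& n, Pred&& pred, std::vector<const Node*>& out) {
  if (n.isFinalState()) {
    if (pred(n)) out.push_back(&n);
    return;
  }
  for (const auto& c : n.children) finalStates(c, pred, out);
}
```

The caller only supplies the criteria as a predicate (much like an RDataFrame filter), never an explicit loop over the tree.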
aside: any kinematic fitter, and DecayTreeFitter explicitly, already copies the input into its own data structure specific to the fitter. So that implies that the 'front end' of such a fitter would have to be adapted to extract the information it needs, but the 'guts' (or better 'brains') of the fitter remain the same. Hence I'm not too worried about the amount of work required to adapt DecayTreeFitter to any new particle class that can represent a decay chain.
Or, to put it even more explicitly (thanks to @chasse), I would like to be able to have struct Point2D { float x; float y; } and then define my data layout as ((Point2D,Point2D,Point2D,...),(float,float,float,...)).
So far I have never used SOACollection like this. The way it is currently implemented, it would store x1,x2,...,y1,y2,...,z1,z2,..., I think. This could be changed, I assume, as it is just fiddling with offsets, but it would maybe need some "offset layout", depending on the class you actually want to store (?)
Is it actually worth implementing this for what we need right now, apart from a conceptual point of view (I don't argue that conceptually it is better)?
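To make the offset fiddling concrete, a comparison of element addressing in the two layouts, assuming n points stored in one flat float buffer (purely illustrative, not SOACollection's actual implementation):

```cpp
#include <cstddef>

// (a) fully split columns (the current behaviour described above):
//     x0..x{n-1}, y0..y{n-1}, z0..z{n-1}
constexpr std::size_t xSplit(std::size_t i, std::size_t /*n*/) { return i; }
constexpr std::size_t ySplit(std::size_t i, std::size_t n)     { return n + i; }

// (b) grouped (x,y) pairs, then z: x0,y0,x1,y1,..., z0..z{n-1}
constexpr std::size_t xGrouped(std::size_t i, std::size_t /*n*/) { return 2 * i; }
constexpr std::size_t yGrouped(std::size_t i, std::size_t /*n*/) { return 2 * i + 1; }

// z is its own column in both layouts: offset 2*n + i either way.
constexpr std::size_t zOffset(std::size_t i, std::size_t n) { return 2 * n + i; }
```

So supporting a grouped layout really is just a different offset formula per "column"; the cost is that the formula now depends on the stored member type rather than being one stride for everything.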
To make @graven's point about copying input into its own data structures more explicit and general: it is important to stress that this is not just a coding-design issue but also a division-of-labour issue. The plan is that the framework provides a single Particle class with ThOr functors as the primary supported API. Here "supported" means that it is the job of the core RTA and DPA teams to maintain and expand this API as required by analysts.
Now if there are use-cases not covered by this API, for example refitting (whether vertices or particles), there are two ways we can go. Either the ThOr API is expanded to also support this use-case, or the developers of that use-case perform a conversion into their preferred data structures inside their own algorithms. In the second case they are responsible for the performance impact of this conversion, for testing it, and for its long-term maintenance. It is in particular explicitly not the job of the core developer team to maintain such converters, because that places an unsustainable long-term burden on the core developers. The best way forward should be discussed case by case, balancing scalability, maintainability, and developer workload. But the message is: if you can reimagine your use-case so that the information is accessed through the preferred API, you save work in the long term; where this is genuinely not possible the core team will of course help, but long-term maintenance will fall on your shoulders.