I think v2::Track is used almost nowhere by now. One probably needs to adapt the converters for MC matching to go from PrTrack to v1::Track directly. And maybe remove some unused template specialisations.
It might be possible to replace the PrTracks with SOATracks eventually, but given that the PrTracks are "invisible" to most users and perform well, this would be the last step for me.
Indeed, I think there are conversions which use v2::Track as an intermediate step, and it is included in quite a few files, so it will need some cleaning and testing.
The plan (at least mine) is to adapt RecVertex_v2 to work with indices, and make it the default RecVertex to be used for the Particles that go into the selections.
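Concretely, something along these lines, though this is only a rough sketch and all names are illustrative rather than the actual RecVertex_v2 interface:

```cpp
#include <cstdint>
#include <vector>

// Rough sketch of an index-based vertex: instead of holding pointers
// (SmartRefs) to Track objects, the vertex stores indices into the
// track container it was built from. Names are illustrative only.
struct RecVertexSketch {
  float x{ 0 }, y{ 0 }, z{ 0 };            // vertex position
  float chi2{ 0 };
  std::vector<std::uint32_t> trackIndices; // indices into the track container
  std::vector<float>         trackWeights; // per-track weight in the fit
};
```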
I think the namespaces are confusing though, as v2::Track and v2::Tracks are not the same thing.
Clearly SOATrack (aka v2::Tracks) does not fit in the namespace scheme as intended. I propose to make an MR to just make sure these will be known as LHCb::Event::v3::Tracks (and not LHCb::v2::Event::Tracks) and to change the header to Track_v3.h instead of Track_SOA.h (and fix the subsequent fall-out...).
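In code the proposal amounts to nothing more than this sketch (the renaming plus, possibly, a transition alias):

```cpp
// Track_v3.h -- sketch of the proposed renaming
namespace LHCb::Event::v3 {
  class Tracks { /* what is currently SOATrack, a.k.a. "v2::Tracks" */ };
} // namespace LHCb::Event::v3

// possibly a deprecated alias to soften the fall-out during the transition:
namespace LHCb::Event::v2 {
  using Tracks [[deprecated( "use LHCb::Event::v3::Tracks" )]] = v3::Tracks;
} // namespace LHCb::Event::v2
```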
I must admit that I am a bit stuck now that v2 tracks no longer exist: I don't have any algorithm anymore to check the PV finder with. I understand that it was broken anyway, but without the container being created I also cannot fix the PV->track association.
My conclusion is that we shouldn't use either v1 or v2 RecVertex for PV. They just deserve their own class, as it was proposed long ago. I can easily implement something with indices in the 'velo' PrTracks container. However, how do we then make the link to tracks in the 'forward' container? Should we just go back to comparing LHCbIDs? (I hope not.)
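To make the question concrete: here is the kind of link I could imagine, assuming (hypothetically) that the 'forward' container carried a column with the index of the velo ancestor each track was built from. Without something of this sort we are indeed back to comparing LHCbIDs.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical: for each track in the 'forward' container, the index of
// the 'velo' track it was built from, filled by the forward tracking.
using AncestorColumn = std::vector<std::uint32_t>;

// Given the velo-track indices attached to a PV, collect the forward
// tracks whose velo ancestor is among them.
std::vector<std::uint32_t>
forwardTracksForPV( const std::vector<std::uint32_t>& pvVeloIndices,
                    const AncestorColumn&             veloAncestor ) {
  std::vector<std::uint32_t> result;
  for ( std::size_t fwd = 0; fwd < veloAncestor.size(); ++fwd ) {
    for ( auto v : pvVeloIndices ) {
      if ( veloAncestor[fwd] == v ) {
        result.push_back( static_cast<std::uint32_t>( fwd ) );
        break;
      }
    }
  }
  return result;
}
```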
I am not sure this is the correct place for this discussion, but will LHCb::Particle remain or not? I understand that v2::Particles is used in the combiners etc., but it doesn't seem to be something that non-expert physicists could use yet. Do we plan to have an old-style interface to particles and tracks available as well? (LHCb::Particle is still all over the LHCb code.) And if so, will that then use v1::Track?
I am now migrating PV monitoring code from v2::Track and v2::RecVertex to v1 track and vertex. Somehow this seems pointless: v2 clearly contained improvements relative to v1.
I don't think that we can do without a v1- or v2-style model in the user physics code. So it makes more sense to either move forward with the v2 model or fix the v1 model. I assume that this has been extensively discussed elsewhere. Can somebody perhaps tell me what the plan is?
> I am not sure this is the correct place for this discussion, but will LHCb::Particle remain or not? I understand that v2::Particles is used in the combiners etc., but it doesn't seem to be something that non-expert physicists could use yet. Do we plan to have an old-style interface to particles and tracks available as well? (LHCb::Particle is still all over the LHCb code.) And if so, will that then use v1::Track?
These are important matters. From the DPA side we plan to totally retire LHCb::Particle and move to v2, as stated in https://gitlab.cern.ch/lhcb-dpa/project/-/issues/60. The corollary is that we won't work (can't even afford to, in terms of people committed) on interfaces et al. unless some matter of urgency arises super close to data-taking and we get into crisis mode, which is clearly to be avoided. (Am avoiding too many comments on purpose.)
My worry is that Particle_v2 isn't something that 'ordinary' users can actually ever use. A non-SoA model where we can follow pointers to daughters, tracks, etc. is already hard enough. Unless we give up on students actually developing non-trivial reconstruction and analysis code, I'd be much in favour of keeping a non-SoA model available through converters.
It will also be highly non-trivial to translate code like DecayTreeFitter to the SoA framework. You'd need to start from scratch. Do we have the person-power for that?
We're now having this discussion in two different places, but let me back up what @erodrigu said.
In general I think it does no favour to students or postdocs to pretend that they can "develop non-trivial reconstruction and analysis code" without understanding parallel programming techniques. The architectures are only becoming more parallel, and our budget constraints more severe. Of course for more basic cases the aim of both RTA and DPA is to provide a front end so that people working on pure physics analysis can still make their ntuples using standard tools and work from them as a starting point.
> In general I think it does no favour to students or postdocs to pretend that they can "develop non-trivial reconstruction and analysis code" without understanding parallel programming techniques. The architectures are only becoming more parallel, and our budget constraints more severe.
This is so true! If anything I would argue that the situation is actually harder for older people such as me who started with FORTRAN, then wrote C++98 for Run 1 and now have to learn a largely different C++ language and adapt to heterogeneous computing. The younger people are already in the right mood/environment, in the same way that my daughter thought 3-ish years ago that all screens are touch-sensitive; it felt natural to her. Writing reconstruction code has never been trivial and I don't think it ever will be. I would not pretend it is, and am convinced that it would be of no benefit to younger colleagues to (effectively) have 'wrappers' or the like to make things look more user-friendly than they are.
I appreciate the frustration, of course, whereby the code is evolving on various fronts in parallel and at times things break or get much harder. Myself, I then try to stay focused on the end goal and the grand picture, because this is all a means to an end, right? This is all blah-blah and certainly does not solve your present-day issue, but I do hope we can improve the situation for everyone at the earliest convenience.
> Of course for more basic cases the aim of both RTA and DPA is to provide a front end so that people working on pure physics analysis can still make their ntuples using standard tools and work from them as a starting point.
Yep, totally.
> It will also be highly non-trivial to translate code like DecayTreeFitter to the SoA framework. You'd need to start from scratch. Do we have the person-power for that?
I'm also totally convinced on this one, and worried. @pkoppenb already started some work on this front, see https://gitlab.cern.ch/lhcb-dpa/project/-/issues/117, but that covers only part of the problem, the part related to the "API", as it were. If you can provide help and/or guidance that would be huge.
I have some sympathy for all of this, but we should also not think that 'simd SoA' data structures are always the right thing. As soon as you make sparse selections, you'll want to rearrange the data. I am not an expert, but I have the impression that making selections on simd data structures is not cheap. So we can spend an enormous amount of time adapting analysis tools such that we can apply them to simd::size candidates at the same time, but I truly wonder how much this helps in practice for the low-multiplicity, analysis-level objects that we are dealing with at the end of a selection sequence.
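To illustrate what I mean by "rearranging the data", here is a scalar sketch with made-up column names: after a sparse selection every column has to be compacted, and in simd code each of those compactions becomes a gather:

```cpp
#include <cstddef>
#include <vector>

// Made-up three-column SoA container.
struct SoA {
  std::vector<float> pt, eta, phi;
};

// After a sparse selection, *every* column must be compacted; real
// containers have many more columns, and in simd code each compaction
// is a gather per column per simd word.
SoA select( const SoA& in, float ptMin ) {
  SoA out;
  for ( std::size_t i = 0; i < in.pt.size(); ++i ) {
    if ( in.pt[i] > ptMin ) { // sparse mask
      out.pt.push_back( in.pt[i] );
      out.eta.push_back( in.eta[i] );
      out.phi.push_back( in.phi[i] );
    }
  }
  return out;
}
```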
Or to put it differently, I am a little hesitant to co-develop a version of DTF that is vectorized such that you can fit N candidates simultaneously. In practice you are not going to apply it to a large number of candidates with identical topology. What seems simpler is to vectorize the single-particle DTF such that it internally treats the N tracks in the fit simultaneously. That is a very different kind of vectorization, one that is easier to implement and (as far as I can see now) has little to do with the SoA event model.
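As a toy illustration of that other kind of vectorization (nothing DTF-specific, just the shape of it): with the per-track quantities of one candidate stored contiguously, the inner loop over the N tracks of the fit can vectorize, regardless of how the candidates themselves are stored:

```cpp
#include <cstddef>

// Toy: accumulate the chi2 contributions of the N tracks of *one*
// candidate. With residuals and weights contiguous, the compiler can
// vectorize this loop across the tracks of the fit; no across-candidate
// SoA layout is needed for that.
float trackChi2Sum( const float* residuals, const float* weights, std::size_t n ) {
  float sum = 0.f;
  for ( std::size_t i = 0; i < n; ++i ) sum += weights[i] * residuals[i] * residuals[i];
  return sum;
}
```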
I am going to give you another example. We spent a lot of time making the PV finder 'SoA'. The algorithm is very well suited to it, since its input is an enormous number of identical tracks. However, in the end we barely gained a factor of two. I wonder if part of the problem is not exactly the SoA structures themselves: the vectorized computations are extremely fast, but since they need data from different columns in the SoA structure, the memory access is actually very 'sparse'. (But I need to be careful what I write here, because I actually don't understand the internals of our SOACollection yet :-))
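To spell out the access pattern I am worried about, with made-up column names: one simd-width step of the fit does one load from each of many different columns, so although every individual load is contiguous, together they touch many distinct cache lines:

```cpp
#include <cstddef>
#include <vector>

// Made-up track columns: one step of a vertex fit reads a value from
// each of these arrays, i.e. it pulls in one cache line per column.
struct TrackColumns {
  std::vector<float> x, y, tx, ty;                   // state at the beamline
  std::vector<float> covXX, covYY, covTxTx, covTyTy; // diagonal covariance
};

// Toy arithmetic, just to show that a single per-track step already
// touches eight different columns spread across memory.
inline float weightedSlopeSum( const TrackColumns& t, std::size_t i ) {
  return t.x[i] * t.tx[i] / t.covXX[i] + t.y[i] * t.ty[i] / t.covYY[i] +
         t.tx[i] / t.covTxTx[i] + t.ty[i] / t.covTyTy[i];
}
```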
Anyway, I know that I am the odd one out here, so I'll rest my case.
You make good points, and you will never hear me say that one solution fits all in general. I'm also not an expert on all these matters so will let the experts comment further... In the end I only wanted to let you know what DPA is moving towards as a "skin", acknowledging the fact that offline does not need to do the heavy lifting with data structures.
The issue is maintenance. Multiple event models and converters add failure modes and a maintenance burden. And they add it exactly where it is most dangerous: we know that in practice a very large fraction of the bugs which affect analysts are to do with data structures being written to a location other than what the downstream algorithms expected, or in a format other than what they expected.
So one has to factorize things. For anything that happens post-ntuple-making, where maintenance is the responsibility of physics analysts, it's up to them how things are done. That's where flexibility can be and is maintained. But in the real-time processing framework itself things have to be coherent in order to be maintainable.
If it is really not possible to develop a simd DTF, you always have the option to transform the data inside the DTF algorithm itself; in that case you only need to learn to read SoA structures, which would have to be the case in a converter layer anyway. Then it's the responsibility of whoever maintains a given algorithm (you, in the case of the DTF, but my point is more general) to make sure that the transformation does not induce bugs, rather than this work being put back onto the core team in the form of converters.
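Concretely, such a transform can stay entirely local to the algorithm; all names in this sketch are hypothetical:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical input columns as read from the SoA container.
struct TrackSoAView {
  std::vector<float> x, y, tx, ty, qop;
};

// Plain struct the existing scalar DTF code can work with.
struct LocalTrack {
  float x, y, tx, ty, qop;
};

// Convert once at the top of the algorithm's operator(); the scalar
// fit then runs unchanged on the local copies.
std::vector<LocalTrack> toLocal( const TrackSoAView& in ) {
  std::vector<LocalTrack> out;
  out.reserve( in.x.size() );
  for ( std::size_t i = 0; i < in.x.size(); ++i )
    out.push_back( { in.x[i], in.y[i], in.tx[i], in.ty[i], in.qop[i] } );
  return out;
}
```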