As Wouter presented today, there are benefits to storing the track-to-PV information computed in the PV finder for later use:
Cheap PV unbiasing.
Quick look-up during selections, i.e. finding the 'best PV'.
Useful for the persistence, allowing downstream applications to re-use the relations computed in HLT2.
In addition to the PVs themselves, the finder should also store this per-track information:
Index to vertex.
Weight.
Impact parameter and χ²_IP.
Derivatives of the χ² with respect to the PV position.
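A minimal sketch of such a per-track record (the struct and field names here are hypothetical, purely to make the list concrete):

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical per-track record produced by the PV finder.
// One entry per input VELO track, in the same order as the track container.
struct TrackPVInfo {
  uint16_t pvindex;              // index of the associated vertex in the PV container
  float    weight;               // weight of this track in the PV fit (0 if not used)
  float    ip;                   // impact parameter w.r.t. the associated PV
  float    ipchi2;               // chi2 of the impact parameter
  std::array<float, 3> dchi2dpv; // derivatives of the chi2 w.r.t. the PV position (x, y, z)
};

// The finder would return one such vector, aligned with its input track list.
using TrackPVInfos = std::vector<TrackPVInfo>;
```

One entry per input VELO track, kept in the same order as the track container, would already cover the cheap unbiasing and best-PV look-ups listed above.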
There are some technical aspects to consider:
The PV finder acts on VELO tracks. How do we propagate the track-to-PV information through later stages of the track reconstruction in a way that, for example, lets us access best-PV information on long tracks? What about track types which do not have a VELO segment?
What is the most appropriate format for this information? Should a track class include it directly (we could force it to be a prerequisite for building long tracks), or should it be a (zipped?) extension to existing track classes?
How do we persist the information for downstream use?
Should we try to adapt legacy selection algorithms to use this P2PV information?
How can we integrate this information into the functors?
In principle this should be straightforward for the ThOr functors. We already explicitly pass in PVs, it would be a simple extension in the configuration and C++ to pass the relations as well. It will make simultaneously supporting the legacy selection framework within ThOr harder though.
As referenced in #103, the ThOr selection framework does already have the concept of a 'best vertex relation'. It's not immediately obvious to me how this fits in the above but we should investigate it before re-inventing the wheel.
/cc @agilman @mramospe: worth considering this aspect when thinking about tracks/particles/selections.
One question about best vertex relations: the class BestVertexRelations serves a double purpose. It keeps the information about what the index of the best vertex is (and also what the IP chi2 values are), and, in its constructor, it computes the IP chi2 and determines the indices. Why not move the code from the constructor into the function which calls it, and just have it keep the indices (and perhaps the IP chi2, though that might already be pushing it)? In that case, the data structure could be re-used for other methods.
So to be concrete: this class 'mixes' data and how to compute the data -- perhaps it would be better (in this case) to separate the two, and make the data 'more dumb' so it can be re-used.
Since this issue discusses the PV finder being the one to compute the IPCHI2, then yes, I agree: BestVertexRelations should be nothing more than a container.
Presumably, though, the output structure of the PV finder will be a little richer than BestVertexRelations, so the former will supersede the latter (maybe? I guess BestVertexRelations has some SIMD-compatible logic?).
For now, we could create a new data structure that contains just the fields above. The output of the vertex algorithm would then be a new 'vector' of these data, with exactly the same length as the input list. I can do that, but probably only efficiently (in terms of my time) for the 'non-SoA' version. I'd then need some help for the parallelized version.
There is one thing to think about which I didn't mention yet :-) : to make sure we compute the track parameters at the right z-position, we'd need one 'update' of the track parameters in the PV finding algorithm. This will make it a little slower, perhaps by as much as 20%. I hope that is acceptable.
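For reference, the 'update' in question is just a linear transport of the straight-line VELO state to a new z; a sketch with illustrative names (not the actual event-model API):

```cpp
// Sketch of transporting a straight-line VELO track state
// (x, y, slopes tx, ty at reference z) to a new z, e.g. the PV z.
// Names are illustrative, not the actual event-model classes.
struct MiniState {
  float x, y, z, tx, ty;
};

inline MiniState transport_to(const MiniState& s, float znew) {
  const float dz = znew - s.z;
  return {s.x + dz * s.tx, s.y + dz * s.ty, znew, s.tx, s.ty};
}
```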
We indeed need to keep track of long->velo relations. Can we assign to every long track an 'ancestor' index?
For track types that do not have a velo segment, we could do something 'afterwards'. However, please realize that the IP-chi2 of such tracks does not really have the power to assign the right PV (or separate prompt from displaced, etc). So, I argue that for those tracks IP-chi2 isn't particularly useful anyway.
In the pattern-recognition classes that come from a Velo track (PrUpstreamTracks and PrLongTracks) we keep track of the Velo track by storing the indices:
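A sketch of how that ancestor bookkeeping can be used to look up VELO-level information for a long track (illustrative types, not the actual Pr:: classes):

```cpp
#include <cstddef>
#include <vector>

// Sketch of a long-track container pointing back to its VELO ancestors,
// in the spirit of the indices stored by PrUpstreamTracks / PrLongTracks.
// This is an illustrative container, not one of the real Pr:: classes.
struct LongTracks {
  std::vector<int> veloAncestor; // veloAncestor[i] = index of track i's VELO
                                 // segment in the VELO track container
};

// With this, any per-VELO-track information (e.g. the PV relation computed
// by the finder) can be looked up for a long track:
inline int best_pv_of_long_track(const LongTracks& longs,
                                 const std::vector<int>& velo2pv, // one entry per VELO track
                                 std::size_t ilong) {
  return velo2pv[longs.veloAncestor[ilong]];
}
```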
I tend to agree that providing IPCHI2 information for things like downstream tracks is not terribly meaningful and with 6 PVs per event arguably analysts should be discouraged from using it; so you could even see not computing this information as a desirable feature rather than an omission.
One corner case worth keeping in mind is what happens if someone develops an algorithm to extrapolate downstream tracks back into the VELO looking for only 1-2 hits in the final modules precisely in order to get some information about where they came from. These tracks would not have a VELO segment but their IP information would not be meaningless either (though low resolution). Given that our new triggerless readout will enable order of magnitude gains in efficiency for kaon physics I don't see this as a totally abstract consideration.
It may therefore be worth thinking ahead when designing the structures and sketching how this information could be provided for tracks which do not contain a VELO segment. It would be unfortunate if we made choices here which later meant that we require some major refactoring in order to add such information to the selections framework. This could be written up as an issue and left there to help anyone who comes along to implement the relevant code later.
It might be too narrow-minded, but to me the interface for analysts is the functors, for example:
```python
# The standard PVs maker in Moore now also returns the extra-info structure being discussed
pvs, t2pv = make_pvs()
cut = IPCHI2(Vertices=pvs, TrackRelations=t2pv) > 4.0
return ParticleFilter(make_long_pions(), cut)
```
(This functor interface doesn't exist yet; better names on a postcard please. It assumes the functor is able to use the VELO-to-PV relations directly when given a long track.)
If one wants to cut on such a quantity for non-VELO'd tracks then in principle one 'just' needs an algorithm which produces the same t2pv structure but for the tracks in question.
```python
# Don't need the VELO-to-PV associations here
pvs, _ = make_pvs()
downstream_pions = make_down_pions()
t2pv = make_downstream_pv_relations(downstream_pions, pvs)
# Same as before
cut = IPCHI2(Vertices=pvs, TrackRelations=t2pv) > 4.0
return ParticleFilter(downstream_pions, cut)
```
One corner case worth keeping in mind is what happens if someone develops an algorithm to extrapolate downstream tracks back into the VELO looking for only 1-2 hits in the final modules precisely in order to get some information about where they came from. These tracks would not have a VELO segment but their IP information would not be meaningless either (though low resolution). Given that our new triggerless readout will enable order of magnitude gains in efficiency for kaon physics I don't see this as a totally abstract consideration.
Sorry for the ignorance, but I don't follow the reasoning. While I agree with not closing doors too early, can you explain why having a few hits in the VELO changes the usefulness of the IPCHI2 for assigning tracks to PVs? I can't immediately see how to use it for a PV association for these tracks, even if the IP had a very good resolution.
I'm not sure there is a lot to explain -- having one and certainly 2 VELO hits improves your lever arm for extrapolating the track into the interaction region. As I noted the resolution wouldn't be very good but it isn't immediately obvious to me that it would be useless.
If you have such production far downstream of the beamspot, the IPs with respect to all primary vertices should be very large; the track <-> PV association based on the IP degrades with the decay time of the parent. The resolution doesn't matter, but what do you want to do with it?
Clearly you know where the particles came from (and by extension, once you vertex them, their parent) far better than if you had no VELO hits at all. So it isn't out of the question to me that someone might still ask "which PV is most compatible with these tracks". I'm not sure I see why this is a controversial point.
You already know they are produced away from the beamspot, in order to be a downstream track with just 2 VELO hits (you know their eta, etc.). To me it's not obvious that the PV with the best IP is the PV a particle should be associated with when the production vertex is so far downstream...
Anyway, it seems that I still don't see something obvious here, so let's not derail the discussion; saving more information to be safe is something I'm not against either. :-) It would be nice to see how effective it is, of course...
I think I also maybe haven't been as clear as I could have been. This is a corner case because the exact same arguments you make will apply to tracks with hits in the last 3 or 4 VELO layers, but because those do get a VELO segment they will have this information saved. You can logically say tracks with 0 VELO hits should always be analysed separately, but it isn't terribly logical to say "if you only have 1 or 2 hits this is fundamentally different to having 3", for exactly the reasons you give. Does that make more sense as a way of thinking about it?
In addition to the PVs themselves, the finder should also store this per-track information:
Index to vertex.
Weight.
Impact parameter and χ²_IP.
Derivatives of the χ² with respect to the PV position.
There are actually two cases, and the above covers (mainly) the case where a track was used in a PV: the 'index', weight and derivatives assume that the track was used in the PV (although if a track isn't used, one can set the derivatives to zero and still have a valid entry).
But I suspect that for the impact parameter and chi2_IP I will likely want to use this for any track, with respect to any PV, regardless of whether the track was associated to that PV. So for each track, I would expect one entry per PV.
Which then raises the question: should this not be in a separate data structure (one that could be constructed at the same time, by the same algorithm, re-using the work it would do anyway to determine the PV)?
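To make the idea concrete, such a separate structure could be as simple as a flat table (hypothetical layout, not an existing class):

```cpp
#include <cstddef>
#include <vector>

// Sketch of a separate, 'dumb' structure holding the IP chi2 of every track
// with respect to every PV, as suggested above: a flat nTracks x nPVs table.
// Hypothetical layout, not an existing class in the codebase.
struct TrackPVChi2Table {
  std::size_t nPVs;
  std::vector<float> ipchi2; // size = nTracks * nPVs, row-major in tracks

  float operator()(std::size_t itrack, std::size_t ipv) const {
    return ipchi2[itrack * nPVs + ipv];
  }
};
```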
Agreed that having the IP info for each PV is probably worth having just in case.
The functors can then remain explicit by being e.g. MINIPCHI2 (the smallest value found, i.e. that with respect to the 'best PV' by definition) rather than BPVIPCHI2 or just IPCHI2.
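A scalar sketch of such a MINIPCHI2, assuming the per-PV chi2 values are already available (e.g. from a per-track-per-PV structure); the name comes from the comment above, the implementation is purely illustrative:

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Sketch of a MINIPCHI2-style quantity: the smallest IP chi2 of one track
// with respect to any PV, which by definition is the value with respect to
// the 'best PV'. The per-PV values are assumed to be precomputed.
inline float min_ipchi2(const std::vector<float>& ipchi2_per_pv) {
  float best = std::numeric_limits<float>::max();
  for (float v : ipchi2_per_pv) best = std::min(best, v);
  return best;
}
```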
@graven Actually, not quite: The PV finder associates /every/ track to its closest vertex in the clustering procedure. It is with respect to that vertex that the IPCHI2 is computed. If the track was not used (IPCHI2 above a certain threshold value, used to compute the Tukey weight) the weight is zero. The unbiasing procedure takes the weight into account.
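For reference, one common form of a Tukey weight with such an explicit cutoff (the exact expression used in the finder may differ; this is only an illustration):

```cpp
// One common form of the Tukey weight with an explicit cutoff, as described
// above: tracks with an IP chi2 above the threshold get exactly zero weight
// and therefore do not pull the vertex. The exact expression used in the
// PV finder may differ; this is an illustration.
inline float tukey_weight(float ipchi2, float chi2max) {
  if (ipchi2 >= chi2max) return 0.f;
  const float r = 1.f - ipchi2 / chi2max;
  return r * r;
}
```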
I guess that the only corner case is when there is not a single PV. In that case, the PV algorithm cannot compute the extra information for the tracks.
I am trying to understand how I could efficiently contribute here. The new TrackBeamLineVertexFinderSoA differs from the algorithm that I wrote in several ways:
improvements related to dealing with conditions
switch from one input container to two (one Forward and one Backward)
change the input container to the SoA container 'PrVeloTracks'
I must admit that I'll have difficulty implementing this the SoA way, at least in a reasonable amount of time. Therefore, I considered just changing the non-SoA algorithm such that it becomes a temporary drop-in replacement for the new one, and then adding the new code there. However, that doesn't quite solve it for me yet, because the input containers are now SoA as well.
Perhaps we could first agree better on the output. Will the framework that is going to use the IP-chi2 information (and unbiasing etc) also expect a SoA container as input, or doesn't that matter?
My feeling is that the integration will be simpler if the structure is already SOA. Specifically if the tracks are already SOA then they will be zip'able with an SOA 'relations' object.
The SOA base classes that @ahennequ worked on should allow for a fairly short implementation, given that it needs a handful of floats and float arrays. Having said that I'm not sure how best to get acquainted with this, perhaps @agilman or @mramospe have a few pointers for getting started?
May I ask what exactly you want to do with long tracks? In addition to the ones Alex mentioned, we have the output from the forward tracking and the matching, which use a different type.
Nothing ... I was just worried that I'll make the output structures SoA compatible, but then later there wouldn't be anything to zip it with.
Of course, my concrete problem is that although I know exactly what I want to do, I actually cannot read the code in LHCb::v2::Event::Tracks. So, there is going to be some learning curve ...
Hi Alexander, thanks a lot ... I am going to need your help at some point.
I have already one question. It has nothing to do with the SoA, but rather with the data content: Does each long/upstream/velo track know which Velo segment it belongs to? (That is, is there an index to the original track in LHCb::Pr::Velo::Tracks?) Because that is essential for the new functionality to be useful.
No, not in the current implementation. I think this is done in LHCb::Pr::Forward::Tracks as m_forward_ancestors in the constructor, but as LHCb::v2::Event::Tracks can represent both Velo and non-Velo tracks, this warrants some discussion on our end about a proper implementation @apearce @mramospe. Maybe just add additional constructors as in LHCb::Pr::Forward::Tracks, and restrict the output of the ancestors to only track types with VELO segments?
Indeed, the Pr:: track types all carry their ancestor information (by storing indices to "point" to the ancestor tracks in the respective container).
However, I think v2::Tracks needs to do the same thing for all possible track types. At the moment (in the fastest scenario) we only fit long and downstream tracks, but eventually all track types must have a representation as v2::Tracks, and therefore all possible ancestors must be implementable.
Tracks with ancestors are:
long: Velo or Upstream (from the forward tracking) and Velo or Upstream and T tracks (from the matching)
I see, I guess this makes things a bit easier then as we don't have to implement any branching based on type. Should be fairly quick to implement in the same way as in the Pr:: tracks, just by storing the pointers.
Thanks Alex ... in that case I'll just adapt the SoA version directly. I think that there are some examples of creating new containers inside the code itself, so I'll try to use that first.
Sorry, very little progress on my side yet. Too much velo right now ... I'll try to make some progress this week, such that I can have a better idea for how much longer it would take.
I started to work on this and after reviving my monitoring algs realized this: In the implementation of TrackBeamLineVertexer as it exists in Rec now, the relation between PVs and tracks has gotten completely lost! When the algorithm was migrated to use the velo soa container as input, the output (a RecVertex) was not modified. Since the RecVertex contains Track pointers, it now just has zero pointers:
This is (unfortunately) correct and has been on the list of things to implement for quite some time, but I think nobody has had time so far.
Essentially we need to change the RecVertex class (the v2) version to accept indices to point to the track containers (as we do for the ancestors in the pattern recognition).
There has been a discussion about this a long time ago, but it was never implemented.
OK, thanks! What I will do is write a new output container for BLVF and then make a conversion algorithm that takes a track list as additional input and creates the RecVertex container. That way at least everything downstream will work the way it used to. We can then later decide to adapt RecVertex as well.
I think that was pretty much the plan as well: create a RecVertex variant with indices, and then convert it back (using the known containers of v1::Tracks, to which one can have a pointer) to the usual format for the checkers.
I have started drafting SoA versions of the PrimaryVertex and the PrimaryVertexTracks. (The latter will hold the track-dependent data.) However, I wonder to what extent it is useful to make these SoA containers.
For instance, the essential data of PrimaryVertex object is about 100 bytes, so even for all PV objects together it is less than a kByte. Are we sure it makes sense to make this a SoA container rather than a standard vector?
Then the data for the tracks is used to provide an unbiased set of vertices given a small subset of the tracks (e.g. the few tracks in a particular decay for which we need an unbiased PV to compute a decay time). The function call will look something like
compute_unbiased_PVs( PVdata, { set of track indices } )
I wonder whether with this kind of access pattern the SoA structures are actually useful. We typically need all data for a small random subset of tracks, rather than a subset of the data for all tracks.
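To illustrate that access pattern, here is a deliberately simplified 1D version of such an unbiasing call; the real fit would subtract the tracks' chi2 derivatives from the accumulated fit sums, and all names here are hypothetical:

```cpp
#include <vector>

// Illustration of the access pattern above: unbiasing a vertex by removing
// the contribution of a small set of tracks. For simplicity this sketch
// treats the vertex position as a 1D weighted mean of per-track estimates;
// the real fit subtracts the tracks' first and second chi2 derivatives.
struct PVData1D {
  float sumw;   // accumulated weights of all tracks in the fit
  float sumwz;  // accumulated weighted position estimates
};

inline float unbiased_position(const PVData1D& pv,
                               const std::vector<float>& w, // per-track weight
                               const std::vector<float>& z, // per-track estimate
                               const std::vector<int>& exclude) {
  float sw = pv.sumw, swz = pv.sumwz;
  // Scalar access to a few random tracks: remove their contributions.
  for (int i : exclude) { sw -= w[i]; swz -= w[i] * z[i]; }
  return swz / sw;
}
```

The point of the sketch is the loop over `exclude`: a handful of random track indices, each needing all of that track's columns, which is why the SoA layout does not obviously help here.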
I have made a data structure that fills a 'SoA' structure for the tracks that could eventually be zipped with the original track input container. It contains an index for each track to the closest PV, the weight in the PV fit, the IPCHI2 and some other data needed for the unbiasing.
The vertices themselves are stored as an AoS. Apart from the position and covariance matrix, they contain (little) additional data needed for unbiasing.
The two outputs of the PV finder are combined into a single data structure:
which is the output of the algorithm. The motivation to put them together, rather than as a tuple, is that in practice they are probably not very useful separately.
I added a converter that translates this into v2::RecVertices with the links to the tracks. I used this to verify that the vectorized version has very similar performance to the original non-vectorized version, which is still in the package.
I also added a routine that allows to 'unbias' the list of PVs relative to a set of tracks.
There are many open issues:
The data structure isn't actually visible from outside the package yet.
The PV finder now has two input track lists (velo forward and backward) while inside the algorithm there is only one track list. The output SoA structure for the tracks now contains both forward and backward tracks (in that order), just like, for instance, the v1 and v2 track lists. This means that you cannot actually zip it with the input track containers yet.
this version of the PV finder is about 30% slower than the original SoA version. I think that there are various contributions to this:
To make the ipchi2 computation more consistent I had to add a bit of computation for the first iteration of the vertex fit. Then to simplify it I just did this for every iteration, which takes time.
The maximum number of fit iterations was set to 5, but I found it to be not quite enough.
The original implementation had the tracks organized by vertex. In the new implementation the list of tracks keeps the original order and all vertices are fit simultaneously. This works nicely, but there is a small part I don't manage to vectorize. It is a trivial addition of numbers that now dominates the vertex fit. (See my post at mattermost.)
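The problematic pattern is presumably a scatter-add like the following sketch, where the data-dependent index keeps the compiler from vectorizing the loop (illustrative code, not the actual algorithm):

```cpp
#include <cstddef>
#include <vector>

// Sketch of the 'trivial addition' that resists vectorization: accumulating
// per-vertex sums while looping over tracks in their original order. The
// indexed (scatter) store prevents straightforward vectorization, since
// consecutive tracks may update the same vertex slot.
inline void accumulate(const std::vector<int>& pvindex,
                       const std::vector<float>& contribution,
                       std::vector<float>& sums /* one slot per vertex */) {
  for (std::size_t i = 0; i < pvindex.size(); ++i)
    sums[pvindex[i]] += contribution[i]; // scatter-add: data-dependent index
}
```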
I still believe that we'll be better off refitting the PVs with fitted tracks. If those are the tracks for which we need the unbiasing, then it actually makes little sense that the PV finder already stores this data: It needs to be re-computed anyway.
I am also not convinced that it makes much sense to have an SoA structure for the tracks at the output of the algorithm: in practice, the unbiasing is always done for just a few tracks at a time. It will always be via scalar access across multiple columns of the structure. Both the SIMD and the SoA are in the way at that point: a scalar AoS would in theory use the memory in a less fragmented way. One possible solution is to provide the output in both formats (by writing another converter).
You know what I will say about more converters: we need fewer data structures, not more structures and more converters. But I'll let @decianm comment on this.
I'm confused by something else, I thought it was agreed that for monitoring and similar the PV would itself maintain indices to the tracks used to make it. Or is this PV class not meant to be used for those purposes without conversion?
We can certainly add a vector of indices to the vertices: That would essentially make 'PrimaryVertex' identical to 'RecVertex'. (I would still keep it a separate class, because of persistence issues etc, but that's a matter of taste.) It would make it a little more complicated to write the PVs as a SoA, if we would ever want to do that.
My hesitation with this is that I cannot really think of a use case :-) In every place where you do something with tracks and PVs, you'll probably do this for all PVs. If you make plots for a single track versus a single vertex (e.g. plotting signed 1D impact parameters, of which we use a couple of different ones to monitor velo deformations), then track->PV is as sufficient a relation as PV->track.
Perhaps the only exception is when you want to plot the number of tracks associated to a PV, given a certain condition, like the track being 'backward'. If you have PV->track association you could write the loop as
```cpp
for (const auto& pv : pvs) {
  int n = 0;
  for (auto trackindex : pv.tracks)
    if (CONDITION(alltracks[trackindex])) ++n;
  histo->fill(n);
}
```
while without the relation you would need to do something like
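A sketch of that track->PV version (hypothetical names; the condition and the histogram fill are stand-ins for the ones in the snippet above):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical inverse of the PV->track loop quoted above: with only
// track->PV indices, first accumulate the per-PV counts over all tracks,
// then 'fill' each count (here collected into a vector rather than filled
// into a histogram).
std::vector<int> counts_per_pv(std::size_t npvs,
                               const std::vector<int>& track2pv,
                               const std::vector<bool>& passes_condition) {
  std::vector<int> counts(npvs, 0);
  for (std::size_t i = 0; i < track2pv.size(); ++i)
    if (passes_condition[i]) ++counts[track2pv[i]];
  return counts; // one entry per PV, e.g. to fill into a histogram
}
```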
It is slightly more complicated, but perhaps not enough that one would add it to the event model. You can always create the PV->track relations on the fly, if you need them.
(I had also hoped that the second case can be easily vectorized, but this is exactly the kind of code that I do not manage to vectorize in the PV algorithm.)
Thanks for the detailed explanation! I was just under the impression that there was a use case, and that this was discussed and agreed. But of course I am coming at this second-hand and I may have misunderstood or been given incomplete information. I'll let @decianm comment.
Apologies to come back to this a little late (and I did not have a look at the corresponding MR):
The classical use cases I had in mind were a track that is assigned to multiple PVs, the monitoring you mention, and the unbiasing, where one usually only wants to unbias one PV. I see that one can always do both with either direction of the relation, but for these things PV -> tracks seemed easier. It would be good to see what the penalty is of introducing both directions (i.e. adding the list of tracks and making PrimaryVertex similar to RecVertex).
Indeed, let's please limit the number of event model classes and converters. Converters are needed for development and checking and potential special use cases, but we should not design things with them in mind. Also, we have to make sure things are zippable, and therefore we cannot mix (I think) SoA and AoS.
Unfortunately @adudziak did not have time to come back to the Velo tracks vs. fitted tracks PV discussion (I believe), but I think we are still missing numbers for the comparison. And of course there are more arguments like consistency, but maybe we should not discuss this in an MR.
How do we deal with the 1 vs. 2 input/output lists for the forward and backward tracks from the Velo pattern recognition and the PVFinder? Have the Velo give only one output (merged forward+backward) with a split-point index? Have the PVFinder give 2 track lists, split between forward and backward, such that it can be zipped with the forward Velo tracks? Do we ever need to zip it?
What data structure is best for the output data from the PVFinder? SOA or AOS? This depends on the most likely access pattern in the selections / the PV-unbiaser. Mostly affects WP3, I assume. @poluekt @mvesteri
Can we upgrade the SOACollection to have an SOA-AOS layout? @chasse @ahennequ
What other (exposed) data structures do we need? Can we please have as little as possible duplication and conversion?
Physics:
Does the "old" PVRefitter work, or do we need a new one? How much does it cost to keep the PV -> tracks relations (using a small_vector to keep the allocation cost minimal)? Do we need to be able to re-fit the PV from scratch (in contrast to subtracting the contribution of single tracks)?
Can we implement a (near) identical PV finder for HLT1 and HLT2? How do we check and deal with possible differences? @freiss
Can we confirm that using Best tracks has no advantage to using Velo tracks? @adudziak
Why are there more outliers after track fit and why around 3% of the velo tracks are lost in the track fit / TrackBestTrackCreator?
Thanks a lot for the summary; let me add an open point (with apologies if I misunderstand the slides: I was moving house last week and only looked at them now). I don't think we want to bake in that the track->PV relations are simply "the best" PV for each track. We know that misassociation will be a bigger issue in the Run 3 pileup, and we cannot be certain how much bigger until we see data with the new detector. On top of that, the HLT1 PV finder is by construction more stochastic, since tracks contribute to PVs with a weight; there is no explicit selection of tracks belonging to a PV. If we want to use the same structures to at least study the performance of this compiled for CPU, we should allow the possibility of multiple associations. I'd be curious to understand how much performance is actually gained by baking this in.
The multiple associations is a good point (and also why I was motivated to keep the PV -> track relations, as there a track could appear for several PVs).
I assume that for both algorithms tracks contribute to the PVs with weights (isn't this the Tukey weight?), and I was under the impression that this fitting is very similar for both algorithms. What I am not sure about is whether there is a cutoff, or whether the weight is simply so small that it is essentially 0, and whether this has any practical consequence, apart from selecting in the end the PV with the lowest IP chi2 as the "best" PV. But this is something @wouter should comment on.
The Tukey weight has an explicit cutoff: above a certain IP chi2 the weight is exactly zero. In the current CPU algorithm tracks are simply assigned to the closest PV seed, after which the fit runs on the selected tracks. In the 10k events that I looked at, I found only a handful of tracks for which the chi2 to another vertex was smaller after the vertex fit. These were tracks with such large parameter errors that you could have assigned them to either vertex. They do not contribute to the position of the vertex.
My take on the data format is that the output of an algorithm can be the structure most useful to the algorithm. A separate converter can be made to change it into a common structure: we do not gain anything by putting the conversion step inside the algorithm. The output of the algorithm that I committed contains all the data the algorithm can produce, in a format that is reasonably efficient for the internals of the algorithm. It is also good enough to be zipped with the original track container. If we want another format, I'll write another converter.
sorry for the late reply. I am just catching up on this topic.
Can we implement a (near) identical PV finder for HLT1 and HLT2? How do we check and deal with possible differences?
It should be possible to compile Allen for CPUs and run its PV finder in HLT2. A while ago I studied the differences in physics performance between Allen compilations on GPU and CPU and did not see any worrying difference. There are some differences in performance between the Allen PV finder and TrackBeamLineFinder, which can maybe be reduced by tuning the parameters of the algorithms.
Alternatively, one could try to translate the Allen implementation to C++ code or try to make the Allen implementation closer again to TrackBeamLineFinder. Let me stress that the main difference between the PV fitting in Allen and TrackBeamLineFinder is that in the Allen implementation each track can contribute to each PV fit with an adjusted weight (with some cutoff). If the track is only close to one PV candidate the weights in Allen should approximately be equal to the Tukey weights used in TrackBeamLineFinder. I have never actually studied how different the weights in Allen are compared to the 'traditional' Tukey weights. Might be interesting to have a look at.
Let me ask the question here if it will also be necessary to add track-PV or PV-track relations to Hlt1. I believe for the Allen case, it would be easiest to add those relations in a separate step instead of including it in the output of the PV fitter.
We certainly don't need to worry about "unbiasing" the HLT1 primary vertices, since they are not what will be used for the eventual decay-time measurement in the physics analyses. What is important is to be able to store (or calculate later) relations between any candidate which causes HLT1 to fire and the PV associated with that candidate, if the selection line in question used cuts that depend on decay-time. Those are indeed best computed either during the selections themselves or in an 'afterburner' step at the end.
@freiss It would definitely be interesting to know how different the Allen weights are from Tukey weights. I have tried alternatives for the Tukey weights, but could not find anything that works better. If there is, we would really like to use it in HLT2 as well, of course.
I don't know what is most important for the HLT2 PVs, but to first approximation I would say that they just need to be the HLT1 PVs, but with 'improved' parameters due to other alignment conditions. Is that so? Because if that is the main difference, then it would actually be very nice to make HLT2 as close as possible to an 'HLT1 refit'. One easy solution for that is to use the Allen vertices as seeds:
take the Allen PV positions, and the HLT2 tracks (velo? fitted?)
partition the tracks over the Allen seeds
fit
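Step 2 might look something like this sketch, assigning each track to the Allen seed closest in z (purely illustrative, using the track's z at its closest approach to the beamline):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of step 2 above: partition tracks over the Allen seed positions by
// assigning each track to the seed closest in z. The inputs here are the
// tracks' z at the point of closest approach to the beamline and the seed
// z positions; names and details are illustrative only.
std::vector<int> partition_by_seed(const std::vector<float>& track_zbeam,
                                   const std::vector<float>& seed_z) {
  std::vector<int> assignment(track_zbeam.size());
  for (std::size_t i = 0; i < track_zbeam.size(); ++i) {
    int best = 0;
    for (std::size_t s = 1; s < seed_z.size(); ++s)
      if (std::abs(track_zbeam[i] - seed_z[s]) <
          std::abs(track_zbeam[i] - seed_z[best]))
        best = static_cast<int>(s);
    assignment[i] = best;
  }
  return assignment;
}
```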
Steps 2 and 3 wouldn't need to change from the existing HLT2 algorithm, so this is really easy to implement.
The advantage is that we do not lose any PVs. The disadvantage is that we cannot 'improve' the PV-finding efficiency or separation anymore, and that we rely on Allen having run before.
We can also make the Allen input 'optional', by adding the Allen seeds to the standard seeding, but that definitely complicates things: we would need to define a criterion to decide whether seeds are equivalent.
(On an 'entirely different but similar' topic: I (and Gerhard) think that it would also be good to start the track reconstruction from the Allen output, or at least revive the Allen tracks that are in the selreports. That way we prevent hard-to-predict tracking-efficiency losses due to tiny differences between HLT1 and HLT2 conditions.)
I would really appreciate also input from @mvesteri and @poluekt on the original question about the format (SOA or AOS) that would be best for the selections and from @adudziak on the question about best tracks vs. Velo tracks.
It would be good to know the "access pattern" of the PVs, i.e. do you often need to access the same property of multiple PVs (i.e. the x, y, z positions), or do you only ever access one PV at a time in any case?
Indeed, we are not at a stage where we can technically benchmark this.
Also (and I might be talking nonsense here), do we ever need to zip the PV to anything?
The MinimumImpactParameterChi2 functor calls a more complicated access pattern in the ipchi2 calculator with various branches. I don't really know which branches are more/less likely, or even if some of them will never happen in practice.
The branches in impactParameterChi2 are constexpr, so they are evaluated at compile time. From what I remember, the MinimumImpactParameterChi2 functor vectorizes over tracks and simply loads the vertices one at a time.
In the code you have shown above, the vertices are read one at a time, and the vector_t( PV.x(), PV.y(), PV.z() ) part broadcasts the values from a single PV into a vector to be matched with multiple tracks. At the end it returns the minip2 for all tracks in the SIMD vector.
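To illustrate the broadcast, here is a toy version with a fixed-size array standing in for the SIMD type; this is not the SIMDWrapper API, and the distance is reduced to 1D for brevity:

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Toy illustration of the broadcast pattern: a 4-lane 'vector' of floats
// stands in for the SIMD type. One PV coordinate is replicated across all
// lanes and compared against four tracks at once, updating a running
// per-lane minimum. Not the actual SIMDWrapper API.
constexpr std::size_t lanes = 4;
using vec = std::array<float, lanes>;

inline vec broadcast(float x) {
  vec v;
  v.fill(x); // replicate one scalar into every lane
  return v;
}

// Update a running per-lane minimum of squared (here, 1D) distances to one PV.
inline vec min_dist_update(vec current_min, const vec& track_x, float pv_x) {
  const vec pv = broadcast(pv_x);
  for (std::size_t i = 0; i < lanes; ++i) {
    const float d = track_x[i] - pv[i];
    current_min[i] = std::min(current_min[i], d * d);
  }
  return current_min;
}
```

Looping this update over all PVs leaves, per lane (i.e. per track), the minimum over PVs, which mirrors how the functor returns the minip2 for all tracks in the SIMD vector.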
Ok, thanks @ahennequ, so this broadcast of the PV 3-coords to the same (I'm guessing SIMD) type as the state position is something that could be avoided by upgrading the PV event model to SOA?
SOACollections have a scalar_fill view which is equivalent to this broadcast behaviour. But I wouldn't say we would gain any value in switching to that compared to what we have now (for this particular functor).
As we don't have many PVs, I don't think it would make a big difference whether it's SOA or AOS (as we access them one at a time and the whole container is small enough to fit in L1 cache).
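For anyone following along, the "vectorize over tracks, broadcast one PV at a time" pattern described above can be sketched without any LHCb types. This is a toy illustration only: `vec4` stands in for a SIMDWrappers vector, plain arrays stand in for the SOA track and PV containers, and the quantity computed is a plain squared distance rather than the full ipchi2 with covariance.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical stand-in for a 4-lane SIMD float vector (the real code
// uses SIMDWrappers types; this only illustrates the data flow).
using vec4 = std::array<float, 4>;

// Broadcast one PV coordinate into all lanes -- the
// vector_t( PV.x(), PV.y(), PV.z() ) step discussed above.
inline vec4 broadcast( float v ) { return { v, v, v, v }; }

// Minimum squared (transverse) distance of 4 tracks (SOA x/y arrays)
// over all PVs: tracks fill the SIMD lanes, vertices are read one at a
// time and broadcast, and a running minimum is kept per lane.
inline vec4 min_dist2( const float* tx, const float* ty,
                       const float* pvx, const float* pvy, std::size_t nPV ) {
  vec4 best;
  best.fill( std::numeric_limits<float>::max() );
  for ( std::size_t i = 0; i < nPV; ++i ) {
    const vec4 px = broadcast( pvx[i] );
    const vec4 py = broadcast( pvy[i] );
    for ( std::size_t lane = 0; lane < 4; ++lane ) {
      const float dx = tx[lane] - px[lane];
      const float dy = ty[lane] - py[lane];
      const float d2 = dx * dx + dy * dy;
      if ( d2 < best[lane] ) best[lane] = d2;
    }
  }
  return best; // per-lane minimum, one value per track
}
```

The point of the sketch is that the PVs are only ever touched one at a time, which is why their memory layout (SOA vs. AOS) matters much less than the tracks'.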
If the motivation to use SOACollections is to have a common interface for all collections, perhaps the best is to make the switch and add a template parameter to change the internal representation (SOA/AOS) of collections directly in the library. That way we could iterate faster and test things with a single parameter change, while keeping the same interface.
perhaps the best is to make the switch and add a template parameter to change the internal representation (SOA/AOS) of collections directly in the library.
I mean add an extra Layout template parameter to SOACollections https://gitlab.cern.ch/lhcb/LHCb/-/blob/master/Event/EventBase/include/Event/SOACollection.h#L88
That allows changing the internal representation (like the backends of SIMDWrappers). This would allow us to keep the same interface (zip, proxies, etc.) but with a different memory layout. The user would still have to specify the layout manually, but as it would be a single template parameter change, we could experiment with different versions and take the best.
So we can use SOACollections now and then add the ability to use AOS (might need to rename the library to simply Collections) with them.
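To make the proposal concrete, here is a minimal sketch of what a Layout template parameter could look like. All names here (`Layout`, `PVCollection`) are illustrative, not the actual SOACollection code linked above; the point is only that client code compiles unchanged against either internal representation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout selector, mirroring the proposed template parameter.
enum class Layout { SOA, AOS };

template <Layout L>
struct PVCollection;

// AOS specialization: one struct per PV.
template <>
struct PVCollection<Layout::AOS> {
  struct PV { float x, y, z; };
  std::vector<PV> data;
  void push_back( float x, float y, float z ) { data.push_back( { x, y, z } ); }
  float x( std::size_t i ) const { return data[i].x; }
  float z( std::size_t i ) const { return data[i].z; }
  std::size_t size() const { return data.size(); }
};

// SOA specialization: one array per member, same accessor interface.
template <>
struct PVCollection<Layout::SOA> {
  std::vector<float> xs, ys, zs;
  void push_back( float x, float y, float z ) {
    xs.push_back( x ); ys.push_back( y ); zs.push_back( z );
  }
  float x( std::size_t i ) const { return xs[i]; }
  float z( std::size_t i ) const { return zs[i]; }
  std::size_t size() const { return xs.size(); }
};

// Client code is generic over the layout: switching SOA <-> AOS is a
// single template argument change, as suggested above.
template <Layout L>
float sum_z( const PVCollection<L>& pvs ) {
  float s = 0.f;
  for ( std::size_t i = 0; i < pvs.size(); ++i ) s += pvs.z( i );
  return s;
}
```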
Aside from anything else, there is a big conceptual difference between partitioning the tracks in PVs or not partitioning them. It is not obvious at all to me which will lead to smaller long-term systematic effects on acceptances or resolutions. I think this deserves some deep thought and investigation, so any decision we take is something we should be prepared to rethink based on evidence. I feel strongly that we should commission both ways and study both in data.
This is not really answering the question, but I would think the priority at the moment should be to put a PV finder in HLT2 that stores all necessary information used in the selections and the PV unbiasing. Whether this algorithm then takes the HLT1 seeds and partitions the tracks over them is something that can be studied and changed by changing the algorithm behaviour, but as long as the rest of the chain stays the same, we at least know that we can test and use both solutions.
The current model of RecVertex just has pointers of tracks to vertices, so if this is just about testing something, then the functionality is already there.
I cannot revise the data structure that comes out of the current HLT2 PV finder, but I can provide converters to somebody else's favourite structure.
That said: track and vertex pattern recognition /is/ partitioning! Tracks that have equal weight to two vertices do not help to separate the vertices, nor do they help in vertex resolution. So, I would really be interested to see a plot of the ratio of the 'second largest weight to first largest weight' for the HLT1 weights :-) (I will myself try to make that plot for the HLT2 vertices as well, but I have some trouble with my lhcb stack today.)
@wouter A question that I forgot to ask: How useful are the ip-chi2 and the ip if we use Velo tracks for the PV, but then want these quantities for long tracks? Most likely they will change, so one would need to recompute them again? Does it then make sense to store them in the first place?
Yes, ip-chi2 needs to be recomputed for other versions of the track, but there is no harm in keeping it: it's a marginal amount of extra data. It is needed inside the vertex fit. It would take more time to remove the data than to keep it. There is no need to persist it though, since it can also be recomputed from other information. (We need to look a bit more carefully at the persistence once we are ready to implement that.)
@wouter by saying it is partitioning you are assuming the answer. There is a conceptual difference between allowing all tracks to contribute to all vertices with a continuously decreasing weight or putting a hard cutoff. I claim that nobody today knows long-term which choice is better because none of us have real experience with working in a high-pileup environment with a forward detector. We should not make decisions from first principles, we should focus on building a framework that will allow us to eventually answer which approach is better.
As far as not revising the structure from HLT2: why couldn't you modify it?
The data structure is currently all the data the algorithm produces: every track has an index to the closest vertex, the one it was used in for vertex fitting. There are no vertex to track links. If we need those, they can be added by a converter, to a different data structure. The latter is currently 'RecVertex' and I already wrote a converter to fill that. If something else is needed, I am happy to write another converter. But I don't see why I would add non-trivial data (namely of non-constant size, so expensive) to the output of the algorithm.
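The converter idea described above is simple enough to sketch: the algorithm output stores one constant-size index per track (track-to-vertex), and the variable-size vertex-to-track lists are built outside the algorithm when needed. This is a toy illustration of the inversion, not the actual converter code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Invert per-track vertex indices into per-vertex track lists.
// Convention assumed here: trackToPV[t] == -1 marks a track that was
// not used in any vertex; otherwise it is the index of its vertex.
std::vector<std::vector<std::size_t>>
invertRelations( const std::vector<int>& trackToPV, std::size_t nPVs ) {
  std::vector<std::vector<std::size_t>> pvToTracks( nPVs );
  for ( std::size_t t = 0; t < trackToPV.size(); ++t ) {
    const int pv = trackToPV[t];
    if ( pv >= 0 ) pvToTracks[static_cast<std::size_t>( pv )].push_back( t );
  }
  return pvToTracks;
}
```

The asymmetry is visible in the types: the forward relation is a flat, fixed-size-per-track array, while the inverse is inherently of non-constant size per vertex, which is the "expensive" data referred to above.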
I think it is very much preferable to avoid converters except where strictly needed. And in HLT2 the PV timing cost is neither here nor there, so there is no need to maintain one super-optimized data structure in one place and another one accessed by a converter elsewhere. That adds complexity and maintenance cost for little practical benefit. We should be trying to actively merge the data structures and eliminate converters, not make new data structures and add new converters.
Well, I don't think it makes sense either to put a converter inside an algorithm if you can also put it outside it.
The data structure as it is is completely usable for what it is for: you have the vertices, and you have the data, and even function calls to do unbiasing. It is actually not highly optimized: it just uses the SOACollection as it exists (which as I have pointed out elsewhere, comes with its own penalties.)
Before doing anything else, I would like to see the use case.
If we do not unbias vertices, then we don't need any track-vertex relations at all, other than for monitoring.
If we do unbiasing, and would like to do that efficiently, then we need to have navigation from track to vertex, not the other way around: you cannot afford to look up a track in the list of tracks associated to each vertex.
So, we need track-to-vertex relations, period. Now, suppose you would like to allow every track to contribute to every vertex. Then you need to keep for every track a list of weights, with a length that varies from event to event. It adds serious complexity to the data structure. Therefore I think that it is allowed to question whether we need this. My claim is that we do not: you cannot have a single track contribute with real information to the position of two vertices.
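To illustrate why unbiasing wants track-to-vertex navigation: with a per-track vertex index and weight (as proposed at the top of this issue), removing one track's contribution is a direct O(1) lookup, whereas with only vertex-to-track lists one would have to search for the track. A 1D weighted-mean toy (all names hypothetical, and a drastic simplification of the real vertex fit):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative per-track record, roughly the information proposed above:
// vertex index, weight in the fit, and (here) a 1D track "position".
struct TrackPVInfo {
  int   pvIndex;
  float weight;
  float x;
};

// 1D toy vertex: position as a weighted mean, with the sums kept so that
// single-track contributions can be subtracted.
struct PV1D {
  float sumW  = 0.f;
  float sumWX = 0.f;
  float position() const { return sumWX / sumW; }
};

// Unbias the vertex with respect to one track: thanks to the stored
// track->PV index this is a direct lookup plus a subtraction.
float unbiasedPosition( const std::vector<PV1D>& pvs, const TrackPVInfo& t ) {
  const PV1D& pv = pvs[static_cast<std::size_t>( t.pvIndex )];
  return ( pv.sumWX - t.weight * t.x ) / ( pv.sumW - t.weight );
}
```

Note also how the single-index scheme keeps `TrackPVInfo` constant-size; the alternative (a weight per track per vertex) would turn it into a variable-length list, which is the complexity being questioned above.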
It is allowed to question if we need this, of course. But as we will already have this in HLT1 (where performance is a relevant issue) the real question is not whether you will have it, the question is how you will most efficiently deal with it. And then it is obviously simpler to have a single PV structure in HLT2 which can be used to store the output of both the HLT1-style and HLT2-style PV finding, rather than having extra converters.
I repeat again also my more "management" point: this is not the right time to be doing conceptual studies. Right now the focus should be on framework functionality. We can study the physics during commissioning.
Saying that, and keeping in mind all the experience with Run 2 PVs and the re-fitting vs. re-finding needs from some TD analyses, which existed in Run 1 but were not needed in Run 2, I am strongly in favour of keeping the HLT2 PVs as close as possible to HLT1. Preferably by compiling the Allen PVs in a CPU version.
https://gitlab.cern.ch/lhcb/Moore/-/blob/master/Hlt/RecoConf/python/RecoConf/hlt1_allen.py
If there are any technical issues with this approach, I would be grateful if somebody could explain them to me, as I am not really an expert.
Why not make use of them, and do some further studies?
In principle the PrimaryVertexChecker was already used for the CPU/GPU document comparison, so with some changes we could get information about average weights or whatever Wouter would like to study as a comparison between the two.
Finally, given that right now Allen PVs indeed do not have a one-to-one correspondence with the HLT2 PVs, to reduce any possible hiccups with converters and to avoid introducing biases, it would be ideal to have only one implementation.
Ah, I realize now that Allen indeed runs the whole HLT1 and not only the PVs, so isolating one reconstruction might not be trivial. But this does not change the general point of having reconstructions that are as similar as possible in both HLTs.
I completely agree that we can just use the Allen PVs in HLT2. However, AFAIU we are not going to actually run Allen algorithms in HLT2, because that wouldn't be fast enough. Please correct me if I am wrong. However, we should definitely consider reading Allen PVs from the output of HLT1 and just performing a refit. See higher up in this thread.
About the weights: I would be interested to see how tracks contribute to their 'second best' vertex: first just the weight itself, then a quantity that expresses how much the track contributes to the entire vertex, for instance 'weight * chi2 / sum_tracks_in_vertex weight * chi2'. I think that we will find that the second-best weights are much smaller; and if they are not much smaller, then the track just has such large errors that it fits any vertex; and if that's not the case, then we have just reconstructed the same vertex twice.
Because of this, I also suspect that the difference in fitting strategy is not really the issue: what matters is how the PV seeds are found. If the Allen algorithm is better than the HLT2 one, we should use Allen's approach, and vice versa. And if the Allen seeds are good enough, then it would really be best to do no PV finding in HLT2 at all, but just a refit with tracks 'found/fitted' with the latest conditions.
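For reference, the first quantity proposed above (ratio of second-largest to largest vertex weight per track) is trivial to compute once per-track weight lists are available; a sketch, with a purely hypothetical helper name:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

// Ratio of the second-largest to the largest vertex weight of one track.
// A ratio near zero would indicate that the soft assignment is
// effectively a hard partition, which is the claim being tested.
float secondToFirstWeightRatio( std::vector<float> weights ) {
  if ( weights.size() < 2 ) return 0.f; // only one candidate vertex
  // Move the two largest weights to the front, in descending order.
  std::partial_sort( weights.begin(), weights.begin() + 2, weights.end(),
                     std::greater<float>() );
  return weights[1] / weights[0];
}
```

Histogramming this over all tracks in an event sample would give exactly the plot asked for above.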
I don't think there is any throughput problem with running the Allen algorithm in HLT2 for PVs, but we would need to be able to run only that, not have to take the whole HLT1 "as a package". Not so much for throughput but because it's just inelegant.
Ok, so I think that we are looking at the same direction (which is already an improvement!) just still from slightly different angles.
My concerns actually go the other way around, as the main difference between the two algorithms is implicit/explicit track association and, as a consequence, different fitting strategies. For the seeds, the GPU version should mimic the CPU version; however, I am aware that here we have a different set of input tracks, as the VELO CPU/GPU algorithm also varies, so small differences are, I would say, expected.
I would like to explore the idea of using the Allen CPU version; as Vava pointed out, there are no showstoppers from the throughput point of view. Could any of our GPU experts (@dovombru@raaij@dcampora ) maybe comment on how difficult it is to isolate the VELO + PVs part from GPU HLT1? And if there are obvious blockers?
As far as the Allen CPU version goes -- please take note of the developments in Allen!431 (closed), which automatically generates Gaudi Algorithm wrappers around Allen algorithms in order to be able to run the same source code also inside Gaudi on CPU, and in addition be configured coherently in PyConf. So we should very much be able to take parts of Allen (but of course we have to consider also all the inputs that the PV finding needs, not just the PV finding itself) and run them 'naturally' inside Hlt2.
Sorry, but I have to intervene here.
We can all study this and it is certainly interesting (and important), but given how long this issue has already been open without much progress or agreement, and how long it took to get the HLT1 SelReports in, I simply need to stress: We need a solution that solves the original problem of associating tracks to the PV first (and most importantly). Whether we then use HLT1 PVs from Allen-on-CPU, resurrect them from some PVReport, or make them from scratch is a different issue. Also, it seems to me that we need to keep the possibility to have PVs in HLT2 without having anything from HLT1 to start with.
So let's please use next week to solve this problem first and afterwards see how we actually get the PVs in HLT2 in the first place...
Yes @decianm I fully agree, we probably diverged too far. However, I feel that in the discussion about HLT2 we are trying to converge on implicit/explicit track association for PVs... where we still disagree.
Perhaps we disagree, but there is a working solution now for HLT2, in the merge request linked above. The code also includes a converter to the existing RecVertex, which already has the PV-to-track linking that you request. There is no CPU or complication spared by solving this with an explicit conversion inside the algorithm. I propose that the implemented solution is used for further development next week: it still needs a bit of work, since the new event classes are not yet visible outside the package.
If the proposed solution falls short in terms of future requirements, it will not be complicated to change this later, especially if we keep interfaces the same.
I would propose that we agree to this as a stopgap solution and open an issue where we clearly state that the converter should eventually be removed in favour of having a single class with all information in it. That seems to me the only pragmatic way forward. I very strongly do not want a development model which explicitly favours external converters and multiple classes to hold the same logical information, but as I say in this short term to move things forward I can live with this stopgap. If this is not acceptable @wouter I think we need to organise a dedicated verbal discussion next week.
We keep the original Run 2 approach, which would require changing the v2 Rec::Vertex class.
Any updates after today's meeting on that (@peilian@wouter)? If I understood correctly, the question is whether the chi2 values of the track-to-PV relations are ever used or not (given that these are Velo tracks, I assume they cannot be used for the long tracks that result from these Velo tracks). I think we should decide on one of the two approaches by the end of the year.
Unbiasing has to be possible online and offline, and the structures that are needed for it have to be persisted.
The B-candidate to bestPV relations have to be created and persisted. This is more something for @mvesteri@poluekt
I am not sure what you mean with the original run-2 approach. Do you mean that we store PV->track relations instead of track->PV relations? We can store PV->track in addition, but for efficient unbiasing/refitting we need track->PV in any case.
Ok, maybe I then simply misunderstood.
Yes, with "Run2 approach" I meant that we only store the PV->track relations. But if we need the track->PV relations in any case, I think the question above is settled :)
@decianm@wouter@peilian I'm catching up with this after the holidays and for the record I am still not convinced by the arguments for dropping PV->track relations. In particular if we are moving towards a model where everything is a relation then I don't understand what harm it does to make this particular relation bidirectional. The benefits of having track->PV relations are on the other hand clear to me.
There is a meeting tomorrow morning (10:00), but this might be a bit too tight (?)
If it is a purely technical discussion, we can also set up meeting only with the people involved.
It may help to set some priorities. Up to now I could do everything 'standalone', but the next steps require understanding how things are used :-) Perhaps it is best to start with the relations and persistence. I'll collect a bit of information for the meeting next week.