Vectorized TrackBeamLineVertexFinder
37% speedup. There may be more to gain, but I believe not without significant effort: a large portion of the algorithm (the histogramming, the peak search, sorting the tracks, and making the `v2::recvertex` objects) is still scalar.
For now, I have added the code in a separate file. We obviously need to get rid of the duplication at some point, but the new code still needs a little cleanup first.
EDIT: Thanks to the sharp eyes of @ahennequ, some remaining bottlenecks were spotted, increasing the speedup to 47% (by an n% speedup, I mean that (100-n)% of the original execution time remains).
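Because this convention differs from the common "old time / new time" ratio, a tiny sketch of the conversion may help. The helper names are hypothetical and not part of the PR; the arithmetic only restates the convention above: a 47% speedup leaves 53% of the time, which is roughly a 1.89x throughput gain, and the earlier 37% corresponds to roughly 1.59x.

```cpp
// Convention used in this PR: an "n% speedup" means the new code needs
// (100 - n)% of the reference execution time.
// The function names are hypothetical, for illustration only.

// Fraction of the reference execution time that remains.
constexpr double remainingFraction(double speedupPercent) {
  return (100.0 - speedupPercent) / 100.0;
}

// The same result expressed as a throughput factor (reference time / new time).
constexpr double throughputFactor(double speedupPercent) {
  return 1.0 / remainingFraction(speedupPercent);
}
```

Both functions are `constexpr`, so the conversion can be checked at compile time, e.g. `static_assert(throughputFactor(47.0) > 1.88)`.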
EDIT2: The calls to `recvertex.addToTracks` at the end of `operator()` alone currently take about 20% of the total execution time.
This code has been tested to match the reference `TrackBeamLineVertexFinderSoA.cpp` (nearly exactly) in covariance, chi2, used tracks, number of vertices and some other parameters.
The minimal differences (in the least significant digit of a float) come from adding some floats in the vertex fit in a different order.
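The effect can be illustrated with a minimal, self-contained sketch. This is not code from the PR: the function names and the `lanes` parameter are hypothetical, and the lane-wise loop only mimics how a vectorized loop keeps several partial sums that are combined at the end. Since float addition is not associative, the two orders can disagree in the last bits, so comparing against the reference with a small relative tolerance is more robust than exact equality.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sequential accumulation, as a scalar loop would do it.
float sumSequential(const std::vector<float>& values) {
  float sum = 0.f;
  for (float v : values) sum += v;
  return sum;
}

// Lane-wise accumulation: `lanes` independent partial sums (a hypothetical
// stand-in for the SIMD width), combined only at the end. Same terms,
// different order of additions.
float sumLaneWise(const std::vector<float>& values, std::size_t lanes) {
  assert(lanes > 0);
  std::vector<float> partial(lanes, 0.f);
  for (std::size_t i = 0; i < values.size(); ++i) partial[i % lanes] += values[i];
  float sum = 0.f;
  for (float p : partial) sum += p;
  return sum;
}

// Relative difference between two results, for tolerance-based comparisons.
float relativeDifference(float a, float b) {
  const float scale = std::fmax(std::fabs(a), std::fabs(b));
  return scale == 0.f ? 0.f : std::fabs(a - b) / scale;
}
```

The tolerance to use is an assumption: it would need tuning for the actual quantities compared (covariance, chi2, and so on), since longer or more cancellation-prone sums accumulate larger rounding differences.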