Support SIMD/POD containers and vectorised selections in new functors (!1601) · Merge requests · LHCb / Rec

Olli Lupton requested to merge olupton_vector_functors into master Jun 21, 2019

This is a sketch of a way that vectorised selections on the new SIMD-friendly POD track containers (e.g. ::LHCb::Pr::Velo::Tracks) could work with the new functors introduced in !1541 (merged). This MR:

Introduces "proxy" iteration over the POD track containers (cf. LHCb#37 (closed)), although (unlike the sketch there) here a proxy represents "a vector-unit-sized chunk of tracks", not "a track". This is done with a wrapper around the ::LHCb::Pr::Velo::Tracks type. Integrating it directly into ::LHCb::Pr::Velo::Tracks is also an option...
Allows functors to filter ::LHCb::Pr::Velo::Tracks into a new ::LHCb::Pr::Velo::Tracks (like PrFilterIPSoA)
Makes the MINIP, MINIPCUT and ETA functors work when filtering these containers, preserving @ahennequ's vectorised calculation in PrFilterIPSoA. The definitions of these functors are not explicitly specialised; rather, the same function body is reworked to be valid for both scalar and vector types.
Makes the implementation choice that different [chunk-of-]track-like objects/proxies should provide accessors named closestToBeamStatePos and closestToBeamStateDir. The rationale here is that these return basic mathematical objects (i.e. 3-vectors), rather than higher-level concepts like track states, so the number of types and operations that need to be implemented (and consistently named...) for the concrete types (here @ahennequ's Vec3<T> and Gaudi::XYZVector) is limited. See also: this vision page on track interfaces.
Instantiates Pr::Filter<T> as PrFilter__PrVeloTracks, to be used as a replacement for PrFilterIPSoA. My tests were not exhaustive, but I did not see a significant change in speed -- this is essentially the same algorithm, but hidden behind an extra abstraction layer (👎) and with a short-circuiting "optimisation" (maybe 👍; not conclusive, but inherited from the scalar code).
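
To illustrate the "same function body for scalar and vector types" idea from the list above, here is a minimal, self-contained sketch. It is not the code in this MR: FloatV is a hypothetical stand-in for a SIMD register type, Vec3<T> here is a toy re-implementation, and TrackLike and IPSquared are invented for the example. Only the accessor names closestToBeamStatePos and closestToBeamStateDir are taken from the description.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a SIMD register: W floats with lane-wise arithmetic.
template <std::size_t W>
struct FloatV {
  std::array<float, W> lane{};
  FloatV() = default;
  FloatV( float s ) { lane.fill( s ); } // broadcast a scalar to every lane
  friend FloatV operator+( FloatV a, FloatV const& b ) {
    for ( std::size_t i = 0; i < W; ++i ) a.lane[i] += b.lane[i];
    return a;
  }
  friend FloatV operator-( FloatV a, FloatV const& b ) {
    for ( std::size_t i = 0; i < W; ++i ) a.lane[i] -= b.lane[i];
    return a;
  }
  friend FloatV operator*( FloatV a, FloatV const& b ) {
    for ( std::size_t i = 0; i < W; ++i ) a.lane[i] *= b.lane[i];
    return a;
  }
  friend FloatV operator/( FloatV a, FloatV const& b ) {
    for ( std::size_t i = 0; i < W; ++i ) a.lane[i] /= b.lane[i];
    return a;
  }
};

// Toy 3-vector, generic over the component type (float or FloatV<W>).
template <typename T>
struct Vec3 {
  T x, y, z;
};
template <typename T>
Vec3<T> operator-( Vec3<T> const& a, Vec3<T> const& b ) {
  return { a.x - b.x, a.y - b.y, a.z - b.z };
}
template <typename T>
Vec3<T> cross( Vec3<T> const& a, Vec3<T> const& b ) {
  return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}
template <typename T>
T mag2( Vec3<T> const& a ) {
  return a.x * a.x + a.y * a.y + a.z * a.z;
}

// Hypothetical proxy: either one track (T = float) or a chunk of tracks (T = FloatV<W>).
template <typename T>
struct TrackLike {
  Vec3<T> pos, dir;
  Vec3<T> closestToBeamStatePos() const { return pos; }
  Vec3<T> closestToBeamStateDir() const { return dir; }
};
using ScalarTrack = TrackLike<float>;
using TrackChunk  = TrackLike<FloatV<4>>;

// Hypothetical functor: squared impact parameter with respect to one vertex.
// The body is written once and works for both proxy flavours.
struct IPSquared {
  Vec3<float> vertex;
  template <typename Proxy>
  auto operator()( Proxy const& t ) const {
    auto pos = t.closestToBeamStatePos();
    auto dir = t.closestToBeamStateDir();
    using T  = decltype( pos.x ); // float or FloatV<W>
    Vec3<T> const v{ T( vertex.x ), T( vertex.y ), T( vertex.z ) };
    return mag2( cross( v - pos, dir ) ) / mag2( dir );
  }
};
```

Calling IPSquared on a ScalarTrack returns a single float, while calling it on a TrackChunk returns one value per lane. The functor itself never needs to know which case it is in, as long as the proxy provides the two accessors and the component type supports the arithmetic operators.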

There are still some problems:

This works when run from the functor cache, but when the functors are just-in-time compiled, Cling does not set the right preprocessor macros, so the SIMDWrapper types are compiled as if avx2 were not available (I use x86_64+avx2+fma-centos7-gcc8-opt+g). cc: @clemenci.
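
A sketch of why missing macros matter, assuming the vector width is selected from compiler-predefined macros such as `__AVX2__` (which GCC and Clang define when AVX2 code generation is enabled). The names kFloatLanes, FloatChunk, compiledLanes and layoutMatches are hypothetical and are not the SIMDWrapper API:

```cpp
#include <cassert>
#include <cstddef>

// Width chosen at preprocessing time from a compiler-predefined macro.
#if defined( __AVX2__ )
constexpr std::size_t kFloatLanes = 8; // 256-bit registers, 8 floats
#else
constexpr std::size_t kFloatLanes = 1; // scalar fallback
#endif

// Hypothetical chunk type whose size and layout depend on that width.
struct FloatChunk {
  float lane[kFloatLanes];
};

// A precompiled library could export the width it was built with...
constexpr std::size_t compiledLanes() { return kFloatLanes; }

// ...so that code compiled later (e.g. by a JIT) can check that it sees the same layout.
inline bool layoutMatches( std::size_t lanesSeenByCaller ) { return lanesSeenByCaller == compiledLanes(); }
```

If the ahead-of-time build defines `__AVX2__` and the just-in-time compiler does not, the two sides would disagree on kFloatLanes and therefore on the type being passed around, which is consistent with the behaviour described above.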

And some things that are not implemented:

Proxies for upstream and forward tracks
Short-circuiting for combinations of vectorised cuts (I think...)
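
For the second item, one possible shape of short-circuiting for vectorised cuts is sketched below. This is an assumption about how it could be done, not something this MR implements: the second cut is skipped only when no lane of the chunk passes the first cut. Chunk, Mask, GreaterThan and AndCut are hypothetical names for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 4;
using Chunk = std::array<float, kLanes>; // one value per track in the chunk
using Mask  = std::array<bool, kLanes>;  // one pass/fail flag per track

// True if no lane of the mask is set.
inline bool none( Mask const& m ) {
  for ( bool b : m )
    if ( b ) return false;
  return true;
}

// Hypothetical cut: lane value > threshold. Counts how often it is evaluated.
struct GreaterThan {
  float threshold;
  int*  calls;
  Mask operator()( Chunk const& c ) const {
    ++*calls;
    Mask m{};
    for ( std::size_t i = 0; i < kLanes; ++i ) m[i] = c[i] > threshold;
    return m;
  }
};

// Logical AND of two cuts with chunk-level short-circuiting.
template <typename A, typename B>
struct AndCut {
  A first;
  B second;
  Mask operator()( Chunk const& c ) const {
    Mask m = first( c );
    if ( none( m ) ) return m; // every lane already failed: skip the second cut
    Mask const m2 = second( c );
    for ( std::size_t i = 0; i < kLanes; ++i ) m[i] = m[i] && m2[i];
    return m;
  }
};
```

Unlike the scalar case, the second cut is still evaluated for failing lanes whenever at least one lane in the chunk passes, so the gain depends on how often whole chunks are rejected.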

Goes with LHCb!2004 (merged) and Brunel!830 (merged).

cc: @ahennequ @sponce @ibelyaev @apearce @sstahl @graven

Edited Jul 02, 2019 by Marco Cattaneo
