Support SIMD/POD containers and vectorised selections in new functors

Olli Lupton requested to merge olupton_vector_functors into master

This is a sketch of a way that vectorised selections on the new SIMD-friendly POD track containers (e.g. ::LHCb::Pr::Velo::Tracks) could work with the new functors introduced in !1541 (merged). This MR:

  • Introduces "proxy" iteration over the POD track containers (cf. LHCb#37 (closed)), although (unlike the sketch there) here a proxy represents "a vector-unit-sized chunk of tracks", not "a track". This is done with a wrapper around the ::LHCb::Pr::Velo::Tracks type. Integrating it directly into ::LHCb::Pr::Velo::Tracks is also an option...
  • Allows functors to filter ::LHCb::Pr::Velo::Tracks into a new ::LHCb::Pr::Velo::Tracks (like PrFilterIPSoA)
  • Makes the MINIP, MINIPCUT and ETA functors work when filtering these containers, preserving @ahennequ's vectorised calculation in PrFilterIPSoA. The definitions of these functors are not explicitly specialised; rather, the same function body is reworked to be valid for both scalar and vector types.
  • Makes the implementation choice that different [chunk-of-]track-like objects/proxies should provide accessors named closestToBeamStatePos and closestToBeamStateDir. The rationale here is that these return basic mathematical objects (i.e. 3-vectors), rather than higher-level concepts like track states, so the number of types and operations that need to be implemented (and consistently named...) for the concrete types (here @ahennequ's Vec3<T> and Gaudi::XYZVector) is limited. See also: this vision page on track interfaces.
  • Instantiates Pr::Filter<T> as PrFilter__PrVeloTracks, to be used as a replacement for PrFilterIPSoA. My tests were not exhaustive, but I did not see a significant change in speed -- this is essentially the same algorithm, but hidden behind an extra abstraction layer (👎) and with a short-circuiting "optimisation" (maybe 👍, not conclusive but inherited from the scalar code).
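
To illustrate the "one function body for scalar and vector types" idea together with chunk proxies, here is a minimal, self-contained sketch. It is not the code in this MR: FloatChunk, TracksSoA, ChunkProxy and pseudorapidity are hypothetical names, FloatChunk emulates a SIMD register with a plain array, and the Vec3 here is a toy stand-in rather than @ahennequ's Vec3<T>. Only the accessor name closestToBeamStateDir is taken from the description above.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical stand-in for a SIMD float: N lanes processed together.
template <std::size_t N>
struct FloatChunk {
  std::array<float, N> lane{};
};

// Apply a binary operation lane by lane.
template <std::size_t N, typename Op>
FloatChunk<N> lanewise(FloatChunk<N> const& a, FloatChunk<N> const& b, Op op) {
  FloatChunk<N> r;
  for (std::size_t i = 0; i < N; ++i) r.lane[i] = op(a.lane[i], b.lane[i]);
  return r;
}
template <std::size_t N>
FloatChunk<N> operator+(FloatChunk<N> const& a, FloatChunk<N> const& b) {
  return lanewise(a, b, [](float x, float y) { return x + y; });
}
template <std::size_t N>
FloatChunk<N> operator*(FloatChunk<N> const& a, FloatChunk<N> const& b) {
  return lanewise(a, b, [](float x, float y) { return x * y; });
}
template <std::size_t N>
FloatChunk<N> operator/(FloatChunk<N> const& a, FloatChunk<N> const& b) {
  return lanewise(a, b, [](float x, float y) { return x / y; });
}
template <std::size_t N>
FloatChunk<N> sqrt(FloatChunk<N> const& a) {
  FloatChunk<N> r;
  for (std::size_t i = 0; i < N; ++i) r.lane[i] = std::sqrt(a.lane[i]);
  return r;
}
template <std::size_t N>
FloatChunk<N> asinh(FloatChunk<N> const& a) {
  FloatChunk<N> r;
  for (std::size_t i = 0; i < N; ++i) r.lane[i] = std::asinh(a.lane[i]);
  return r;
}

// Toy 3-vector: T is float for one track, FloatChunk<N> for a chunk of tracks.
template <typename T>
struct Vec3 {
  T x, y, z;
};

// One function body for both cases. The using-declarations make the std
// overloads visible for scalars; argument-dependent lookup finds the
// FloatChunk overloads for chunks.
template <typename T>
T pseudorapidity(Vec3<T> const& dir) {
  using std::asinh;
  using std::sqrt;
  return asinh(dir.z / sqrt(dir.x * dir.x + dir.y * dir.y));
}

// Hypothetical SoA container: one array per direction component.
struct TracksSoA {
  std::vector<float> x, y, z;
};

// A proxy represents N consecutive tracks, not a single track.
template <std::size_t N>
struct ChunkProxy {
  TracksSoA const* tracks;
  std::size_t offset;

  Vec3<FloatChunk<N>> closestToBeamStateDir() const {
    assert(offset + N <= tracks->x.size());  // no padding handling in this sketch
    Vec3<FloatChunk<N>> d;
    for (std::size_t i = 0; i < N; ++i) {
      d.x.lane[i] = tracks->x[offset + i];
      d.y.lane[i] = tracks->y[offset + i];
      d.z.lane[i] = tracks->z[offset + i];
    }
    return d;
  }
};

}  // namespace sketch
```

With this layout, sketch::pseudorapidity(proxy.closestToBeamStateDir()) evaluates N tracks at once, while sketch::pseudorapidity(sketch::Vec3<float>{...}) handles a single track through the same body. A real SIMD type would replace the lane loops with intrinsics.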

There are still some problems:

  • This works when run from the functor cache, but when the functors are just-in-time compiled, Cling does not set the right preprocessor macros, so the SIMDWrapper types are just-in-time compiled as if AVX2 were not available (I use x86_64+avx2+fma-centos7-gcc8-opt+g). cc: @clemenci.
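
As a hedged illustration of why missing macros matter, the sketch below selects a vector width from __AVX2__, the macro that GCC and Clang predefine when AVX2 code generation is enabled (for example with -mavx2). The names floatLanes and backend are hypothetical, and this is an assumption about the general mechanism, not a copy of how SIMDWrapper chooses its backend.

```cpp
#include <cstddef>

namespace sketch {

#if defined(__AVX2__)
// 256-bit registers hold 8 single-precision floats.
constexpr std::size_t floatLanes = 8;
constexpr char const* backend = "avx2";
#else
// Without the macro, the same header silently falls back to scalar code.
constexpr std::size_t floatLanes = 1;
constexpr char const* backend = "scalar";
#endif

}  // namespace sketch
```

If a header like this is compiled once with the macro defined (the functor cache) and once without it (the JIT), the two translation units disagree on floatLanes, which would be consistent with the behaviour described above.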

And some things that are not implemented:

  • Proxies for upstream and forward tracks
  • Short-circuiting for combinations of vectorised cuts (I think...)
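
One possible shape for short-circuiting a combination of vectorised cuts is sketched below; it is not implemented in this MR. Each cut is assumed to return a per-lane boolean mask, and the second cut is skipped when no lane passed the first. Mask, none and ShortCircuitAnd are hypothetical names, and a real implementation would use the SIMD mask type rather than std::array<bool, N>.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

namespace sketch {

// Hypothetical per-lane selection mask for a chunk of N tracks.
template <std::size_t N>
using Mask = std::array<bool, N>;

// True if no lane is selected.
template <std::size_t N>
bool none(Mask<N> const& m) {
  for (bool b : m)
    if (b) return false;
  return true;
}

// Logical AND of two vectorised cuts. The second cut is only evaluated
// when at least one lane survived the first.
template <typename Cut1, typename Cut2>
struct ShortCircuitAnd {
  Cut1 first;
  Cut2 second;

  template <typename Chunk>
  auto operator()(Chunk const& chunk) const {
    auto mask = first(chunk);
    if (none(mask)) return mask;  // whole chunk rejected: skip the second cut
    auto const other = second(chunk);
    assert(mask.size() == other.size());  // both cuts must use the same width
    for (std::size_t i = 0; i < mask.size(); ++i) mask[i] = mask[i] && other[i];
    return mask;
  }
};

}  // namespace sketch
```

Unlike the scalar case, the saving only materialises when every lane in a chunk fails the first cut, so the benefit depends on how selective that cut is.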

Goes with LHCb!2004 (merged) and Brunel!830 (merged).

cc: @ahennequ @sponce @ibelyaev @apearce @sstahl @graven

Edited by Marco Cattaneo
