
Migrate from Rich to LHCbMath SIMD Types and Maths headers

This MR moves some developments I made for SIMD types, which are not RICH-specific, from RichUtils to LHCbMath.

  • LHCbMath/SIMDTypes.h: defines a number of SIMD types (currently based on the Vc library).
  • LHCbMath/FastMaths.h: defines a number of fast math functions for SIMD types, inspired by the VDT scalar versions. It also adds some very fast but more approximate versions for both SIMD and scalar float types.

I have updated the VDT QM test to include the SIMD types.

I have also added a simple timing algorithm to show the speed of the various implementations. For AVX2+FMA :-

pcmf ~ > TestMathSpeedAVX2FMA.exe 
Generating 96000 data values
testing exp functions...
Scalar STD  exp  2378012
Scalar VDT  exp  1100729
Scalar APX  exp  950546
Scalar VAPX exp  467608
SIMD   STD  exp  1379771 ( 172471 per scalar, SpeedUp 13.7879 )
SIMD   VDT  exp  1012937 ( 126617 per scalar, SpeedUp 8.69337 )
SIMD   APX  exp  1107463 ( 138432 per scalar, SpeedUp 6.86648 )
SIMD   VAPX exp  639107 ( 79888 per scalar, SpeedUp 5.85327 )
testing log functions...
Scalar STD  log  1976934
Scalar VDT  log  1553670
Scalar APX  log  551338
Scalar VAPX log  328840
SIMD   STD  log  1667942 ( 208492 per scalar, SpeedUp 9.48203 )
SIMD   VDT  log  1378560 ( 172320 per scalar, SpeedUp 9.01619 )
SIMD   APX  log  777191 ( 97148 per scalar, SpeedUp 5.67519 )
SIMD   VAPX log  592706 ( 74088 per scalar, SpeedUp 4.43849 )
testing sqrt functions...
Scalar STD sqrt  446829
SIMD   STD sqrt  330341 ( 41292 per scalar, SpeedUp 10.821 )
testing 1/sqrt functions...
Scalar STD isqrt 1065124
Scalar VDT isqrt 981662
SIMD   STD isqrt 859815 ( 107476 per scalar, SpeedUp 9.91026 )
SIMD   VDT isqrt 1069435 ( 133679 per scalar, SpeedUp 7.34341 )
testing sincos functions...
Scalar STD sincos 2324305
Scalar VDT sincos 2400355
SIMD   STD sincos 3311407 ( 413925 per scalar, SpeedUp 5.61527 )
SIMD   VDT sincos 1631723 ( 203965 per scalar, SpeedUp 11.7684 )
testing atan2 functions...
Scalar STD atan2 3101764
Scalar VDT atan2 2541980
SIMD   STD atan2 2891748 ( 361468 per scalar, SpeedUp 8.58101 )
SIMD   VDT atan2 1976419 ( 247052 per scalar, SpeedUp 10.2892 )
testing asin functions...
Scalar STD asin 1699003
Scalar VDT asin 1241106
SIMD   STD asin 1537802 ( 192225 per scalar, SpeedUp 8.8386 )
SIMD   VDT asin 983377 ( 122922 per scalar, SpeedUp 10.0967 )

Here, STD refers to either the STL implementation (for scalars) or the Vc one (for SIMD types). VDT is either the upstream library (for scalars) or my implementation of its functions (for SIMD). APX and VAPX refer to the faster approximate and 'very' approximate versions. In all cases the base scalar type is float. The numbers are timings (smaller is better); for the SIMD types I also show the time per scalar and the speed-up w.r.t. the equivalent scalar implementation.
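
For readers checking the table, the "per scalar" and "SpeedUp" columns can be reproduced from the raw timings. The sketch below is inferred from the printed numbers only, not taken from the test source: it assumes 8 float lanes per AVX2 vector, a truncated integer per-scalar value for display, and a speed-up computed in floating point. The names `SimdTiming` and `simd_speedup` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reproduces the two derived columns of the timing output.
struct SimdTiming {
  std::uint64_t perScalar; // SIMD time divided by the lane count (truncated)
  double        speedUp;   // scalar time relative to the per-scalar SIMD time
};

// Assumption: 8 lanes, i.e. a 256-bit AVX2 register holding 32-bit floats.
inline SimdTiming simd_speedup( std::uint64_t scalarTime, //
                                std::uint64_t simdTime,   //
                                unsigned      lanes = 8 ) {
  assert( lanes > 0 && simdTime > 0 );
  // Integer division matches the truncated value shown in the output.
  const std::uint64_t perScalar = simdTime / lanes;
  // The ratio is formed without truncation, which matches the printed digits.
  const double speedUp = static_cast<double>( scalarTime ) * lanes / static_cast<double>( simdTime );
  return {perScalar, speedUp};
}
```

For example, `simd_speedup( 2378012, 1379771 )` yields a per-scalar time of 172471 and a speed-up of about 13.788, in line with the "SIMD STD exp" row.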

The approximate log and exp methods are significantly faster than the VDT or STL versions, but their deviations from the STL results are larger, so they should be used with care. However, in the right use case (a real hotspot where absolute precision is not essential) they can be very useful. I use the 'very approximate' log form in the RICH likelihood minimisation and it gives a decent boost...
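
To give a feel for the speed/precision trade-off, here is a minimal sketch of one well-known way to approximate a scalar float log: read the IEEE-754 bit pattern as an integer, which gives a piecewise-linear estimate of log2(x). This is not the code in LHCbMath/FastMaths.h, and the name `approx_log` is hypothetical; it only illustrates the kind of shortcut such 'very approximate' functions tend to take. It assumes a positive, normal, finite input.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical illustration, not the FastMaths.h implementation.
// Valid only for positive, normal, finite floats.
inline float approx_log( float x ) {
  assert( x > 0.0f );
  std::uint32_t bits = 0;
  std::memcpy( &bits, &x, sizeof( bits ) ); // safe type-pun of the float bits
  // bits / 2^23 = biased exponent + mantissa fraction, so subtracting the
  // bias (127) gives a piecewise-linear approximation of log2(x).
  const float log2x = static_cast<float>( bits ) * ( 1.0f / 8388608.0f ) - 127.0f;
  // Convert from log2 to natural log: ln(x) = log2(x) * ln(2).
  return log2x * 0.69314718f;
}
```

The estimate is exact at powers of two and worst roughly midway between them, where the absolute error in ln(x) is around 0.06. An error of that size would be unacceptable in general code, but can be tolerable in a hotspot of the kind described above.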

Edited by Marco Cattaneo
