Migrate from Rich to LHCbMath SIMD Types and Maths headers
This MR migrates from RichUtils to LHCbMath some developments I made for SIMD types, which are not RICH specific.
- LHCbMath/SIMDTypes.h
  Defines a number of SIMD types (currently based on the Vc library).
- LHCbMath/FastMaths.h
  Defines a number of fast math functions for SIMD types, inspired by the VDT scalar versions. It also adds some very fast but more approximate versions for both SIMD and scalar float types.
I have updated the VDT QM test to include the SIMD types.
I have also added a simple timing algorithm to show the speed of the various implementations. For AVX2+FMA:
pcmf ~ > TestMathSpeedAVX2FMA.exe
Generating 96000 data values
testing exp functions...
Scalar STD exp 2378012
Scalar VDT exp 1100729
Scalar APX exp 950546
Scalar VAPX exp 467608
SIMD STD exp 1379771 ( 172471 per scalar, SpeedUp 13.7879 )
SIMD VDT exp 1012937 ( 126617 per scalar, SpeedUp 8.69337 )
SIMD APX exp 1107463 ( 138432 per scalar, SpeedUp 6.86648 )
SIMD VAPX exp 639107 ( 79888 per scalar, SpeedUp 5.85327 )
testing log functions...
Scalar STD log 1976934
Scalar VDT log 1553670
Scalar APX log 551338
Scalar VAPX log 328840
SIMD STD log 1667942 ( 208492 per scalar, SpeedUp 9.48203 )
SIMD VDT log 1378560 ( 172320 per scalar, SpeedUp 9.01619 )
SIMD APX log 777191 ( 97148 per scalar, SpeedUp 5.67519 )
SIMD VAPX log 592706 ( 74088 per scalar, SpeedUp 4.43849 )
testing sqrt functions...
Scalar STD sqrt 446829
SIMD STD sqrt 330341 ( 41292 per scalar, SpeedUp 10.821 )
testing 1/sqrt functions...
Scalar STD isqrt 1065124
Scalar VDT isqrt 981662
SIMD STD isqrt 859815 ( 107476 per scalar, SpeedUp 9.91026 )
SIMD VDT isqrt 1069435 ( 133679 per scalar, SpeedUp 7.34341 )
testing sincos functions...
Scalar STD sincos 2324305
Scalar VDT sincos 2400355
SIMD STD sincos 3311407 ( 413925 per scalar, SpeedUp 5.61527 )
SIMD VDT sincos 1631723 ( 203965 per scalar, SpeedUp 11.7684 )
testing atan2 functions...
Scalar STD atan2 3101764
Scalar VDT atan2 2541980
SIMD STD atan2 2891748 ( 361468 per scalar, SpeedUp 8.58101 )
SIMD VDT atan2 1976419 ( 247052 per scalar, SpeedUp 10.2892 )
testing asin functions...
Scalar STD asin 1699003
Scalar VDT asin 1241106
SIMD STD asin 1537802 ( 192225 per scalar, SpeedUp 8.8386 )
SIMD VDT asin 983377 ( 122922 per scalar, SpeedUp 10.0967 )
Here STD refers to either the STL implementation (for scalars) or the Vc one (for SIMD types). VDT is either the upstream library (for scalars) or my implementation of the same functions (for SIMD). APX and VAPX refer to the faster, approximate and 'very' approximate versions. In all cases the base scalar type used is float. The numbers are timings (smaller is better), and for the SIMD types I also show the time per scalar and the speed-up w.r.t. the equivalent scalar implementation.
The approximate log and exp methods are significantly faster than the VDT or STL versions, but their differences w.r.t. the STL reference are larger, so they should be used with care. However, in the right use case (a real hotspot where absolute precision is not essential) they can be very useful. I use the 'very approximate' log form in the RICH likelihood minimisation and it gives a decent boost...