RichRecUtils/QuarticSolverNewton.h - More optimisations
Some more fine-grained optimisations of the kernel of the RICH photon reconstruction.
- Removed some conditionals performing data 'sanity' checks since, in practice, given the nature of the RICH data the class operates on, these checks can never fail.
- Optimised some other conditionals / masked writes to be more efficient for the most common cases.
- Added some GCC compiler hints to force loop unrolling.
- Various other small changes.
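A minimal sketch of the kind of kernel these bullets describe. It assumes a Newton solver for a quartic evaluated with Horner's scheme. The iteration count is a compile-time constant. There is no guard against a vanishing derivative, in the spirit of dropping checks that cannot fail for the RICH data. `#pragma GCC unroll` (GCC 8 and later) asks the compiler to unroll the loop. The names `quarticAndDerivative` and `newtonRoot` are hypothetical and are not the interface of QuarticSolverNewton.h. The compiler hints actually used in this MR may differ.

```cpp
// Hypothetical helper: evaluates p(x) = c0*x^4 + c1*x^3 + c2*x^2 + c3*x + c4
// and its derivative p'(x) in one pass using Horner's scheme.
template <typename T>
inline void quarticAndDerivative(const T (&coef)[5], T x, T &f, T &df) {
  f = coef[0];
  df = T(0);
  for (int i = 1; i < 5; ++i) {
    df = df * x + f;      // derivative accumulates the previous partial value
    f = f * x + coef[i];  // standard Horner step for the polynomial itself
  }
}

// Hypothetical Newton refinement with a fixed, compile-time iteration count.
// No convergence test and no df == 0 check: every call (and every SIMD lane,
// in a vectorised version) runs exactly the same instruction sequence.
template <unsigned NITER, typename T>
inline T newtonRoot(const T (&coef)[5], T x) {
#pragma GCC unroll 8
  for (unsigned i = 0; i < NITER; ++i) {
    T f, df;
    quarticAndDerivative(coef, x, f, df);
    x -= f / df;  // assumes df never vanishes for the inputs seen in practice
  }
  return x;
}
```

For example, `newtonRoot<6>(coef, 1.5)` with `coef = {1, 0, 0, 0, -16}` converges to the root x = 2 of x^4 - 16. A fixed iteration count with no early exit is plausibly why removing conditionals helps the vectorised build most: no lane has to be masked off or wait on a branch.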
All in, the timings (AVX2+FMA) go from
> RichPhotonRecoTestAVX2FMA.exe
Creating 48000 random photons ... done.
Scalar Float 2077493
SIMD Float 2604840 per scalar SpeedUp 6.38041
to
> RichPhotonRecoTestAVX2FMA.exe
Creating 48000 random photons ... done.
Scalar Float 1976141
SIMD Float 2100104 per scalar SpeedUp 7.52778
So the SIMD version gains the most (as the overheads from conditionals are more expensive there), and I am getting closer to the 'perfect' factor of 8 in the SIMD (AVX2) / scalar speed ratio.
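The printed SpeedUp values are reproduced exactly by 8 × (scalar figure) / (SIMD figure). Here 8 is the number of single-precision lanes in an AVX2 register. This relation is inferred from the numbers above, not taken from the test executable's source. The sketch below recomputes both SpeedUp values. It also computes the relative reduction of each figure: roughly 5% for scalar and roughly 19% for SIMD, which is what backs up "SIMD gains the most". Both function names are hypothetical.

```cpp
// Number of float lanes in a 256-bit AVX2 register.
constexpr int kAvx2FloatLanes = 8;

// Hypothetical helper: SpeedUp as it appears to be computed in the test output,
// i.e. the SIMD figure covers 8 photons while the scalar figure covers one.
inline double simdSpeedUp(double scalarFigure, double simdFigure) {
  return kAvx2FloatLanes * scalarFigure / simdFigure;
}

// Hypothetical helper: fractional reduction of a timing figure, before -> after.
inline double fractionalReduction(double before, double after) {
  return 1.0 - after / before;
}
```

With the figures from this MR, `simdSpeedUp(2077493, 2604840)` gives about 6.3804 and `simdSpeedUp(1976141, 2100104)` about 7.5278. `fractionalReduction` gives about 0.049 for scalar and about 0.194 for SIMD.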
@cattanem I am not sure whether these changes will give bit-wise identical results; probably not. Any differences should be tiny, though, as they would only be due to machine precision.
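This sketch is illustrative only and is not taken from QuarticSolverNewton.h. In an AVX2+FMA build, one plausible way for reordered or rewritten arithmetic to change results at the machine-precision level is that a*b + c may be evaluated either with two roundings (multiply, then add) or with one rounding (fused multiply-add). The example shows a case where the two differ by 2^-46. The tolerance-based `closeEnough` helper is a hypothetical alternative to bit-wise comparison when only such differences are expected.

```cpp
#include <cmath>

// Multiply then add: the product is rounded to float before the addition.
// 'volatile' prevents the compiler from contracting this into an FMA.
inline float mulThenAdd(float a, float b, float c) {
  volatile float p = a * b;
  return p + c;
}

// Fused multiply-add: a*b + c computed with a single final rounding.
inline float fusedMulAdd(float a, float b, float c) { return std::fma(a, b, c); }

// Hypothetical comparison: equal within an absolute or a relative tolerance.
inline bool closeEnough(float x, float y, float absTol, float relTol) {
  const float diff = std::fabs(x - y);
  return diff <= absTol ||
         diff <= relTol * std::fmax(std::fabs(x), std::fabs(y));
}
```

With `a = 1 + 2^-23`, `b = 1 - 2^-23` and `c = -1`, the exact product is `1 - 2^-46`. Rounding it to float first gives exactly 1, so `mulThenAdd` returns 0. `fusedMulAdd` instead returns `-2^-46`. The two results are not bit-wise identical, but `closeEnough` accepts them.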
Edited by Marco Cattaneo