RichRecUtils/QuarticSolverNewton.h - More optimisations

Christopher Rob Jones requested to merge RichRecUtils-QuarticSolverNewton into master

Some more fine optimisations of the kernel of the RICH photon reco.

  • Removed some conditionals performing data 'sanity' checks, as in practice, given the nature of the RICH data the class operates on, these checks can never fail.
  • Optimised some other conditionals / masked writes to be more efficient for the most common cases.
  • Added some GCC compiler hints to force loop unrolling.
  • Various other little things.
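
As an illustration of the kind of loop-unrolling hint mentioned in the list above, here is a minimal sketch. It is not the code in QuarticSolverNewton.h: the function name `newtonQuartic`, the monic quartic form and the iteration count of 6 are all hypothetical, and it assumes GCC 8 or later, where `#pragma GCC unroll N` is available.

```cpp
#include <cmath>

// Hypothetical helper (not from the merge request): a fixed number of
// Newton-Raphson steps on f(x) = x^4 + a*x^3 + b*x^2 + c*x + d, from x0.
template <typename T>
inline T newtonQuartic(T a, T b, T c, T d, T x0) {
  T x = x0;
  // Ask GCC (>= 8) to fully unroll the fixed-trip-count loop below.
  // Other compilers simply skip the hint.
#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ >= 8)
#pragma GCC unroll 6
#endif
  for (int i = 0; i < 6; ++i) {
    // Horner evaluation of the polynomial and its derivative.
    const T f  = (((x + a) * x + b) * x + c) * x + d;
    const T df = ((T(4) * x + T(3) * a) * x + T(2) * b) * x + c;
    x -= f / df;
  }
  return x;
}
```

A fixed iteration count with no convergence test keeps the loop body free of branches, which is one reason such a loop is a natural candidate for both unrolling and SIMD evaluation. For example, `newtonQuartic<float>(0.f, 0.f, 0.f, -1.f, 1.5f)` iterates towards the root x = 1 of x^4 - 1.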

All in all, the timings (AVX2+FMA) go from

 > RichPhotonRecoTestAVX2FMA.exe
Creating 48000 random photons ... done.
Scalar Float      2077493
SIMD   Float      2604840 per scalar SpeedUp 6.38041

to

 > RichPhotonRecoTestAVX2FMA.exe
Creating 48000 random photons ... done.
Scalar Float      1976141
SIMD   Float      2100104 per scalar SpeedUp 7.52778

So SIMD gains the most (its time drops by about 19%, against about 5% for scalar), as the overheads from conditionals are more expensive there, and I am getting closer to the 'perfect' factor of 8 in the ratio SIMD(AVX2)/scalar.

@cattanem I am not sure whether the changes here will give bit-wise identical results. Probably not. Any differences should be tiny though, as they would be due only to machine precision.

Edited by Marco Cattaneo
