Vectorized sincos implementation, already discussed in ATLASSIM-4753 and presented at a sim meeting: https://indico.cern.ch/event/954598/contributions/4019246/attachments/2111256/3551451/miham_2020_09_29.Sim.pdf.
Both stand-alone and Athena timing tests show a ~20% speedup of the sincos calculation in LArWheelCalculator.
There are a couple of caveats though:
- the pre-vectorized parameterization needs to be stored in LArWheelCalculator.h, which then does not compile with CLING; the #if !defined(__CLING__) lines are added because of this
- this implementation was found to perform slightly worse than the previous sincos implementation on KNL platforms, likely due to extra instructions introduced by function multiversioning (FMV)
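
For reviewers unfamiliar with the CLING guard mentioned in the first caveat, here is a minimal sketch of the pattern. It is not the actual content of LArWheelCalculator.h: the struct `SinCosParamSketch`, its members, and the helper `sumCoeffs` are hypothetical names, and the aligned array only stands in for whatever vectorized storage CLING fails to parse.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a header that is also parsed by CLING.
// All names here are illustrative and not taken from the real code.
struct SinCosParamSketch {
  // Plain scalar coefficients: always visible, also to CLING.
  std::array<double, 4> scalarCoeffs{{1.0, 0.5, 0.25, 0.125}};

#if !defined(__CLING__)
  // Stand-in for the pre-vectorized storage. In this sketch it is just a
  // 32-byte aligned array; the guard hides it from the CLING interpreter.
  alignas(32) double packedCoeffs[4] = {1.0, 0.5, 0.25, 0.125};
#endif
};

// Sums the coefficients, reading the packed copy when it is available and
// falling back to the scalar copy when the header is parsed by CLING.
inline double sumCoeffs(const SinCosParamSketch& p) {
  double sum = 0.0;
  for (std::size_t i = 0; i < 4; ++i) {
#if !defined(__CLING__)
    sum += p.packedCoeffs[i];
#else
    sum += p.scalarCoeffs[i];
#endif
  }
  return sum;
}
```

With this layout, code compiled by the regular compiler sees the packed member, while CLING only sees the scalar one. Note that guarding a data member this way changes the class layout between the two views, so it is only safe if CLING never needs to access objects of that type directly.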