Builds on LHCb!933 (merged) and Lbcom!190 (merged)
Firstly I have, as in Lbcom, retired the runtime CPU capabilities detection and dispatching code. Instead I simply rely on the build platform settings to determine the SIMD level. This is necessary to allow the use of SIMD (Vc) types more generally, rather than in very localised places.
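As a rough illustration of what "rely on the build platform settings" means in practice, the sketch below picks the float vector width at compile time from the compiler's predefined macros (`__AVX2__`, `__SSE4_2__`, `__SSE2__`, as set by GCC and Clang according to the `-m...` flags). This is not the actual code in this MR; the namespace `simd_sketch` and the names `kFloatLanes` and `paddedSize` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration only, not the LHCb/Vc implementation.
namespace simd_sketch {

// The SIMD level is fixed when the code is compiled: the compiler defines
// these macros according to the instruction sets enabled for the build.
#if defined(__AVX2__)
  constexpr std::size_t kFloatLanes = 8;  // 256-bit registers, 8 floats
#elif defined(__SSE4_2__) || defined(__SSE2__)
  constexpr std::size_t kFloatLanes = 4;  // 128-bit registers, 4 floats
#else
  constexpr std::size_t kFloatLanes = 1;  // no known SIMD level: scalar
#endif

// Round a container size up to a whole number of SIMD vectors, so data can
// be processed kFloatLanes elements at a time without a remainder loop.
constexpr std::size_t paddedSize(std::size_t n) {
  return ((n + kFloatLanes - 1) / kFloatLanes) * kFloatLanes;
}

} // namespace simd_sketch
```

Because the width is a compile-time constant, it can be used in type definitions everywhere, which is what a runtime dispatch scheme makes awkward.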
Secondly, I have added new development algorithms to start fully vectorising the RICH reconstruction. So far these are:

1. A pixel algorithm that takes the scalar information and summarises it into SIMD TES data objects. Longer term this algorithm might well be retired itself, as once everything is based on SIMD types the first step of making the scalar versions can be removed. For the moment, though, it's useful to have both scalar and SIMD versions.
2. An (almost) fully SIMD vectorised (using Vc and GenVector types) quartic photon reconstruction algorithm.
3. An (almost) fully SIMD vectorised version of the photon pixel probability algorithm (the next in line after the quartic algorithm).
I say almost in 2. and 3. as there are a few places where I have had to fall back to scalar loops over the SIMD types to perform some calculations, where I have yet to see an obvious way to do it fully vectorised. Generally this happens when, for instance, I have to follow a pointer to, say, a mirror segment for each 'scalar' photon.
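The kind of scalar fallback described above can be sketched as follows. This is a simplified stand-in and not the code in this MR: `FloatV` is a plain array wrapper used here in place of a real SIMD type such as a Vc float vector, and `MirrorSegment`, `gatherRadii` and `scaledRadii` are hypothetical names. The width of 4 lanes is an assumption matching an SSE build.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 4;  // assumed width: 4 floats, as for SSE

// Stand-in for a SIMD float vector (one value per 'scalar' photon).
struct FloatV {
  std::array<float, kLanes> lane{};
};

// Element-wise product: the kind of operation a real SIMD type performs
// in a single instruction across all lanes.
inline FloatV operator*(const FloatV& a, const FloatV& b) {
  FloatV r;
  for (std::size_t i = 0; i < kLanes; ++i) r.lane[i] = a.lane[i] * b.lane[i];
  return r;
}

// Hypothetical mirror segment holding one per-segment property.
struct MirrorSegment {
  float radius;
};

// Scalar part: each lane may point at a different segment, so the pointers
// have to be followed one lane at a time to fill a SIMD value.
inline FloatV gatherRadii(const std::array<const MirrorSegment*, kLanes>& segs) {
  FloatV r;
  for (std::size_t i = 0; i < kLanes; ++i) {
    r.lane[i] = segs[i] ? segs[i]->radius : 0.0f;  // empty lanes are zeroed
  }
  return r;
}

// Once gathered, the rest of the calculation is vectorised again.
inline FloatV scaledRadii(const std::array<const MirrorSegment*, kLanes>& segs,
                          const FloatV& scale) {
  return gatherRadii(segs) * scale;
}
```

Only the gather step runs lane by lane; everything downstream of it operates on whole SIMD values, which is why the cost of these fallbacks stays limited.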
The results so far are looking quite good. For an SSE4.2 build (the default) I get
RichPhotonRecoLong  27.060  27.468  0.311 218.9 25.96  1000  27.469 
RichPredPixelSignalLong  2.750  2.686  0.036 20.0 2.38  1000  2.687 
RichSIMDPhotonRecoLong  14.220  14.168  0.194 110.1 12.99  1000  14.169 
RichSIMDPredPixelSignalLong  1.690  1.621  0.027 11.9 1.41  1000  1.622 
where the first two are the scalar versions and the last two the SIMD (SSE4.2).
If I instead build my stack allowing AVX2+FMA, I get
RichPhotonRecoLong  23.100  22.873  0.263 184.1 21.60  1000  22.874 
RichPredPixelSignalLong  2.310  2.301  0.032 15.6 2.03  1000  2.302 
RichSIMDPhotonRecoLong  8.160  8.508  0.140 63.5 7.60  1000  8.508 
RichSIMDPredPixelSignalLong  1.370  1.473  0.027 10.6 1.22  1000  1.474 
So the benefit of the factor 2 increase in the SIMD vector size (4 to 8 floats) is seen, most clearly in the photon reconstruction. It's not quite a perfect factor of 2, but I have ideas as to why this is. (Note the scalar version for AVX2+FMA is already faster, as it is able to gain from the FMA part.)
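To make the comparison concrete, the ratios can be worked out directly from the tables. The sketch below uses the second numeric column of each photon reconstruction row quoted above; the constant and function names are hypothetical and only there for the arithmetic.

```cpp
#include <cassert>
#include <cmath>

// Timings copied from the second numeric column of the tables above.
constexpr double kScalarRecoSSE  = 27.468;  // RichPhotonRecoLong, SSE4.2 build
constexpr double kSimdRecoSSE    = 14.168;  // RichSIMDPhotonRecoLong, SSE4.2 build
constexpr double kScalarRecoAVX2 = 22.873;  // RichPhotonRecoLong, AVX2+FMA build
constexpr double kSimdRecoAVX2   = 8.508;   // RichSIMDPhotonRecoLong, AVX2+FMA build

// Speed-up factor of 'after' relative to 'before'.
constexpr double speedup(double before, double after) {
  return before / after;
}

// SIMD versus scalar within each build, and SIMD SSE4.2 versus SIMD AVX2+FMA.
constexpr double kSimdGainSSE   = speedup(kScalarRecoSSE, kSimdRecoSSE);    // about 1.94
constexpr double kSimdGainAVX2  = speedup(kScalarRecoAVX2, kSimdRecoAVX2);  // about 2.69
constexpr double kWidthGain     = speedup(kSimdRecoSSE, kSimdRecoAVX2);     // about 1.67
constexpr double kScalarFmaGain = speedup(kScalarRecoSSE, kScalarRecoAVX2); // about 1.20
```

So for the photon reconstruction, doubling the vector width gives roughly a factor 1.67 rather than 2, while the scalar version gains about 20% from the AVX2+FMA build alone.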