Rich likelihood calculation: move SIMD horizontal sum outside loop
Gives an O(10%) reduction in the valgrind instruction count (AVX2+FMA) for the overall likelihood minimisation algorithm.
Note that differences at the level of machine precision may be seen, since the floating-point summation order changes.
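
For illustration only, a minimal portable sketch of the pattern this change applies. It is not the actual Rich code: `Lanes`, `hsum`, `sumInsideLoop` and `sumOutsideLoop` are hypothetical names, and a `std::array` of 8 floats stands in for one AVX2 register so the example builds without intrinsics.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one 8-lane AVX2 float register.
constexpr std::size_t kLanes = 8;
using Lanes = std::array<float, kLanes>;

// Horizontal sum: collapses the lanes of one vector into a scalar.
inline float hsum(const Lanes& v) {
  float s = 0.0f;
  for (float x : v) s += x;
  return s;
}

// Before: one horizontal sum per loop iteration.
inline float sumInsideLoop(const std::vector<Lanes>& terms) {
  float total = 0.0f;
  for (const Lanes& t : terms) total += hsum(t);
  return total;
}

// After: lane-wise (vertical) accumulation inside the loop,
// followed by a single horizontal sum once the loop has finished.
inline float sumOutsideLoop(const std::vector<Lanes>& terms) {
  Lanes acc{};  // all lanes zero-initialised
  for (const Lanes& t : terms)
    for (std::size_t i = 0; i < kLanes; ++i) acc[i] += t[i];
  return hsum(acc);
}
```

Both functions add the same values, but in a different order, so results can differ in the last bits for general floating-point input. With real SIMD types the second form replaces a per-iteration horizontal reduction with a single vector add, which is presumably where the instruction count saving comes from.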