Backport MR 72001 from main to 24.0
This backports !72001 (merged).
That MR gave us a visible performance gain (details are in that MR; if you scroll down, there are also SPOT plots).
As mentioned in that one, note that I keep it at `avx2` and avoid `avx2+fma`, as `fma` could affect the numerical stability here.
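
For reviewers wondering why FMA can change results: a fused multiply-add rounds once, after computing `a*b + c` exactly, while separate multiply and add instructions round twice. The Python sketch below is only an illustration and is not taken from !72001. The helper `fused_multiply_add` is hypothetical; it emulates the single rounding with exact rational arithmetic, and the input values are chosen by hand to make the difference visible.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate FMA: compute a*b + c exactly, then round once to a double.

    Hypothetical helper for illustration only; real FMA is a hardware
    instruction, emulated here with exact rational arithmetic.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Hand-picked inputs: the exact product a*b is 1 - 2**-60, which is not
# representable as a double and rounds to 1.0 when stored on its own.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

unfused = a * b + c                  # two roundings: product, then sum
fused = fused_multiply_add(a, b, c)  # one rounding at the very end

print("unfused:", unfused)
print("fused:  ", fused)
print("identical:", unfused == fused)
```

With these inputs the unfused result is exactly zero, while the fused result keeps the tiny residual `-2**-60`. Differences of this kind are why enabling `fma` can shift numerical output relative to a plain `avx2` build.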