Backport MR 72001 from main to 24.0
This backports !72001 (merged).
As it is one that gave us a visible performance gain (see the details in that MR; if you scroll down, there are also SPOT plots).
As mentioned in that MR, note that I keep it at avx2 and avoid avx2+fma, as fma could affect the numerical stability here.