BFieldCache: try to vectorize parts of the getB method (!35966) · atlas / athena

This is a proper attempt at !35858 (closed).

In short, the idea is to use vector instructions inside one of the BFieldCache::getB loops. When the full field is requested from the Runge-Kutta propagator (which is what GSF does, for example), this loop accounts for more than 50 % of the cost of the getB calls, and there are quite a lot of those calls in multi-component propagations.
One of the issues is binary compatibility: we want to get back exactly the same result while also vectorizing, and vectorizing might need a re-ordering of the floating-point operations.
In this attempt I try to retain the same result, which means keeping the order of the operations the same (albeit in 4-double-wide lanes). I modify the layout of the data to get as many vector operations as I can. This, though, means that there is a tradeoff in the last statement of the for loop; a comment explaining it has been added in the code.
The other MR has links to the benchmarking code, objdump output, etc.; I will not repeat them here. In benchmarks this seems faster, but we will probably also need to time things in "production".

Edited Aug 28, 2020 by Christos Anastopoulos
