BFieldCache: try to vectorize parts of the getB method
This is a proper attempt at !35858 (closed).
So let me mention @schaffer, @mbandier, @jchapman, @amorley, @smh.
In short, the idea is to use vector instructions inside one of the BFieldCache::getB loops. When the full field is requested from the Runge-Kutta propagator, as GSF does for example, this loop is more than 50% of the cost of the getB calls (and there are quite a lot of those if you do multi-component propagations).
One of the issues is binary compatibility: getting back exactly the same result while also vectorizing, which might require re-ordering the "operations".
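To illustrate why re-ordering matters, here is a minimal sketch that is not taken from the MR. The helper names (sameBits, sumLeftToRight, sumRightToLeft) are hypothetical. It assumes IEEE 754 doubles and a build without -ffast-math, so the compiler is not allowed to re-associate the sums itself.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit doubles");

// Compare two doubles bit by bit. This is stricter than ==, which for
// example treats +0.0 and -0.0 as equal.
inline bool sameBits(double a, double b) {
  std::uint64_t ua = 0;
  std::uint64_t ub = 0;
  std::memcpy(&ua, &a, sizeof(double));
  std::memcpy(&ub, &b, sizeof(double));
  return ua == ub;
}

// Sum evaluated left to right: (a + b) + c.
inline double sumLeftToRight(double a, double b, double c) {
  return (a + b) + c;
}

// Same mathematical sum, different association: a + (b + c).
inline double sumRightToLeft(double a, double b, double c) {
  return a + (b + c);
}
```

With a = 0.1, b = 0.2, c = 0.3 the two functions round differently, so sameBits reports a mismatch even though both are "the same" sum. A vectorized loop that changes the association of its additions can therefore change the last bits of the result.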
In this attempt I try to retain the same result, which means keeping the order of the operations the same (albeit in lanes that are 4 doubles wide). I modify the layout of the data to get as many vector operations as I can. This, however, also means there is a tradeoff in the last statement of the for loop. A relevant comment is added.
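The idea of keeping the operation order while working lane-wise can be sketched as follows. This is not the code of this MR: the trilinear interpolation formula, the Vec4 type, the corner-major layout and every function name below are assumptions chosen for illustration, and the sketch does not try to reproduce the tradeoff in the last statement of the loop. It also assumes the compiler does not apply different fused multiply-add contraction to the two versions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 4-lane "vector" of doubles. A real implementation would map
// this onto a SIMD register; plain loops keep the sketch portable.
using Vec4 = std::array<double, 4>;

inline Vec4 scale(double s, const Vec4& v) {
  Vec4 r{};
  for (std::size_t i = 0; i < 4; ++i) r[i] = s * v[i];
  return r;
}

inline Vec4 add(const Vec4& a, const Vec4& b) {
  Vec4 r{};
  for (std::size_t i = 0; i < 4; ++i) r[i] = a[i] + b[i];
  return r;
}

// Scalar reference: interpolate one field component from its 8 corner values.
inline double interpolateScalar(const std::array<double, 8>& f,
                                double fz, double fr, double fphi) {
  const double gz = 1.0 - fz;
  const double gr = 1.0 - fr;
  const double gphi = 1.0 - fphi;
  return gz * (gr * (gphi * f[0] + fphi * f[1]) + fr * (gphi * f[2] + fphi * f[3])) +
         fz * (gr * (gphi * f[4] + fphi * f[5]) + fr * (gphi * f[6] + fphi * f[7]));
}

// Re-layout: from component-major field[component][corner] to corner-major
// corners[corner][lane], where lane = component and the 4th lane is padding.
inline std::array<Vec4, 8> toCornerMajor(
    const std::array<std::array<double, 8>, 3>& field) {
  std::array<Vec4, 8> corners{};
  for (std::size_t k = 0; k < 8; ++k)
    for (std::size_t i = 0; i < 3; ++i) corners[k][i] = field[i][k];
  return corners;
}

// Lane-wise version: every lane performs the same multiplications and
// additions, in the same order, as interpolateScalar does for one component.
inline Vec4 interpolateLanes(const std::array<Vec4, 8>& c,
                             double fz, double fr, double fphi) {
  const double gz = 1.0 - fz;
  const double gr = 1.0 - fr;
  const double gphi = 1.0 - fphi;
  const Vec4 lowZ = add(scale(gr, add(scale(gphi, c[0]), scale(fphi, c[1]))),
                        scale(fr, add(scale(gphi, c[2]), scale(fphi, c[3]))));
  const Vec4 highZ = add(scale(gr, add(scale(gphi, c[4]), scale(fphi, c[5]))),
                         scale(fr, add(scale(gphi, c[6]), scale(fphi, c[7]))));
  return add(scale(gz, lowZ), scale(fz, highZ));
}
```

Because each lane repeats the scalar expression operation for operation, lanes 0 to 2 of interpolateLanes should match three calls to interpolateScalar, while the padded fourth lane is wasted work. That padding and the extra re-layout step are the kind of cost a layout change like this trades against the wider arithmetic.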
In the other MR there are links to the benchmarking code, objdump output, etc., so I will not repeat them here. In benchmarking this seems faster, but we will probably also need to time things in "production".