
BFieldCache: try to vectorize parts of the getB method

This is a proper attempt at !35858 (closed).

So let me mention @schaffer, @mbandier, @jchapman, @amorley, @smh

  • In short, the idea is to use vector instructions inside one of the BFieldCache::getB loops. When the full field is called from the Runge-Kutta propagator, as GSF does for example, this loop accounts for more than 50% of the getB calls (and there are quite a lot of them if you do multi-component propagations).

  • One of the issues is binary compatibility: getting back exactly the same result while also trying to vectorize, which might need re-ordering of the "operations".

  • In this attempt I try to retain the same result, which means keeping the order of the operations the same (albeit in 4-double-wide lanes). I modify the layout of the data to get as many vector operations as I can. This does mean, though, that there is a tradeoff on the last statement of the for loop. A relevant comment is added.

  • The other MR has links to benchmarking code, objdump output, etc., so I will not repeat them here. In benchmarking it seems faster, but we will probably also need to time things in "production".
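
To make the "same order of operations, but in 4-wide lanes" idea concrete, here is a minimal standalone sketch. It is not the code in this MR: it assumes a trilinear interpolation of one field component from the 8 corner values of a cell, and every name in it (`Fractions`, `CornerLanes`, `packLanes`, `interpolateScalar`, `interpolateLanes`, `sameBits`) is hypothetical. The four innermost sums are independent, so after re-laying out the corners as "even" and "odd" arrays they can be evaluated as one 4-double-wide operation, while the final combination stays scalar.

```cpp
#include <array>
#include <cassert>
#include <cstring>

// Hypothetical sketch, not the Athena BFieldCache implementation.

// Interpolation fractions inside one cell, each expected in [0, 1].
struct Fractions {
  double fz, fr, fphi;
};

// Scalar reference: trilinear interpolation from 8 corner values,
// written with one fixed order of operations.
double interpolateScalar(const std::array<double, 8>& f, const Fractions& w) {
  assert(w.fz >= 0. && w.fz <= 1. && w.fr >= 0. && w.fr <= 1. &&
         w.fphi >= 0. && w.fphi <= 1.);
  const double gz = 1. - w.fz, gr = 1. - w.fr, gphi = 1. - w.fphi;
  return gz * (gr * (gphi * f[0] + w.fphi * f[1]) +
               w.fr * (gphi * f[2] + w.fphi * f[3])) +
         w.fz * (gr * (gphi * f[4] + w.fphi * f[5]) +
                 w.fr * (gphi * f[6] + w.fphi * f[7]));
}

// Modified data layout: even- and odd-indexed corners in separate
// 4-wide arrays, so the innermost sums become lane-wise operations.
struct CornerLanes {
  std::array<double, 4> even;  // f[0], f[2], f[4], f[6]
  std::array<double, 4> odd;   // f[1], f[3], f[5], f[7]
};

CornerLanes packLanes(const std::array<double, 8>& f) {
  CornerLanes c;
  c.even = {f[0], f[2], f[4], f[6]};
  c.odd = {f[1], f[3], f[5], f[7]};
  return c;
}

// Lane version: each lane performs exactly the multiply/multiply/add that
// the scalar code performs for the matching inner sum, so no operation is
// re-ordered. The last statement remains scalar.
double interpolateLanes(const CornerLanes& c, const Fractions& w) {
  const double gz = 1. - w.fz, gr = 1. - w.fr, gphi = 1. - w.fphi;
  std::array<double, 4> t;
  for (int k = 0; k < 4; ++k) {  // candidate for one 4-double vector op
    t[k] = gphi * c.even[k] + w.fphi * c.odd[k];
  }
  return gz * (gr * t[0] + w.fr * t[1]) + w.fz * (gr * t[2] + w.fr * t[3]);
}

// Bitwise comparison of two doubles.
bool sameBits(double a, double b) {
  return std::memcmp(&a, &b, sizeof(double)) == 0;
}
```

Calling `sameBits(interpolateScalar(f, w), interpolateLanes(packLanes(f), w))` is one way to check binary compatibility between the two versions. Note that bit-identical results also depend on the compiler treating both versions alike, for example not contracting a multiply-add into an FMA in only one of them.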

Edited by Christos Anastopoulos
