WIP : BFieldCache : Use vector instructions for getB
Can / should we vectorize
I think the idea of vectorizing
parts of this code
has been discussed in the past at least I think
have heard it from @wlampl at some stage.
Note that auto-vectorization (tree-vectorize) might be helpfull but is not by default in gcc O2.
So alternative we could try that ? (via vectorize.h
)
-
What we have today https://godbolt.org/z/8s6Tde:
-
Use auto-vectorization via the attributes in
vectorize.h
https://godbolt.org/z/cdPfjq -
Use
explitic
vectorization. https://godbolt.org/z/bqqevr
Issues if we want to re-arrange the math
This is relevant if we want to change a bit the order for option 3 or 2 above.
The issue is that math multiplication is associative but this is not the
case when doing math operations in a computer .
Although this prb will make it fail the CI
- One can re-arrange this
// interpolate field values in z, r, phi
double Bzrphi[3];
for (int i = 0; i < 3; ++i) { // z, r, phi components
const double* field = m_field[i];
Bzrphi[i] = scale * (gz * (gr * (gphi * field[0] + fphi * field[1]) +
fr * (gphi * field[2] + fphi * field[3])) +
fz * (gr * (gphi * field[4] + fphi * field[5]) +
fr * (gphi * field[6] + fphi * field[7])));
}
To something potentially like (this is a bit SSE2 oriented) :
// interpolate field values in z, r, phi
CxxUtils::vec<double, 2> interpCoeffphi1 = { gphi, fphi };
CxxUtils::vec<double, 2> interpCoeffphi2 = { gphi, fphi };
CxxUtils::vec<double, 2> interpCoeffphi3 = { gphi, fphi };
CxxUtils::vec<double, 2> interpCoeffphi4 = { gphi, fphi };
std::array<double, 3> Bzrphi{};
for (int i = 0; i < 3; ++i) { // z, r, phi components
CxxUtils::vec<double, 2> field1 = { m_field[i][0], m_field[i][1] };
CxxUtils::vec<double, 2> field2 = { m_field[i][2], m_field[i][3] };
CxxUtils::vec<double, 2> field3 = { m_field[i][4], m_field[i][5] };
CxxUtils::vec<double, 2> field4 = { m_field[i][6], m_field[i][7] };
CxxUtils::vec<double, 2> interp1 = field1 * interpCoeffphi1 * gr ;
CxxUtils::vec<double, 2> interp2 = field2 * interpCoeffphi2 * fr ;
CxxUtils::vec<double, 2> interp3 = field3 * interpCoeffphi3 * gr ;
CxxUtils::vec<double, 2> interp4 = field4 * interpCoeffphi4 * fr ;
CxxUtils::vec<double, 2> interp = ( gz*(interp1 + interp2) + fz*(interp3 + interp4));
Bzrphi[i] = scale * (interp[0] + interp[1]);
}
This passes the unit test by @schaffer, but as expected can produce different numerical results . Let me randomly mention @smh @ssnyder @jchapman @amorley as might be interested.
For me this is more of starting kind of a discussion on if we want to give these things a try. Also would be interesting to time this , being inline prb means we need to recompile a few clients.