WIP : BFieldCache : Use vector instructions for getB (!35858) · Merge requests · atlas / athena

Christos Anastopoulos requested to merge ATLAS-EGamma/athena:BFieldCache_vec into master Aug 24, 2020

Can / should we vectorize

I think the idea of vectorizing parts of this code has been discussed in the past at least I think have heard it from @wlampl at some stage.

Note that auto-vectorization (tree-vectorize) might be helpfull but is not by default in gcc O2. So alternative we could try that ? (via vectorize.h)

What we have today https://godbolt.org/z/8s6Tde:
Use auto-vectorization via the attributes in vectorize.h https://godbolt.org/z/cdPfjq
Use explitic vectorization. https://godbolt.org/z/bqqevr

Issues if we want to re-arrange the math

This is relevant if we want to change a bit the order for option 3 or 2 above. The issue is that math multiplication is associative but this is not the case when doing math operations in a computer . Although this prb will make it fail the CI

One can re-arrange this

// interpolate field values in z, r, phi
  double Bzrphi[3];
  for (int i = 0; i < 3; ++i) { // z, r, phi components
    const double* field = m_field[i];
    Bzrphi[i] = scale * (gz * (gr * (gphi * field[0] + fphi * field[1]) +
                               fr * (gphi * field[2] + fphi * field[3])) +
                         fz * (gr * (gphi * field[4] + fphi * field[5]) +
                               fr * (gphi * field[6] + fphi * field[7])));
  }

To something potentially like (this is a bit SSE2 oriented) :

// interpolate field values in z, r, phi
  CxxUtils::vec<double, 2> interpCoeffphi1 = { gphi, fphi };
  CxxUtils::vec<double, 2> interpCoeffphi2 = { gphi, fphi };
  CxxUtils::vec<double, 2> interpCoeffphi3 = { gphi, fphi };
  CxxUtils::vec<double, 2> interpCoeffphi4 = { gphi, fphi };

  std::array<double, 3> Bzrphi{};
  for (int i = 0; i < 3; ++i) { // z, r, phi components

    CxxUtils::vec<double, 2> field1 = { m_field[i][0], m_field[i][1] };
    CxxUtils::vec<double, 2> field2 = { m_field[i][2], m_field[i][3] };
    CxxUtils::vec<double, 2> field3 = { m_field[i][4], m_field[i][5] };
    CxxUtils::vec<double, 2> field4 = { m_field[i][6], m_field[i][7] };

    CxxUtils::vec<double, 2> interp1 = field1 * interpCoeffphi1 * gr ;

    CxxUtils::vec<double, 2> interp2 = field2 * interpCoeffphi2 * fr ;

    CxxUtils::vec<double, 2> interp3 = field3 * interpCoeffphi3 * gr ;

    CxxUtils::vec<double, 2> interp4 = field4 * interpCoeffphi4 * fr ;

    CxxUtils::vec<double, 2> interp = ( gz*(interp1 + interp2) + fz*(interp3 + interp4));

    Bzrphi[i] = scale * (interp[0] + interp[1]);
  }

This passes the unit test by @schaffer, but as expected can produce different numerical results . Let me randomly mention @smh @ssnyder @jchapman @amorley as might be interested.

For me this is more of starting kind of a discussion on if we want to give these things a try. Also would be interesting to time this , being inline prb means we need to recompile a few clients.

Edited Aug 25, 2020 by Christos Anastopoulos

WIP : BFieldCache : Use vector instructions for getB

Can / should we vectorize

Issues if we want to re-arrange the math

Merge request reports