Review InterpolatedBFieldMap for performance (was ACTS-383)
Original author Hadrien Benjamin Grasland @hgraslan
From an admittedly quick look, it seems to me that the current design of the InterpolatedBFieldMap is optimized for implementation convenience, flexibility and future extensibility, rather than performance. For example, it makes heavy use of type erasure, which introduces vtable indirection and may thus hamper some important compiler optimizations based on the ability to inline function code.
This is problematic because we expect to perform a lot of magnetic field lookups in tracking, which means that they should be very fast. I thus think that we should carefully review the interpolated magnetic field design and implementation, and push it in a more performance-oriented direction, by removing flexibility which is not needed and by preferring compile-time polymorphism over run-time polymorphism whenever practical.
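To make the run-time vs compile-time distinction concrete, here is a minimal sketch. It is not the actual InterpolatedBFieldMap code: `toyField`, `ErasedFieldMap`, `StaticFieldMap`, `IdentityTransform` and `ShiftZ` are all hypothetical names invented for illustration, and the "field" is a toy formula rather than a real lookup.

```cpp
#include <array>
#include <functional>

using Vec3 = std::array<double, 3>;

// Hypothetical stand-in for the actual field lookup (toy formula, not a real map).
inline Vec3 toyField(const Vec3& p) { return {0.0, 0.0, 2.0 + 0.5 * p[2]}; }

// Run-time polymorphism: the transform is type-erased behind std::function,
// so every lookup goes through an indirect call that is hard to inline.
struct ErasedFieldMap {
  std::function<Vec3(const Vec3&)> toFieldFrame;
  Vec3 getField(const Vec3& pos) const { return toyField(toFieldFrame(pos)); }
};

// Two hypothetical concrete transforms.
struct IdentityTransform {
  Vec3 operator()(const Vec3& p) const { return p; }
};

struct ShiftZ {
  double dz;
  Vec3 operator()(const Vec3& p) const { return {p[0], p[1], p[2] + dz}; }
};

// Compile-time polymorphism: the transform type is a template parameter,
// so the call target is known statically and the compiler is free to inline it.
template <typename Transform>
struct StaticFieldMap {
  Transform toFieldFrame;
  Vec3 getField(const Vec3& pos) const { return toyField(toFieldFrame(pos)); }
};
```

Both variants return the same values. The difference is that `StaticFieldMap<IdentityTransform>` gives the optimizer a chance to remove the transform call entirely, which the `std::function` version generally does not. Whether this matters in practice is something the profiling should confirm.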
Some ideas of things to do:
- Remove or simplify unneeded sophistication. For example, do we need a full-blown non-linear transform (std::function) from the coordinate system of the detector to that of the magnetic field? Could it be just an affine transform expressed as an Eigen matrix? Should we optimize even further for the identity transform case?
- Reduce use of type erasure (e.g. std::function, AnyGrid concept) in favor of concrete types and methods that are guaranteed to be resolved at compile time, either expressed as member typedefs/methods (where customizability is not needed) or template parameters (where it is needed).
- Before and after this design simplification, profile the interpolated magnetic field map implementation using the test framework example freshly introduced by <~asalzbur>. Look for unexpected sources of inefficiency and performance regressions brought by the rework.
- For simple enough components (e.g. the FieldCell, which basically performs linear interpolation in an N-dimensional box), we could / should even convince ourselves that we operate close to the theoretical performance limit by computing it from the mathematics and the hardware specs.
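As a starting point for the last item, here is a rough sketch of linear interpolation in a 3D box, written so that the operation count can be read off directly. It is not the real FieldCell: `ToyFieldCell`, `lerp1d` and the corner ordering are assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>

using Vec3 = std::array<double, 3>;

// One 1D linear interpolation: 1 subtraction, 1 multiplication, 1 addition.
inline double lerp1d(double a, double b, double t) { return a + t * (b - a); }

// Hypothetical cell layout (an assumption, not the real FieldCell):
// corner index = ix + 2*iy + 4*iz, with ix, iy, iz in {0, 1}.
struct ToyFieldCell {
  Vec3 lower;                   // lower corner of the box
  Vec3 upper;                   // upper corner of the box
  std::array<Vec3, 8> corners;  // field vectors at the 8 corners

  Vec3 getField(const Vec3& pos) const {
    // Normalized position inside the box, in [0, 1] along each axis.
    Vec3 t;
    for (std::size_t d = 0; d < 3; ++d) {
      t[d] = (pos[d] - lower[d]) / (upper[d] - lower[d]);
    }
    Vec3 out;
    for (std::size_t c = 0; c < 3; ++c) {
      // 4 interpolations along x...
      const double x00 = lerp1d(corners[0][c], corners[1][c], t[0]);
      const double x10 = lerp1d(corners[2][c], corners[3][c], t[0]);
      const double x01 = lerp1d(corners[4][c], corners[5][c], t[0]);
      const double x11 = lerp1d(corners[6][c], corners[7][c], t[0]);
      // ...then 2 along y...
      const double y0 = lerp1d(x00, x10, t[1]);
      const double y1 = lerp1d(x01, x11, t[1]);
      // ...then 1 along z: 7 interpolations per field component.
      out[c] = lerp1d(y0, y1, t[2]);
    }
    return out;
  }
};
```

With this scheme, an N-dimensional box needs 2^N - 1 one-dimensional interpolations per field component. For N = 3 and a 3-component field that is 21 interpolations, i.e. 63 floating-point operations (fewer instructions if the compiler fuses multiply and add), plus the normalization of the position and the loading of 8 x 3 doubles = 192 bytes of corner data. Comparing these numbers with the measured cost per lookup should tell us how far we are from the hardware limit.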