Speed up computation of Pixel distortion correction

In a data22 athena job, std::pow shows up as the 6th most used function in vtune (50 s accumulated out of 1751 s total in Algorithm::sysExecute). It is called primarily by PixelDistortionData::bernstein_grundpolynom, which computes

sum_{i=0}^n sum_{j=0}^n P_{n;i}(u) P_{n;j}(v) a_{ij}

with P_{n;i}(u) = (n over i) * u^i * (1-u)^{n-i}, and n=20. In this nested sum, std::pow is called 2*20^2 times per evaluation.

To speed up the computation, the commits implement the following optimisations:

  1. Compute u^i, (1-u)^i, v^i and (1-v)^i for i=0..20 once, outside the nested sum, and use the pre-computed factors inside it. This replaces 400 calls of std::pow by array accesses and 4*20 products.
  2. Multiply the parameters a_{ij} by the binomial coefficients (n over i) * (n over j) when creating the conditions data, which removes two multiplications from the nested sum.
  3. Pass the distortion correction parameter vectors by reference rather than by value, which avoids allocations and memory copies.
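
The three optimisations can be illustrated with a minimal, self-contained sketch. It is not the code from the commits: all names (Coeffs, binomial, bernsteinNaive, foldBinomials, powers, bernsteinFast) and the row-major std::array storage of a_{ij} are assumptions made for illustration only.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical names and layout; not the actual PixelDistortionData code.
constexpr std::size_t N = 20;                          // polynomial degree n
using Coeffs = std::array<double, (N + 1) * (N + 1)>;  // a_{ij}, row-major

// Binomial coefficient (n over k), accumulated in floating point.
double binomial(std::size_t n, std::size_t k) {
  double result = 1.0;
  for (std::size_t m = 1; m <= k; ++m) {
    result *= static_cast<double>(n - k + m) / static_cast<double>(m);
  }
  return result;
}

// Reference version: std::pow is evaluated inside the nested sum.
double bernsteinNaive(const Coeffs& a, double u, double v) {
  double sum = 0.0;
  for (std::size_t i = 0; i <= N; ++i) {
    const double pu = binomial(N, i) * std::pow(u, static_cast<double>(i)) *
                      std::pow(1.0 - u, static_cast<double>(N - i));
    for (std::size_t j = 0; j <= N; ++j) {
      const double pv = binomial(N, j) * std::pow(v, static_cast<double>(j)) *
                        std::pow(1.0 - v, static_cast<double>(N - j));
      sum += pu * pv * a[i * (N + 1) + j];
    }
  }
  return sum;
}

// Optimisation 2: fold (n over i) * (n over j) into a_{ij} once,
// e.g. when the conditions data are created.
Coeffs foldBinomials(const Coeffs& a) {
  Coeffs scaled{};
  for (std::size_t i = 0; i <= N; ++i) {
    for (std::size_t j = 0; j <= N; ++j) {
      scaled[i * (N + 1) + j] = binomial(N, i) * binomial(N, j) * a[i * (N + 1) + j];
    }
  }
  return scaled;
}

// Optimisation 1: tabulate x^k for k = 0..N using N products.
std::array<double, N + 1> powers(double x) {
  std::array<double, N + 1> p{};
  p[0] = 1.0;
  for (std::size_t k = 1; k <= N; ++k) {
    p[k] = p[k - 1] * x;
  }
  return p;
}

// Optimised version: no std::pow and no binomials in the nested sum.
// Optimisation 3: the parameters are taken by const reference, not by value.
double bernsteinFast(const Coeffs& scaled, double u, double v) {
  const auto pu = powers(u);
  const auto qu = powers(1.0 - u);
  const auto pv = powers(v);
  const auto qv = powers(1.0 - v);
  double sum = 0.0;
  for (std::size_t i = 0; i <= N; ++i) {
    double inner = 0.0;
    for (std::size_t j = 0; j <= N; ++j) {
      inner += pv[j] * qv[N - j] * scaled[i * (N + 1) + j];
    }
    sum += pu[i] * qu[N - i] * inner;
  }
  return sum;
}
```

A simple consistency check for such a sketch is that the Bernstein basis sums to one: with all a_{ij} set to 1, both bernsteinNaive(a, u, v) and bernsteinFast(foldBinomials(a), u, v) should return 1 up to rounding for any u and v.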

In synthetic benchmarks this yields a speedup of more than a factor of 50. In the athena data22 jobs, the number of events processed per second increases on average by about 1% (from 1.11 to 1.12 events/s), which is below the naive expectation of ~3% (taking the vtune results as truth: 1751s -> 1706s).
