Speed up computation of Pixel distortion correction
In a data22 athena job, std::pow shows up as the 6th most used function in vtune (50 s accumulated out of 1751 s total in Algorithm::sysExecute). This function is primarily called by PixelDistortionData::bernstein_grundpolynom, which computes
sum_{i=0}^{n} sum_{j=0}^{n} P_{n;i}(u) P_{n;j}(v) a_{ij}
with P_{n;i}(u) = (n over i) * u^i * (1-u)^{n-i}, and n=20. In this nested sum std::pow is called for each evaluation 2*20^2 times.
To speed up the computation, the commits implement the following optimisations:
- compute u^{i}, (1-u)^{i}, v^{i} and (1-v)^{i} for i=0..20 outside the nested sum, and use the pre-computed factors inside the nested sum. This replaces 400 calls of std::pow by array accesses and 4*20 products.
- multiply the parameters a_{ij} by the binomial coefficients (n over i) * (n over j) when creating the conditions data, which removes two multiplications from the nested sum.
- pass the distortion correction parameter vectors by reference rather than by value, which avoids allocations and memory copies.
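The three optimisations above can be illustrated with a minimal, self-contained sketch. It is not the code from the commits: the function names (`bernsteinNaive`, `bernsteinFast`, `powers`, `foldBinomials`) and the flat row-major layout of the coefficients a_{ij} are assumptions made for illustration only.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr int N = 20;  // polynomial degree n, as quoted in the description

// Binomial coefficient (n over k), computed iteratively in double precision.
double binomial(int n, int k) {
  double c = 1.0;
  for (int m = 1; m <= k; ++m) c = c * (n - k + m) / m;
  return c;
}

// Reference version: every basis polynomial is evaluated with std::pow.
// Coefficient layout a[i * (N + 1) + j] is an assumption of this sketch.
double bernsteinNaive(double u, double v, const std::vector<double>& a) {
  double sum = 0.0;
  for (int i = 0; i <= N; ++i) {
    const double pu = binomial(N, i) * std::pow(u, i) * std::pow(1.0 - u, N - i);
    for (int j = 0; j <= N; ++j) {
      const double pv = binomial(N, j) * std::pow(v, j) * std::pow(1.0 - v, N - j);
      sum += pu * pv * a[i * (N + 1) + j];
    }
  }
  return sum;
}

// Fill p[k] = x^k for k = 0..N with N multiplications instead of std::pow.
std::array<double, N + 1> powers(double x) {
  std::array<double, N + 1> p{};
  p[0] = 1.0;
  for (int k = 1; k <= N; ++k) p[k] = p[k - 1] * x;
  return p;
}

// Done once, when the conditions data are created: fold the binomial
// coefficients into the parameters so the hot loop does not need them.
std::vector<double> foldBinomials(const std::vector<double>& a) {
  std::vector<double> b(a.size());
  for (int i = 0; i <= N; ++i)
    for (int j = 0; j <= N; ++j)
      b[i * (N + 1) + j] = binomial(N, i) * binomial(N, j) * a[i * (N + 1) + j];
  return b;
}

// Optimised version: power tables are built outside the nested sum and the
// pre-multiplied parameters are taken by const reference (no copy).
double bernsteinFast(double u, double v, const std::vector<double>& b) {
  const auto pu = powers(u), qu = powers(1.0 - u);
  const auto pv = powers(v), qv = powers(1.0 - v);
  double sum = 0.0;
  for (int i = 0; i <= N; ++i) {
    const double fu = pu[i] * qu[N - i];
    for (int j = 0; j <= N; ++j)
      sum += fu * pv[j] * qv[N - j] * b[i * (N + 1) + j];
  }
  return sum;
}
```

In this sketch `foldBinomials` would be called once per set of parameters, while `bernsteinFast` is the function called per evaluation; both versions should agree up to floating-point rounding.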
In synthetic benchmarks this yields a speedup of more than a factor of 50. In the athena data22 jobs, the number of events processed per second increases on average by about 1% (from 1.11 to 1.12 events/s), which is below the naive expectation of ~3% (taking the vtune results as truth: 1751s -> 1706s).