B2OC: speed-up in B2OC D->4body and 3body builders
A modification of the 3body and 4body builders in B2OC that reduces overall CPU usage in Hlt2 by >4.3% (selection part by >10.2%); see details below. Inspired by a discussion with @mstahl.
I have no personal interest in pushing these changes, but maybe others will find them useful.
According to the hlt2_pp test log, the most CPU-expensive algorithms in the selection are from B2OC.
The builders D02KmPimPipPipCombiner_xxx, D02KpPipPimPimCombiner_xxx, D02PipPipPimPimCombiner_xxx, D02KpKmPipPimCombiner_xxx, Ds2KKPiCombiner_xxx, Xic02PKKPiCombiner_xxx and Omegac02PKKPiCombiner_xxx together account for 218.4s/3358s = 6.5% of all CPU usage.
Below is a comparison of the proposed modifications with the default version (master), tested locally on lxplus with 1000 events:
- log for default (master): hlt2_b2oc_speedup_1k.log
- log for the "partial" speed-up, where cuts on F.M are added to the 12 and 123 combiner cuts, but without splitting the make_threebody/fourbody functions: hlt2_b2oc_speedup-v0_1k.log
- log for the "full" speed-up, where, in addition to the cuts on F.M added to the 12 and 123 combiner cuts, the make_threebody/fourbody functions are split into two stages (combination and filtering) to reduce the number of times the combinatorics is done: hlt2_b2oc_speedup-master_1k.log
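
The idea behind the two changes can be illustrated with a toy sketch in plain Python. This is not Moore code: `combine_four`, `filter_candidates`, the mass window and the cut thresholds are all hypothetical, and the thresholds rely on the assumption that every track is a pion, so that m(12) <= m(1234) - 2 m(pi) and m(123) <= m(1234) - m(pi). The actual F.M cut values used in this MR are not reproduced here.

```python
import itertools
import math
import random

PION_MASS = 139.57  # MeV, all toy tracks are treated as pions (assumption)
M_MAX = 1965.0      # MeV, hypothetical upper edge of the loose mass window


def make_track(rng):
    """Return a toy four-momentum (E, px, py, pz) in MeV."""
    px, py, pz = (rng.gauss(0.0, 300.0) for _ in range(3))
    energy = math.sqrt(px * px + py * py + pz * pz + PION_MASS ** 2)
    return (energy, px, py, pz)


def mass(*tracks):
    """Invariant mass of the summed four-momenta."""
    e, px, py, pz = (sum(t[i] for t in tracks) for i in range(4))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def combine_four(tracks, early_cuts):
    """Stage 1: build loose four-track candidates and count the work done."""
    n_visited = 0  # number of three- and four-track combinations visited
    candidates = []
    n = len(tracks)
    for i, j in itertools.combinations(range(n), 2):
        # Early cut on the 12 combination: cannot reject a final candidate.
        if early_cuts and mass(tracks[i], tracks[j]) > M_MAX - 2 * PION_MASS:
            continue
        for k in range(j + 1, n):
            n_visited += 1
            # Early cut on the 123 combination, same argument.
            m123 = mass(tracks[i], tracks[j], tracks[k])
            if early_cuts and m123 > M_MAX - PION_MASS:
                continue
            for l in range(k + 1, n):
                n_visited += 1
                m = mass(tracks[i], tracks[j], tracks[k], tracks[l])
                if m < M_MAX:
                    candidates.append(((i, j, k, l), m))
    return candidates, n_visited


def filter_candidates(candidates, m_low, m_high):
    """Stage 2: cheap per-line filter applied to the shared candidates."""
    return [c for c in candidates if m_low < c[1] < m_high]


rng = random.Random(1)
tracks = [make_track(rng) for _ in range(20)]

plain, n_plain = combine_four(tracks, early_cuts=False)
early, n_early = combine_four(tracks, early_cuts=True)

# Two hypothetical lines reuse the single combination stage.
line_a = filter_candidates(early, 1800.0, 1930.0)
line_b = filter_candidates(early, 1765.0, 1965.0)

print(f"combinations visited: {n_plain} without, {n_early} with early cuts")
print(f"identical candidates: {plain == early}")
print(f"line A: {len(line_a)} candidates, line B: {len(line_b)} candidates")
```

The early cuts only remove partial combinations that could never pass the final mass cut, so the candidate list is unchanged while fewer combinations are visited. Sharing one combination stage between several filters corresponds to the split of the make_threebody/fourbody functions in the "full" version.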
combiner | default (master) | "partial" speed-up | "full" speed-up |
---|---|---|---|
D02KmPimPipPipCombiner_xxx | 2.93+1.56+1.56+1.56+1.55+1.39+0.91 = 11.46s | 1.14+0.67+0.65+0.65+0.61+0.59+0.37 = 4.68s | 1.14+0.65+0.59+0.59+0.41 = 3.38s |
D02KpPipPimPimCombiner_xxx | 2.79+1.52+1.51+1.49+1.48+1.34+0.90 = 11.03s | 1.13+0.68+0.66+0.65+0.60+0.59+0.38 = 4.69s | 1.12+0.65+0.59+0.59+0.37 = 3.32s |
Xic02PKKPiCombiner_xxx | 2.37+0.23 = 2.60s | 0.71+0.08 = 0.79s | 0.71+0.08 = 0.79s |
Omegac02PKKPiCombiner_xxx | 2.34+0.24 = 2.58s | 1.00+0.09 = 1.09s | 0.98+0.10 = 1.08s |
Ds2KKPiCombiner_xxx | 0.83+0.78+0.77 = 2.38s | 0.70+0.65+0.65 = 2.00s | 1.14+0.65 = 1.79s |
sum | 30.05s | 13.25s | 10.36s |
Thus, the "full" speed-up reduces the CPU usage of the corresponding algorithms by 66% (30.05s to 10.36s). Scaled to the overall CPU usage in Hlt2, this gives the 4.3% and 10.2% quoted at the top. As more lines may be affected, the actual reduction might be even larger.
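
As a cross-check of the arithmetic, the short Python sketch below recomputes the column sums, the 66% reduction and the 4.3% overall figure. It assumes that the 4.3% is obtained by applying the reduction measured on the five tabulated builders to the 6.5% share quoted above; the 10.2% figure depends on the selection-only total, which is not given here, so it is not recomputed.

```python
# Per-instance timings in seconds, copied from the table (1000 events).
# Each entry: (default, "partial" speed-up, "full" speed-up).
timings = {
    "D02KmPimPipPipCombiner_xxx": (
        [2.93, 1.56, 1.56, 1.56, 1.55, 1.39, 0.91],
        [1.14, 0.67, 0.65, 0.65, 0.61, 0.59, 0.37],
        [1.14, 0.65, 0.59, 0.59, 0.41],
    ),
    "D02KpPipPimPimCombiner_xxx": (
        [2.79, 1.52, 1.51, 1.49, 1.48, 1.34, 0.90],
        [1.13, 0.68, 0.66, 0.65, 0.60, 0.59, 0.38],
        [1.12, 0.65, 0.59, 0.59, 0.37],
    ),
    "Xic02PKKPiCombiner_xxx": ([2.37, 0.23], [0.71, 0.08], [0.71, 0.08]),
    "Omegac02PKKPiCombiner_xxx": ([2.34, 0.24], [1.00, 0.09], [0.98, 0.10]),
    "Ds2KKPiCombiner_xxx": ([0.83, 0.78, 0.77], [0.70, 0.65, 0.65], [1.14, 0.65]),
}

# Column sums over all builders.
default_s, partial_s, full_s = (
    sum(sum(row[col]) for row in timings.values()) for col in range(3)
)

reduction = 1.0 - full_s / default_s   # fractional saving in these builders
b2oc_share = 218.4 / 3358.0            # share of total Hlt2 CPU quoted above
overall_saving = b2oc_share * reduction

print(f"sums: {default_s:.2f}s, {partial_s:.2f}s, {full_s:.2f}s")
print(f"reduction in builders: {100 * reduction:.1f}%")
print(f"overall Hlt2 saving:   {100 * overall_saving:.1f}%")
```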
Small bonus: as suggested by @mstahl (and with input from @gligorov), a naive conversion into energy saved can be estimated as follows:
- taking a typical data-taking year as 10h/day * 165 days/year = 1650 hours.
- taking typical power consumption either as
gives 1160-2150 MWh/year. Taking the energy price for France as 206 eur/MWh, this results in 240-440k eur/year.
Thus, reducing consumption by 4.3% saves 10-20k eur a year, to be multiplied by ~3 for the whole of Run 3.
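
The estimate can be reproduced with a few lines of Python. This is only a sketch of the naive scaling above: it starts from the quoted yearly energy range rather than from a power figure, and it assumes that the energy cost scales linearly with the CPU usage saved.

```python
HOURS_PER_YEAR = 10 * 165           # 10 h/day * 165 days/year
ENERGY_MWH = (1160.0, 2150.0)       # yearly energy range quoted above
PRICE_EUR_PER_MWH = 206.0           # energy price taken for France
CPU_SAVING = 0.043                  # overall Hlt2 CPU reduction
N_YEARS = 3                         # approximate multiplier for the whole run

# Yearly energy cost for the low and high ends of the range.
cost_low, cost_high = (e * PRICE_EUR_PER_MWH for e in ENERGY_MWH)

# Assumption: cost saved is proportional to CPU usage saved.
saving_low, saving_high = CPU_SAVING * cost_low, CPU_SAVING * cost_high

print(f"data-taking hours per year: {HOURS_PER_YEAR}")
print(f"yearly cost:   {cost_low / 1e3:.0f}-{cost_high / 1e3:.0f}k eur")
print(f"yearly saving: {saving_low / 1e3:.1f}-{saving_high / 1e3:.1f}k eur")
print(f"over {N_YEARS} years: "
      f"{N_YEARS * saving_low / 1e3:.0f}-{N_YEARS * saving_high / 1e3:.0f}k eur")
```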