Optimisation of some HLT1 algorithms
The commits in this MR are cherry-picked from !742 (merged)
Pipelines:
master: https://mattermost.web.cern.ch/lhcb/pl/qtg9f1ddk38smna988fqmxcqty
NVIDIA GeForce RTX 3090: 198.62 kHz (0.92x)
NVIDIA RTX A6000: 192.52 kHz (0.92x)
NVIDIA RTX A5000: 156.63 kHz (0.93x)
NVIDIA GeForce RTX 2080 Ti: 126.94 kHz (0.91x)
MI100: 102.81 kHz (0.89x)
AMD EPYC 7502 32-Core: 16.82 kHz (0.97x)
this branch: https://mattermost.web.cern.ch/lhcb/pl/9ohkanksijfa3jgseowkhaw9se
NVIDIA GeForce RTX 3090: 214.11 kHz (0.99x)
NVIDIA RTX A6000: 206.76 kHz (0.99x)
NVIDIA RTX A5000: 169.45 kHz (1.00x)
NVIDIA GeForce RTX 2080 Ti: 136.83 kHz (0.98x)
MI100: 100.12 kHz (0.87x)
AMD EPYC 7502 32-Core: 17.21 kHz (0.99x)
Throughput gain of this branch over master:
RTX A5000: +8.2%
RTX 2080 Ti: +7.8%
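As a cross-check, each quoted gain is simply the ratio of the branch throughput to the master throughput on the same device. The following Python sketch recomputes that ratio for every device in the two pipeline reports; the dictionary names and the `gain_percent` helper are illustrative only and are not part of Allen or its CI tooling.

```python
# Throughput in kHz, copied from the master and this-branch pipeline reports.
master = {
    "NVIDIA GeForce RTX 3090": 198.62,
    "NVIDIA RTX A6000": 192.52,
    "NVIDIA RTX A5000": 156.63,
    "NVIDIA GeForce RTX 2080 Ti": 126.94,
    "MI100": 102.81,
    "AMD EPYC 7502 32-Core": 16.82,
}
branch = {
    "NVIDIA GeForce RTX 3090": 214.11,
    "NVIDIA RTX A6000": 206.76,
    "NVIDIA RTX A5000": 169.45,
    "NVIDIA GeForce RTX 2080 Ti": 136.83,
    "MI100": 100.12,
    "AMD EPYC 7502 32-Core": 17.21,
}


def gain_percent(new: float, old: float) -> float:
    """Relative throughput change of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


# Gain of this branch over master, rounded to one decimal place per device.
gains = {
    device: round(gain_percent(branch[device], master[device]), 1)
    for device in master
}

for device, gain in gains.items():
    print(f"{device}: {gain:+.1f}%")
```

Running it prints one line per device; the A5000 and 2080 Ti lines reproduce the figures quoted in the MR, and the other lines apply the same formula to the remaining devices.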
Private tests (RTX 2080 Ti), throughput after each commit and gain relative to master:
master: 125309.630236 events/s (+0%)
Optimize small algorithms: 126428.685902 events/s (+0.9%)
Take advantage of unused threads in pv_beamline_peak: 128121.802201 events/s (+2.2%)
Fix shape of sv fitter kernel: 131485.748047 events/s (+4.9%)
Slight improvements to is_muon: 133981.343451 events/s (+6.9%)
Remove unnecessary sort in pv finder: 136317.593650 events/s (+8.8%)
Fix velo consolidate tracks block_dim: 137101.771522 events/s (+9.4%)
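The private-test percentages are all measured against master. To see how much each commit contributes on its own, the same numbers can also be compared step by step. The Python sketch below does both, under the assumption (not stated explicitly in the MR) that each measurement includes every commit listed before it; the `steps` list and the `gain_percent` helper are illustrative names only.

```python
# Events/s on an RTX 2080 Ti, copied from the private tests, in the order listed.
# Assumption: each measurement includes all commits listed before it.
steps = [
    ("master", 125309.630236),
    ("Optimize small algorithms", 126428.685902),
    ("Take advantage of unused threads in pv_beamline_peak", 128121.802201),
    ("Fix shape of sv fitter kernel", 131485.748047),
    ("Slight improvements to is_muon", 133981.343451),
    ("Remove unnecessary sort in pv finder", 136317.593650),
    ("Fix velo consolidate tracks block_dim", 137101.771522),
]


def gain_percent(new: float, old: float) -> float:
    """Relative throughput change of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


baseline = steps[0][1]
cumulative = []   # gain of each step relative to master
incremental = []  # gain of each step relative to the step before it
previous = baseline
for _, rate in steps:
    cumulative.append(round(gain_percent(rate, baseline), 1))
    incremental.append(round(gain_percent(rate, previous), 1))
    previous = rate

for (name, _), total, step in zip(steps, cumulative, incremental):
    print(f"{name:<55} {total:+.1f}% total  {step:+.1f}% step")
```

Under that assumption, the per-step column shows which commits account for most of the overall gain, without needing a separate benchmark run for each commit.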
Edited by Arthur Marius Hennequin