Improve VELO efficiency and throughput (!365) · Merge requests · LHCb / Allen

The following has been done:

FillCandidates has been removed.
Hits of each module pair are now stored in the same container in memory, a single phi is calculated for each hit, and the hits are sorted by phi.
The phi calculation returns an int16_t, where the full range of the atan2 function is mapped to the full range of int16_t. This has several benefits, notably that it wraps around: the largest value (2^15 - 1) and the smallest value (-2^15) are adjacent under modulo arithmetic, just as phi values close to +pi and -pi should be. Also, int16_t occupies half the memory of float :)
For the seeding stage, an h0 hit is sought first. A technique similar to "closest by phi" is employed, but the hits closest in memory are sought instead, using a binary search followed by a "pendulum search" (i.e. i, i-1, i+1, i-2, i+2, ...). In this manner there is no need to test the candidates found, and the first n candidates that are not marked as used are kept. Note that this method does not guarantee finding the n nearest candidates in phi, only the n nearest candidates in memory to position i. By default n = 5, although a more detailed throughput study would be necessary to determine a good balance between number of candidates, efficiency and throughput.
Still in the seeding stage, the doublet is then extended to the third module. A single binary search returns the first candidate to be considered, and successive candidates are visited iteratively until one has too large a phi difference. Since phi wraps around, the list of hits is effectively a circular buffer, i.e. element 0 follows element n-1.
Tracks are extended in a similar way: a single binary search finds the first candidate, and each successive element is visited iteratively.
A new parameter, phi_tolerance, has been introduced to control the phi difference accepted in these searches.
max_scatter and phi_tolerance have been updated to the best values found in a parameter scan: 0.045 and 0.08.
The default block_dim_x has been changed to 64 threads.
Search by triplet is now better documented.
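To illustrate the phi encoding and the two search patterns listed above, here is a minimal C++ sketch. It is not the Allen implementation: the function names (phi_to_int16, phi_diff, pendulum_candidates, window_candidates), the scaling constant, the non-wrapping pendulum, and the tolerance expressed in int16_t units are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical helper: map atan2's range [-pi, pi] onto the int16_t range.
// +pi lands on 32768, which wraps to -32768 (modular conversion, guaranteed
// since C++20 and the behaviour of mainstream compilers before that).
inline int16_t phi_to_int16(float x, float y) {
  constexpr float scale = 32768.0f / 3.14159265358979f;
  const long v = std::lround(std::atan2(y, x) * scale);
  return static_cast<int16_t>(static_cast<uint16_t>(v));
}

// Wrapped difference a - b: values near +pi and -pi come out close together.
inline int16_t phi_diff(int16_t a, int16_t b) {
  return static_cast<int16_t>(static_cast<uint16_t>(a) - static_cast<uint16_t>(b));
}

// Pendulum search: binary-search the position of `target` in the sorted phis,
// then visit i, i-1, i+1, i-2, i+2, ... and keep the first n unused hits.
// Assumption: the pendulum stops at the array bounds instead of wrapping.
std::vector<int> pendulum_candidates(const std::vector<int16_t>& phis,
                                     const std::vector<bool>& used,
                                     int16_t target, int n) {
  const int size = static_cast<int>(phis.size());
  const int start = static_cast<int>(
    std::lower_bound(phis.begin(), phis.end(), target) - phis.begin());
  std::vector<int> found;
  for (int step = 0; step <= 2 * size && static_cast<int>(found.size()) < n; ++step) {
    // step 0 -> 0, 1 -> -1, 2 -> +1, 3 -> -2, 4 -> +2, ...
    const int offset = (step % 2 == 1) ? -((step + 1) / 2) : step / 2;
    const int idx = start + offset;
    if (idx >= 0 && idx < size && !used[idx]) found.push_back(idx);
  }
  return found;
}

// Window search: one binary search for the first hit at or above
// target - tolerance, then walk forward through the hits as a circular
// buffer until the wrapped phi difference exceeds the tolerance.
std::vector<int> window_candidates(const std::vector<int16_t>& phis,
                                   int16_t target, int16_t tolerance) {
  std::vector<int> found;
  const int size = static_cast<int>(phis.size());
  if (size == 0) return found;
  const int16_t lower = phi_diff(target, tolerance);  // target - tolerance, wrapped
  const int first = static_cast<int>(
    std::lower_bound(phis.begin(), phis.end(), lower) - phis.begin());
  for (int k = 0; k < size; ++k) {
    const int idx = (first + k) % size;  // after element size-1 comes element 0
    if (std::abs(static_cast<int>(phi_diff(phis[idx], target))) > tolerance) break;
    found.push_back(idx);
  }
  return found;
}
```

With sorted phis such as {-32760, -100, 0, 50, 120, 32750}, a target of 32760 and a tolerance of 100, the window search starts at the last element and continues onto the first one, which is the wrap-around behaviour described above.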
Throughput change (VELO reconstruction, Quadro RTX 6000, from master): 465.591 kHz -> 509.179 kHz (+9.4%).
Physics efficiency comparison against baseline.
Physics efficiency changes in CI:

< TrackChecker output                               :      1331/   231150   0.58% ghosts
< 01_velo                                           :     99005/   102077  96.99% ( 96.98%),      1593 (  1.58%) clones, pur  99.70%, hit eff  96.95%
< 02_long                                           :     57260/    57988  98.74% ( 98.77%),       670 (  1.16%) clones, pur  99.77%, hit eff  97.84%
< 03_long_P>5GeV                                    :     36603/    36811  99.43% ( 99.46%),       354 (  0.96%) clones, pur  99.77%, hit eff  98.25%
< 04_long_strange                                   :      2418/     2542  95.12% ( 95.58%),        25 (  1.02%) clones, pur  99.22%, hit eff  97.70%
< 05_long_strange_P>5GeV                            :      1172/     1197  97.91% ( 97.91%),        10 (  0.85%) clones, pur  99.12%, hit eff  98.13%
< 06_long_fromB                                     :      3857/     3930  98.14% ( 98.57%),        34 (  0.87%) clones, pur  99.67%, hit eff  98.06%
< 07_long_fromB_P>5GeV                              :      3189/     3219  99.07% ( 99.20%),        25 (  0.78%) clones, pur  99.73%, hit eff  98.27%
< 08_long_electrons                                 :      4406/     4548  96.88% ( 97.09%),       111 (  2.46%) clones, pur  98.01%, hit eff  97.00%
< 09_long_fromB_electrons                           :       193/      202  95.54% ( 96.01%),         6 (  3.02%) clones, pur  97.70%, hit eff  95.74%
< 10_long_fromB_electrons_P>5GeV                    :       127/      130  97.69% ( 98.18%),         4 (  3.05%) clones, pur  98.34%, hit eff  97.01%
---
> TrackChecker output                               :      2217/   239297   0.93% ghosts
> 01_velo                                           :    100495/   102077  98.45% ( 98.47%),      2175 (  2.12%) clones, pur  99.70%, hit eff  96.59%
> 02_long                                           :     57620/    57988  99.37% ( 99.42%),       825 (  1.41%) clones, pur  99.83%, hit eff  97.84%
> 03_long_P>5GeV                                    :     36672/    36811  99.62% ( 99.67%),       418 (  1.13%) clones, pur  99.84%, hit eff  98.39%
> 04_long_strange                                   :      2493/     2542  98.07% ( 98.46%),        36 (  1.42%) clones, pur  99.40%, hit eff  97.22%
> 05_long_strange_P>5GeV                            :      1184/     1197  98.91% ( 98.83%),        10 (  0.84%) clones, pur  99.36%, hit eff  98.12%
> 06_long_fromB                                     :      3897/     3930  99.16% ( 99.46%),        43 (  1.09%) clones, pur  99.77%, hit eff  98.12%
> 07_long_fromB_P>5GeV                              :      3201/     3219  99.44% ( 99.58%),        30 (  0.93%) clones, pur  99.79%, hit eff  98.49%
> 08_long_electrons                                 :      4431/     4548  97.43% ( 97.55%),       124 (  2.72%) clones, pur  98.16%, hit eff  96.87%
> 09_long_fromB_electrons                           :       195/      202  96.53% ( 97.32%),        11 (  5.34%) clones, pur  97.51%, hit eff  95.35%
> 10_long_fromB_electrons_P>5GeV                    :       127/      130  97.69% ( 98.18%),         8 (  5.93%) clones, pur  98.10%, hit eff  96.81%

Edited Apr 29, 2020 by Daniel Hugo Campora Perez