Improve VELO efficiency and throughput
The following has been done:
- FillCandidates has been removed.
- Module pairs are now populated in the same container in memory, a single phi is calculated for each, and they are sorted by phi.
- The phi calculation returns an `int16_t`, where the full range of the atan2 function is mapped to the full range of `int16_t`. This has several benefits, notably that it wraps around: values near +2^15 and -2^15 compare as very close (as they should when comparing phi), thanks to modulo arithmetic. Also, `int16_t` occupies half the memory of `float` :)
- For the seeding stage, first an h0 hit is sought. A technique similar to "closest by phi" is employed, but the closest in memory is sought instead, using a binary search followed by a "pendulum search" (i.e. i, i-1, i+1, i-2, i+2, ...). In this manner, there is no need to test found candidates, and the first n candidates found (i.e. not marked as used) are kept. Note that this method does not guarantee finding the n nearest candidates in phi, but rather the n nearest candidates in memory to position i. By default n = 5, although a more detailed throughput study would be necessary to determine a good balance of candidates / efficiency / throughput.
- For the seeding stage, the doublet is extended to the third module. A single binary search returns the first candidate to be considered, and all successive candidates are visited iteratively until one has too large a phi difference. Since phi wraps around, the list of hits is effectively a circular buffer: i.e. after element n-1, element 0 follows.
- The tracks are extended in a similar way: a single binary search finds the first candidate, and each successive element is visited iteratively.
- A parameter that controls the accepted phi difference has been introduced: `phi_tolerance`.
- `max_scatter` and `phi_tolerance` have been updated to the best values from a parameter scan: `0.045` and `0.08`, respectively.
- The `block_dim_x` default has been changed to `64` threads.
- Better documentation for Search by triplet.
- Throughput change (VELO reconstruction, Quadro RTX 6000, from master): `465.591 kHz` -> `509.179 kHz` (+9.4%).
- Physics efficiency comparison against baseline.
- Physics efficiency changes in CI:
< TrackChecker output : 1331/ 231150 0.58% ghosts
< 01_velo : 99005/ 102077 96.99% ( 96.98%), 1593 ( 1.58%) clones, pur 99.70%, hit eff 96.95%
< 02_long : 57260/ 57988 98.74% ( 98.77%), 670 ( 1.16%) clones, pur 99.77%, hit eff 97.84%
< 03_long_P>5GeV : 36603/ 36811 99.43% ( 99.46%), 354 ( 0.96%) clones, pur 99.77%, hit eff 98.25%
< 04_long_strange : 2418/ 2542 95.12% ( 95.58%), 25 ( 1.02%) clones, pur 99.22%, hit eff 97.70%
< 05_long_strange_P>5GeV : 1172/ 1197 97.91% ( 97.91%), 10 ( 0.85%) clones, pur 99.12%, hit eff 98.13%
< 06_long_fromB : 3857/ 3930 98.14% ( 98.57%), 34 ( 0.87%) clones, pur 99.67%, hit eff 98.06%
< 07_long_fromB_P>5GeV : 3189/ 3219 99.07% ( 99.20%), 25 ( 0.78%) clones, pur 99.73%, hit eff 98.27%
< 08_long_electrons : 4406/ 4548 96.88% ( 97.09%), 111 ( 2.46%) clones, pur 98.01%, hit eff 97.00%
< 09_long_fromB_electrons : 193/ 202 95.54% ( 96.01%), 6 ( 3.02%) clones, pur 97.70%, hit eff 95.74%
< 10_long_fromB_electrons_P>5GeV : 127/ 130 97.69% ( 98.18%), 4 ( 3.05%) clones, pur 98.34%, hit eff 97.01%
---
> TrackChecker output : 2217/ 239297 0.93% ghosts
> 01_velo : 100495/ 102077 98.45% ( 98.47%), 2175 ( 2.12%) clones, pur 99.70%, hit eff 96.59%
> 02_long : 57620/ 57988 99.37% ( 99.42%), 825 ( 1.41%) clones, pur 99.83%, hit eff 97.84%
> 03_long_P>5GeV : 36672/ 36811 99.62% ( 99.67%), 418 ( 1.13%) clones, pur 99.84%, hit eff 98.39%
> 04_long_strange : 2493/ 2542 98.07% ( 98.46%), 36 ( 1.42%) clones, pur 99.40%, hit eff 97.22%
> 05_long_strange_P>5GeV : 1184/ 1197 98.91% ( 98.83%), 10 ( 0.84%) clones, pur 99.36%, hit eff 98.12%
> 06_long_fromB : 3897/ 3930 99.16% ( 99.46%), 43 ( 1.09%) clones, pur 99.77%, hit eff 98.12%
> 07_long_fromB_P>5GeV : 3201/ 3219 99.44% ( 99.58%), 30 ( 0.93%) clones, pur 99.79%, hit eff 98.49%
> 08_long_electrons : 4431/ 4548 97.43% ( 97.55%), 124 ( 2.72%) clones, pur 98.16%, hit eff 96.87%
> 09_long_fromB_electrons : 195/ 202 96.53% ( 97.32%), 11 ( 5.34%) clones, pur 97.51%, hit eff 95.35%
> 10_long_fromB_electrons_P>5GeV : 127/ 130 97.69% ( 98.18%), 8 ( 5.93%) clones, pur 98.10%, hit eff 96.81%
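
To illustrate the `int16_t` phi representation described in the list above, here is a minimal C++ sketch. It is not the code from this merge request: the helper names `phi_to_int16`, `phi_diff` and `phi_distance` are hypothetical, and the scale factor `32767 / pi` is an assumption chosen so that +pi cannot overflow.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: map atan2's output range [-pi, pi] onto int16_t.
// Assumption: scaling by 32767/pi (not 32768/pi) so that +pi cannot overflow.
inline std::int16_t phi_to_int16(float x, float y)
{
  constexpr float pi = 3.14159265358979f;
  constexpr float scale = 32767.0f / pi;
  return static_cast<std::int16_t>(std::lround(std::atan2(y, x) * scale));
}

// Signed phi difference a - b, taken modulo 2^16.
// The subtraction is done on unsigned values, where wraparound is well defined;
// the final conversion to int16_t is two's complement (guaranteed since C++20).
inline std::int16_t phi_diff(std::int16_t a, std::int16_t b)
{
  const auto wrapped =
    static_cast<std::uint16_t>(static_cast<std::uint16_t>(a) - static_cast<std::uint16_t>(b));
  return static_cast<std::int16_t>(wrapped);
}

// Absolute phi distance, widened to int so that |-32768| is representable.
inline int phi_distance(std::int16_t a, std::int16_t b)
{
  return std::abs(static_cast<int>(phi_diff(a, b)));
}
```

With these definitions, `phi_distance(32767, -32768)` evaluates to 1: the two extremes of the range are neighbours, which is the wrap-around property the description relies on.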
Edited by Daniel Hugo Campora Perez
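
The two search patterns described above (the pendulum search for seeding, and the single binary search followed by a circular scan for extension) could be sketched on the host side roughly as follows. This is an illustration under stated assumptions, not the kernel code: `pendulum_candidates`, `window_candidates` and `abs_phi_diff` are hypothetical names, phi values are assumed to be stored as a sorted `int16_t` array, the tolerance is assumed to be already expressed in `int16_t` units, and the pendulum search is assumed not to wrap around the array ends.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Wrapped phi distance (modulo 2^16), returned as a non-negative int.
inline int abs_phi_diff(std::int16_t a, std::int16_t b)
{
  const auto d = static_cast<std::int16_t>(
    static_cast<std::uint16_t>(static_cast<std::uint16_t>(a) - static_cast<std::uint16_t>(b)));
  return std::abs(static_cast<int>(d));
}

// Seeding: binary search, then visit i, i-1, i+1, i-2, i+2, ... and keep
// the first n positions not marked as used. Returns indices into `phis`.
std::vector<std::size_t> pendulum_candidates(
  const std::vector<std::int16_t>& phis, const std::vector<bool>& used, std::int16_t target, std::size_t n)
{
  std::vector<std::size_t> found;
  const auto size = static_cast<std::ptrdiff_t>(phis.size());
  const auto start =
    static_cast<std::ptrdiff_t>(std::lower_bound(phis.begin(), phis.end(), target) - phis.begin());
  for (std::ptrdiff_t k = 0; k <= 2 * size && found.size() < n; ++k) {
    // k = 0, 1, 2, 3, 4, ... gives offsets 0, -1, +1, -2, +2, ...
    const std::ptrdiff_t offset = (k % 2 == 1) ? -(k + 1) / 2 : k / 2;
    const std::ptrdiff_t i = start + offset;
    // Skip positions outside the array (no wrap-around assumed here) and used hits.
    if (i < 0 || i >= size || used[static_cast<std::size_t>(i)]) continue;
    found.push_back(static_cast<std::size_t>(i));
  }
  return found;
}

// Extension: one binary search for the lower edge of the phi window, then
// walk forward through the array as a circular buffer until a hit is
// farther than `tolerance` from `target`.
std::vector<std::size_t> window_candidates(
  const std::vector<std::int16_t>& phis, std::int16_t target, std::int16_t tolerance)
{
  std::vector<std::size_t> found;
  const std::size_t size = phis.size();
  if (size == 0) return found;
  // Lower edge of the window, wrapped modulo 2^16 like the phi values.
  const auto low = static_cast<std::int16_t>(
    static_cast<std::uint16_t>(static_cast<std::uint16_t>(target) - static_cast<std::uint16_t>(tolerance)));
  std::size_t i =
    static_cast<std::size_t>(std::lower_bound(phis.begin(), phis.end(), low) - phis.begin()) % size;
  for (std::size_t visited = 0; visited < size; ++visited, i = (i + 1) % size) {
    if (abs_phi_diff(phis[i], target) > tolerance) break;
    found.push_back(i);
  }
  return found;
}
```

Note that the pendulum search never compares phi values after the initial binary search, which matches the description's point that candidates are nearest in memory rather than guaranteed nearest in phi. The window scan, by contrast, stops at the first hit outside the tolerance and may cross from the last element back to element 0.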