Draft: Clustering for UT TELL40 decoding
Towards Rec#198. Throughput test added in !561 (merged).
Related: Moore!928 (closed).
This implements UT clustering in software. It will be the standard for v5
decoding, with no switch to fall back to single-strip hits.
To avoid copying data, the UT raw banks are overwritten in place.
The cluster residuals are shown in x_residual_per_cluster_size.pdf.
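As an illustration of what overwriting the raw banks in place can look like, here is a minimal sketch. The 16-bit word layout (upper 11 bits strip number, lower 5 bits ADC), the helper names and the choice to store the cluster size in the ADC bits are assumptions made for this example; they are not taken from the actual UT v5 format or from the code in this MR.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical 16-bit hit word: upper 11 bits = strip number, lower 5 bits = ADC.
// This layout is an assumption for illustration, not the real UT v5 format.
constexpr unsigned adc_bits = 5;
constexpr std::uint16_t adc_mask = (1u << adc_bits) - 1;

inline unsigned strip_of(std::uint16_t w) { return w >> adc_bits; }
inline unsigned payload_of(std::uint16_t w) { return w & adc_mask; }
inline std::uint16_t make_word(unsigned strip, unsigned payload) {
  return static_cast<std::uint16_t>((strip << adc_bits) | (payload & adc_mask));
}

// Merge runs of adjacent strips in place. Input words must be sorted by strip.
// Each cluster is written back as one word: the first strip of the run, with
// the cluster size stored in the bits that held the ADC value (sizes above 31
// would be truncated by the mask in this toy layout).
// Returns the number of clusters now occupying the front of the buffer.
std::size_t cluster_in_place(std::uint16_t* words, std::size_t n) {
  std::size_t out = 0;
  std::size_t i = 0;
  while (i < n) {
    const unsigned first = strip_of(words[i]);
    unsigned size = 1;
    while (i + size < n && strip_of(words[i + size]) == first + size) ++size;
    words[out++] = make_word(first, size); // out <= i, so no unread word is clobbered
    i += size;
  }
  return out;
}
```

Because the write index never overtakes the read index, no second buffer is needed; the price is that the original ADC values are lost once a word has been rewritten.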
Performance Tests
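Efficiencies[1] improve or stay unchanged in every category and the ghost rate drops from 5.75 % to 5.53 %, while purity and hit efficiency decrease slightly in most categories: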
current master
UpstreamTrackChecker INFO Results
UpstreamTrackChecker INFO **** Upstream 33938 tracks including 1952 ghosts [ 5.75 %], Event average 5.27 % ****
UpstreamTrackChecker INFO 01_velo : 30006 from 65861 [ 45.56 %] 474 clones [ 1.56 %], purity: 99.52 %, hitEff: 87.34 %
UpstreamTrackChecker INFO 02_velo+UT : 29945 from 57215 [ 52.34 %] 474 clones [ 1.56 %], purity: 99.54 %, hitEff: 87.33 %
UpstreamTrackChecker INFO 03_velo+UT_P>5GeV : 21376 from 29341 [ 72.85 %] 350 clones [ 1.61 %], purity: 99.63 %, hitEff: 89.06 %
UpstreamTrackChecker INFO 04_velo+notLong : 6121 from 28074 [ 21.80 %] 84 clones [ 1.35 %], purity: 99.27 %, hitEff: 85.15 %
UpstreamTrackChecker INFO 05_velo+UT+notLong : 6070 from 19887 [ 30.52 %] 84 clones [ 1.36 %], purity: 99.35 %, hitEff: 85.11 %
UpstreamTrackChecker INFO 06_velo+UT+notLong_P>5GeV : 3297 from 5322 [ 61.95 %] 58 clones [ 1.73 %], purity: 99.54 %, hitEff: 89.73 %
UpstreamTrackChecker INFO 07_long : 23885 from 37787 [ 63.21 %] 390 clones [ 1.61 %], purity: 99.59 %, hitEff: 87.89 %
UpstreamTrackChecker INFO 07_long_strange : 1292 from 2576 [ 50.16 %] 22 clones [ 1.67 %], purity: 99.26 %, hitEff: 88.01 %
UpstreamTrackChecker INFO 08_long_P>5GeV : 18089 from 24385 [ 74.18 %] 292 clones [ 1.59 %], purity: 99.64 %, hitEff: 88.95 %
UpstreamTrackChecker INFO 08_long_strange_P>5GeV : 950 from 1417 [ 67.04 %] 14 clones [ 1.45 %], purity: 99.30 %, hitEff: 89.54 %
UpstreamTrackChecker INFO 09_long_fromB : 267 from 377 [ 70.82 %] 2 clones [ 0.74 %], purity: 99.51 %, hitEff: 87.36 %
UpstreamTrackChecker INFO 09_long_fromD : 1813 from 2565 [ 70.68 %] 26 clones [ 1.41 %], purity: 99.59 %, hitEff: 88.37 %
UpstreamTrackChecker INFO 10_long_fromB_P>5GeV : 224 from 271 [ 82.66 %] 1 clones [ 0.44 %], purity: 99.58 %, hitEff: 87.71 %
UpstreamTrackChecker INFO 10_long_fromD_P>5GeV : 1559 from 1882 [ 82.84 %] 20 clones [ 1.27 %], purity: 99.66 %, hitEff: 88.98 %
UpstreamTrackChecker INFO 11_long_electrons : 547 from 2842 [ 19.25 %] 19 clones [ 3.36 %], purity: 98.20 %, hitEff: 86.24 %
UpstreamTrackChecker INFO 12_long_fromB_electrons : 13 from 39 [ 33.33 %] 1 clones [ 7.14 %], purity: 97.71 %, hitEff: 86.55 %
UpstreamTrackChecker INFO 13_long_fromB_electrons_P>5GeV : 11 from 24 [ 45.83 %] 1 clones [ 8.33 %], purity: 97.33 %, hitEff: 84.31 %
UpstreamTrackChecker INFO 14_long_fromB_P>3GeV_Pt>0.5GeV : 205 from 221 [ 92.76 %] 0 clones [ 0.00 %], purity: 99.57 %, hitEff: 86.64 %
UpstreamTrackChecker INFO 14_long_fromB_electrons_P>3GeV_Pt>0.5GeV : 10 from 15 [ 66.67 %] 1 clones [ 9.09 %], purity: 97.74 %, hitEff: 88.33 %
UpstreamTrackChecker INFO 14_long_fromD_P>3GeV_Pt>0.5GeV : 1443 from 1538 [ 93.82 %] 18 clones [ 1.23 %], purity: 99.71 %, hitEff: 88.71 %
UpstreamTrackChecker INFO 14_long_strange_P>3GeV_Pt>0.5GeV : 744 from 810 [ 91.85 %] 13 clones [ 1.72 %], purity: 99.45 %, hitEff: 89.05 %
UpstreamTrackChecker INFO 15_UT_long_fromB_P>3GeV_Pt>0.5GeV : 205 from 221 [ 92.76 %] 0 clones [ 0.00 %], purity: 99.57 %, hitEff: 86.64 %
this MR
UpstreamTrackChecker INFO Results
UpstreamTrackChecker INFO **** Upstream 34229 tracks including 1892 ghosts [ 5.53 %], Event average 5.07 % ****
UpstreamTrackChecker INFO 01_velo : 30354 from 65861 [ 46.09 %] 481 clones [ 1.56 %], purity: 99.46 %, hitEff: 87.13 %
UpstreamTrackChecker INFO 02_velo+UT : 30288 from 57215 [ 52.94 %] 481 clones [ 1.56 %], purity: 99.48 %, hitEff: 87.13 %
UpstreamTrackChecker INFO 03_velo+UT_P>5GeV : 21732 from 29341 [ 74.07 %] 356 clones [ 1.61 %], purity: 99.53 %, hitEff: 88.71 %
UpstreamTrackChecker INFO 04_velo+notLong : 6135 from 28074 [ 21.85 %] 88 clones [ 1.41 %], purity: 99.24 %, hitEff: 85.04 %
UpstreamTrackChecker INFO 05_velo+UT+notLong : 6078 from 19887 [ 30.56 %] 88 clones [ 1.43 %], purity: 99.33 %, hitEff: 85.01 %
UpstreamTrackChecker INFO 06_velo+UT+notLong_P>5GeV : 3337 from 5322 [ 62.70 %] 61 clones [ 1.80 %], purity: 99.45 %, hitEff: 89.29 %
UpstreamTrackChecker INFO 07_long : 24219 from 37787 [ 64.09 %] 393 clones [ 1.60 %], purity: 99.52 %, hitEff: 87.66 %
UpstreamTrackChecker INFO 07_long_strange : 1312 from 2576 [ 50.93 %] 21 clones [ 1.58 %], purity: 99.13 %, hitEff: 87.73 %
UpstreamTrackChecker INFO 08_long_P>5GeV : 18404 from 24385 [ 75.47 %] 295 clones [ 1.58 %], purity: 99.54 %, hitEff: 88.60 %
UpstreamTrackChecker INFO 08_long_strange_P>5GeV : 969 from 1417 [ 68.38 %] 13 clones [ 1.32 %], purity: 99.12 %, hitEff: 89.04 %
UpstreamTrackChecker INFO 09_long_fromB : 273 from 377 [ 72.41 %] 2 clones [ 0.73 %], purity: 99.42 %, hitEff: 87.15 %
UpstreamTrackChecker INFO 09_long_fromD : 1850 from 2565 [ 72.12 %] 26 clones [ 1.39 %], purity: 99.43 %, hitEff: 87.80 %
UpstreamTrackChecker INFO 10_long_fromB_P>5GeV : 229 from 271 [ 84.50 %] 1 clones [ 0.43 %], purity: 99.49 %, hitEff: 87.71 %
UpstreamTrackChecker INFO 10_long_fromD_P>5GeV : 1596 from 1882 [ 84.80 %] 20 clones [ 1.24 %], purity: 99.44 %, hitEff: 88.24 %
UpstreamTrackChecker INFO 11_long_electrons : 557 from 2842 [ 19.60 %] 19 clones [ 3.30 %], purity: 98.25 %, hitEff: 86.06 %
UpstreamTrackChecker INFO 12_long_fromB_electrons : 13 from 39 [ 33.33 %] 1 clones [ 7.14 %], purity: 97.82 %, hitEff: 90.12 %
UpstreamTrackChecker INFO 13_long_fromB_electrons_P>5GeV : 11 from 24 [ 45.83 %] 1 clones [ 8.33 %], purity: 97.46 %, hitEff: 88.47 %
UpstreamTrackChecker INFO 14_long_fromB_P>3GeV_Pt>0.5GeV : 211 from 221 [ 95.48 %] 0 clones [ 0.00 %], purity: 99.54 %, hitEff: 86.82 %
UpstreamTrackChecker INFO 14_long_fromB_electrons_P>3GeV_Pt>0.5GeV : 10 from 15 [ 66.67 %] 1 clones [ 9.09 %], purity: 97.88 %, hitEff: 92.88 %
UpstreamTrackChecker INFO 14_long_fromD_P>3GeV_Pt>0.5GeV : 1474 from 1538 [ 95.84 %] 18 clones [ 1.21 %], purity: 99.52 %, hitEff: 88.06 %
UpstreamTrackChecker INFO 14_long_strange_P>3GeV_Pt>0.5GeV : 755 from 810 [ 93.21 %] 13 clones [ 1.69 %], purity: 99.25 %, hitEff: 88.59 %
UpstreamTrackChecker INFO 15_UT_long_fromB_P>3GeV_Pt>0.5GeV : 211 from 221 [ 95.48 %] 0 clones [ 0.00 %], purity: 99.54 %, hitEff: 86.82 %
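To read the two tables side by side, the bracketed efficiency is simply "reconstructed / reconstructible". The sketch below recomputes it for one category and adds a naive binomial uncertainty for scale. The function names are made up for this example, and the binomial estimate treats the samples as independent; both runs use the same events, so the uncertainty on the difference is likely smaller than this suggests.

```cpp
#include <cmath>

// Efficiency in percent, as printed in the checker's square brackets.
double efficiency_percent(long reconstructed, long reconstructible) {
  return 100.0 * static_cast<double>(reconstructed) / static_cast<double>(reconstructible);
}

// Naive binomial uncertainty on the efficiency, in percentage points.
// Assumption: independent trials; ignores the correlation between the two runs.
double binomial_error_percent(long reconstructed, long reconstructible) {
  const double e = static_cast<double>(reconstructed) / static_cast<double>(reconstructible);
  return 100.0 * std::sqrt(e * (1.0 - e) / static_cast<double>(reconstructible));
}

// Change in efficiency between two runs over the same denominator, in percentage points.
double efficiency_gain_pp(long before, long after, long reconstructible) {
  return efficiency_percent(after, reconstructible) - efficiency_percent(before, reconstructible);
}
```

For 07_long, `efficiency_gain_pp(23885, 24219, 37787)` corresponds to the step from 63.21 % to 64.09 % quoted in the logs.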
Track resolution checks have been carried out as well. There are no significant changes:
current master
TrackResChecker INFO ************************************
TrackResChecker INFO ALL/x pull : mean = 0.008 +/- 0.007, RMS = 0.981 +/- 0.007
TrackResChecker INFO ALL/y pull : mean = 0.006 +/- 0.007, RMS = 0.973 +/- 0.006
TrackResChecker INFO ALL/tx pull : mean = -0.010 +/- 0.007, RMS = 1.023 +/- 0.007
TrackResChecker INFO ALL/ty pull : mean = -0.006 +/- 0.007, RMS = 0.977 +/- 0.007
TrackResChecker INFO ALL/p pull : mean = 0.006 +/- 0.001, RMS = 0.147 +/- 0.007
TrackResChecker INFO ALL/probChi2 : mean = 0.537 +/- 0.002, RMS = 0.320 +/- 0.001
TrackResChecker INFO ALL/x resolution / mm: RMS = 43.809 +/- 0.551 micron
TrackResChecker INFO ALL/y resolution / mm: RMS = 44.321 +/- 0.588 micron
TrackResChecker INFO ALL/dp/p: mean = -0.0009 +/- 0.0001, RMS = 0.0083 +/- 0.0001
this MR
TrackResChecker INFO ************************************
TrackResChecker INFO ALL/x pull : mean = 0.007 +/- 0.007, RMS = 0.982 +/- 0.006
TrackResChecker INFO ALL/y pull : mean = 0.006 +/- 0.007, RMS = 0.976 +/- 0.006
TrackResChecker INFO ALL/tx pull : mean = -0.009 +/- 0.007, RMS = 1.024 +/- 0.007
TrackResChecker INFO ALL/ty pull : mean = -0.005 +/- 0.007, RMS = 0.980 +/- 0.007
TrackResChecker INFO ALL/p pull : mean = 0.006 +/- 0.001, RMS = 0.148 +/- 0.007
TrackResChecker INFO ALL/probChi2 : mean = 0.537 +/- 0.002, RMS = 0.320 +/- 0.001
TrackResChecker INFO ALL/x resolution / mm: RMS = 43.773 +/- 0.545 micron
TrackResChecker INFO ALL/y resolution / mm: RMS = 44.602 +/- 0.586 micron
TrackResChecker INFO ALL/dp/p: mean = -0.0009 +/- 0.0001, RMS = 0.0083 +/- 0.0001
Throughput tests
master https://mattermost.web.cern.ch/lhcb/pl/qfcgykko47fz5fr4bt3rawrpge
Throughput of branch master (98884cfe), sequence hlt1_pp_scifi_v6 over dataset upgrade-magdown-sim10-up08-30000000-digi_01:
NVIDIA A10 │██████████████████████████████████████████████ 155.09 kHz (1.00x)
GeForce RTX 3080 │███████████████████████████████████████████ 145.58 kHz (1.00x)
GeForce RTX 3090 │███████████████████████████████████████████ 145.52 kHz (1.00x)
A40 │███████████████████████████████████████████ 144.50 kHz (0.98x)
RTX A6000 │██████████████████████████████████████████ 140.33 kHz (0.98x)
Quadro RTX 6000 │███████████████████████████████████████ 132.44 kHz (1.00x)
MI100 │███████████████████████████████████████ 131.09 kHz (0.99x)
GeForce RTX 2080 Ti │██████████████████████████████████████ 128.51 kHz (0.99x)
Tesla V100-PCIE-32GB │████████████████████████████████████ 122.64 kHz (1.00x)
AMD EPYC 7502 32-Core │██████ 20.81 kHz (0.99x)
Intel Xeon E5-2630 v4 │█ 4.37 kHz (1.01x)
┼──┴──┼──┴──┼──┴──┼──┴──┼──┴──┼──┴──┼──┴──┼──┴──┼
0 20 40 60 80 100 120 140 160
this MR https://mattermost.web.cern.ch/lhcb/pl/63e1nqwd9bb99kymumyx3domjo
Throughput of branch mstahl_ut_cluster (bea5d981), sequence hlt1_pp_scifi_v6 over dataset upgrade-magdown-sim10-up08-30000000-digi_01:
NVIDIA A10 │████████████████████████████████ 160.45 kHz (1.03x)
A40 │█████████████████████████████ 149.56 kHz (1.03x)
GeForce RTX 3090 │████████████████████████████ 143.62 kHz (0.99x)
RTX A6000 │███████████████████████████ 138.38 kHz (0.99x)
GeForce RTX 3080 │███████████████████████████ 138.07 kHz (0.95x)
MI100 │███████████████████████████ 137.81 kHz (1.05x)
Quadro RTX 6000 │██████████████████████████ 133.70 kHz (1.01x)
GeForce RTX 2080 Ti │██████████████████████████ 130.20 kHz (1.01x)
Tesla V100-PCIE-32GB │████████████████████████ 122.48 kHz (1.00x)
AMD EPYC 7502 32-Core │████ 21.04 kHz (1.01x)
Intel Xeon E5-2630 v4 │▌ 4.46 kHz (1.02x)
┼─┴─┼─┴─┼─┴─┼─┴─┼─┴─┼─┴─┼─┴─┼─┴─┼─┴─┼
0 20 40 60 80 100 120 140 160 180
[1] The tests have been carried out with all events (1046; only 100 for the plot) of /eos/lhcb/grid/prod/lhcb/MC/Upgrade/XDIGI/00129676/0000/00129676_00000010_1.xdigi
(event type 25103102, Lc -> Lambda pi), of which 619 (66) events passed the GEC. Note that the GEC still takes single-strip hits, not clusters.
Activity
added UT label
mentioned in merge request Moore!928 (closed)
mentioned in issue Rec#198
added 116 commits
- 92f83793...453e28e4 - 115 commits from branch master
- ef7b485a - [UT decoding] clustering for v5 (TELL40) decoding
added 1 commit
- 2373908b - [UT decoding] clustering for v5 (TELL40) decoding
added 1 commit
- bea5d981 - [UT decoding] clustering for v5 (TELL40) decoding
assigned to @msaur
mentioned in issue Moore#324 (closed)
mentioned in issue Moore#335 (closed)
added 1 commit
mentioned in issue Moore#344 (closed)
assigned to @jedavies
unassigned @nkleijne
- Resolved by Dorothea Vom Bruch
- Resolved by Dorothea Vom Bruch
FYI: Rebasing on master should fix the Allen CI pipeline. Now that !685 (merged) is merged, the full functionality of the CI, including the dataset required for this MR, is back in place.
- Resolved by Dorothea Vom Bruch
What does cluster size >2 mean? If these are really clusters with 3 and more strips, should there not be a peak around 0?
For size 1 clusters, I believe we see the strip size and twice the strip size as steps, right?
removed review request for @dovombru
added 147 commits
- 801a975e...453f67ac - 146 commits from branch master
- ac7ae771 - [UT decoding] clustering for v5 (TELL40) decoding
mentioned in issue Moore#347 (closed)
- Resolved by Marian Stahl
It looks to me as if this cannot work any longer as intended.
The clustering was relying on updating/re-writing raw banks. I'm not sure how this has worked in the first place with the raw data being const.
I need to think a bit more about this. My first reflex is to add a new member to UTRawBank that contains the clustering information that I tried to write into the few bits reserved for the ADC value.
Let me know if this makes sense.
Adding a member variable sounded easier than it was. It would have resulted in a 256 times 6 matrix with the clustering information that would have had to be passed between functions, or re-generated whenever UTRaw is called (i.e. the clustering would have run at least 3 times).
The solution now casts away constness. I'm not sure if this is the way to go in the end.
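For readers unfamiliar with the trade-off, casting away constness can be sketched as follows. The `RawBankView` struct and `overwrite_word` function are hypothetical stand-ins and not the types used in Allen. In standard C++ the write is only well defined if the underlying storage was not itself declared const; with several threads writing to the same bank, synchronisation would be a separate concern.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical read-only view of a raw bank payload, as a consumer might receive it.
struct RawBankView {
  const std::uint16_t* data;
  std::size_t size;
};

// Writes through a const view by casting away constness.
// Well defined only if the viewed storage was not originally declared const.
inline void overwrite_word(const RawBankView& bank, std::size_t i, std::uint16_t value) {
  auto* writable = const_cast<std::uint16_t*>(bank.data);
  writable[i] = value;
}
```

The cast keeps the interface unchanged for every other consumer of the bank, which is convenient, but it also hides the mutation from anyone who reads only the function signatures.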
Hi @raaij, can you take a look at this? If it sounds good to you I can test it.
In any case the way the Allen persistence is set up at the moment, the raw banks obtained from the event building are stored in a buffer until HLT1 is finished on the GPU and the raw banks for selected events are then copied to a different buffer for processing with HLT2. Only the newly created raw banks (SelReports, DecReports, routing bits) are added to the original raw banks. So if the clustered UT hits were supposed to be persisted, that would have to be added in the implementation.
For now, we can probably assume that the clusters are only used for HLT1, so the over-writing should be fine.
@cmarinbe @decianm @dcampora @jonrob do you know what the plan is for the UT clusters created in HLT1?
Aside from the RETINA clusters, which are special, the plan is to redo all clustering in HLT2. Saving clusters AND raw banks limits the maximum output rate of HLT1 depending on how much it increases the HLT1 event size because of the physical limit at which the disks can be written to, so is not a favoured solution. If the proposal is to cluster in HLT1 and then throw away the raw information this can maybe work, but is not what I assumed as the baseline solution.
Thanks for the clarification @gligorov.
Indeed, @mstahl implemented the clustering such that the information overwrites the original raw banks. But these new raw banks are not persisted for now. So I suggest that we merge this MR as is (with the over-writing on the device only) and in the future we can still decide to persist these raw banks and use them as input for HLT2.
- Resolved by Marian Stahl
Would it make sense to run TestUTHits as part of a test?
mentioned in issue Moore#351 (closed)
added 34 commits
- ac7ae771...946241a6 - 32 commits from branch master
- 3c8cb252 - [UT decoding] clustering for v5 (TELL40) decoding
- 7e039774 - UT clustering with const raw banks. Broken due to race condition?
added 1 commit
- 2038462f - cast away constness of raw bank data for clustering to work after rebase on master
assigned to @lpica
added RTA label
mentioned in issue Moore#356 (closed)
unassigned @lpica
Hi, @mstahl. Could you please update the status of this MR?
mentioned in issue Moore#362 (closed)
added 65 commits
- 08ea0de1...278ea59d - 62 commits from branch master
- 23f0e1e9 - [UT decoding] clustering for v5 (TELL40) decoding
- c4e4410d - cast away constness of raw bank data for clustering to work after rebase on master
- ff705b9e - less hardcoding; move UT decoding constants
added 290 commits
- 99bae3df...0cb11f7c - 287 commits from branch master
- 6228fd61 - [UT decoding] clustering for v5 (TELL40) decoding
- f4924ebf - cast away constness of raw bank data for clustering to work after rebase on master
- 31bfc5dd - less hardcoding; move UT decoding constants
mentioned in issue LHCb#202
mentioned in merge request LHCb!4269 (merged)
mentioned in issue #467 (closed)