
Draft: Clustering for UT TELL40 decoding

Closed Marian Stahl requested to merge mstahl_ut_cluster into master
3 unresolved threads

Towards Rec#198. Throughput test added in !561 (merged).

Related: Moore!928 (closed).

This implements UT clustering in software. It will be the standard for v5 decoding, without an option to switch to single-strip hits.
To avoid copying data, the UT raw banks are overwritten in place.
The cluster residuals in x look like this[1]:
x_residual_per_cluster_size.pdf
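
To illustrate the in-place idea described above, here is a minimal sketch, not the code of this MR. It assumes a hypothetical 16-bit hit word with the strip number in the upper 11 bits and 5 low bits originally holding the ADC value, and it assumes the hits are already sorted by strip. The names `kLowBits`, `strip_of` and `cluster_in_place` are made up for this example. Neighbouring strips are merged, and the cluster size is written into the low bits of the first strip of each cluster, so no second buffer is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed (hypothetical) word layout: [ strip : 11 bits | low bits : 5 bits ].
constexpr unsigned kLowBits = 5;
constexpr std::uint16_t kLowMask = (1u << kLowBits) - 1u; // 0x1F

inline std::uint16_t strip_of(std::uint16_t word)
{
  return static_cast<std::uint16_t>(word >> kLowBits);
}

// Merges runs of adjacent strips in a strip-sorted buffer, in place.
// The first word of each cluster gets the (saturated) cluster size in its
// low bits; absorbed words get 0 there. Returns the number of clusters.
std::size_t cluster_in_place(std::uint16_t* words, std::size_t n)
{
  assert(words != nullptr || n == 0);
  std::size_t n_clusters = 0;
  std::size_t i = 0;
  while (i < n) {
    // Extend the run while the next strip is the direct neighbour.
    std::size_t j = i + 1;
    while (j < n && strip_of(words[j]) == strip_of(words[j - 1]) + 1) ++j;

    std::size_t size = j - i;
    if (size > kLowMask) size = kLowMask; // saturate to what fits in the low bits

    words[i] = static_cast<std::uint16_t>((words[i] & ~kLowMask) | size);
    for (std::size_t k = i + 1; k < j; ++k) {
      words[k] = static_cast<std::uint16_t>(words[k] & ~kLowMask); // mark as absorbed
    }
    ++n_clusters;
    i = j;
  }
  return n_clusters;
}
```

Note that this sketch discards the ADC values when it overwrites the low bits. A real clusteriser may well need them first, for example for a weighted cluster position.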

Performance Tests

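Efficiencies[1] are equal or better in every category and the ghost rate drops from 5.75 % to 5.53 %, while purity and hit efficiency are marginally lower in most categories: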
current master

UpstreamTrackChecker                   INFO Results
UpstreamTrackChecker                   INFO **** Upstream                                  33938 tracks including           1952 ghosts [ 5.75 %], Event average  5.27 % ****
UpstreamTrackChecker                   INFO   01_velo                                  :   30006 from    65861 [ 45.56 %]    474 clones [ 1.56 %], purity: 99.52 %, hitEff: 87.34 %
UpstreamTrackChecker                   INFO   02_velo+UT                               :   29945 from    57215 [ 52.34 %]    474 clones [ 1.56 %], purity: 99.54 %, hitEff: 87.33 %
UpstreamTrackChecker                   INFO   03_velo+UT_P>5GeV                        :   21376 from    29341 [ 72.85 %]    350 clones [ 1.61 %], purity: 99.63 %, hitEff: 89.06 %
UpstreamTrackChecker                   INFO   04_velo+notLong                          :    6121 from    28074 [ 21.80 %]     84 clones [ 1.35 %], purity: 99.27 %, hitEff: 85.15 %
UpstreamTrackChecker                   INFO   05_velo+UT+notLong                       :    6070 from    19887 [ 30.52 %]     84 clones [ 1.36 %], purity: 99.35 %, hitEff: 85.11 %
UpstreamTrackChecker                   INFO   06_velo+UT+notLong_P>5GeV                :    3297 from     5322 [ 61.95 %]     58 clones [ 1.73 %], purity: 99.54 %, hitEff: 89.73 %
UpstreamTrackChecker                   INFO   07_long                                  :   23885 from    37787 [ 63.21 %]    390 clones [ 1.61 %], purity: 99.59 %, hitEff: 87.89 %
UpstreamTrackChecker                   INFO   07_long_strange                          :    1292 from     2576 [ 50.16 %]     22 clones [ 1.67 %], purity: 99.26 %, hitEff: 88.01 %
UpstreamTrackChecker                   INFO   08_long_P>5GeV                           :   18089 from    24385 [ 74.18 %]    292 clones [ 1.59 %], purity: 99.64 %, hitEff: 88.95 %
UpstreamTrackChecker                   INFO   08_long_strange_P>5GeV                   :     950 from     1417 [ 67.04 %]     14 clones [ 1.45 %], purity: 99.30 %, hitEff: 89.54 %
UpstreamTrackChecker                   INFO   09_long_fromB                            :     267 from      377 [ 70.82 %]      2 clones [ 0.74 %], purity: 99.51 %, hitEff: 87.36 %
UpstreamTrackChecker                   INFO   09_long_fromD                            :    1813 from     2565 [ 70.68 %]     26 clones [ 1.41 %], purity: 99.59 %, hitEff: 88.37 %
UpstreamTrackChecker                   INFO   10_long_fromB_P>5GeV                     :     224 from      271 [ 82.66 %]      1 clones [ 0.44 %], purity: 99.58 %, hitEff: 87.71 %
UpstreamTrackChecker                   INFO   10_long_fromD_P>5GeV                     :    1559 from     1882 [ 82.84 %]     20 clones [ 1.27 %], purity: 99.66 %, hitEff: 88.98 %
UpstreamTrackChecker                   INFO   11_long_electrons                        :     547 from     2842 [ 19.25 %]     19 clones [ 3.36 %], purity: 98.20 %, hitEff: 86.24 %
UpstreamTrackChecker                   INFO   12_long_fromB_electrons                  :      13 from       39 [ 33.33 %]      1 clones [ 7.14 %], purity: 97.71 %, hitEff: 86.55 %
UpstreamTrackChecker                   INFO   13_long_fromB_electrons_P>5GeV           :      11 from       24 [ 45.83 %]      1 clones [ 8.33 %], purity: 97.33 %, hitEff: 84.31 %
UpstreamTrackChecker                   INFO   14_long_fromB_P>3GeV_Pt>0.5GeV           :     205 from      221 [ 92.76 %]      0 clones [ 0.00 %], purity: 99.57 %, hitEff: 86.64 %
UpstreamTrackChecker                   INFO   14_long_fromB_electrons_P>3GeV_Pt>0.5GeV :      10 from       15 [ 66.67 %]      1 clones [ 9.09 %], purity: 97.74 %, hitEff: 88.33 %
UpstreamTrackChecker                   INFO   14_long_fromD_P>3GeV_Pt>0.5GeV           :    1443 from     1538 [ 93.82 %]     18 clones [ 1.23 %], purity: 99.71 %, hitEff: 88.71 %
UpstreamTrackChecker                   INFO   14_long_strange_P>3GeV_Pt>0.5GeV         :     744 from      810 [ 91.85 %]     13 clones [ 1.72 %], purity: 99.45 %, hitEff: 89.05 %
UpstreamTrackChecker                   INFO   15_UT_long_fromB_P>3GeV_Pt>0.5GeV        :     205 from      221 [ 92.76 %]      0 clones [ 0.00 %], purity: 99.57 %, hitEff: 86.64 %

this MR

UpstreamTrackChecker                   INFO Results
UpstreamTrackChecker                   INFO **** Upstream                                  34229 tracks including           1892 ghosts [ 5.53 %], Event average  5.07 % ****
UpstreamTrackChecker                   INFO   01_velo                                  :   30354 from    65861 [ 46.09 %]    481 clones [ 1.56 %], purity: 99.46 %, hitEff: 87.13 %
UpstreamTrackChecker                   INFO   02_velo+UT                               :   30288 from    57215 [ 52.94 %]    481 clones [ 1.56 %], purity: 99.48 %, hitEff: 87.13 %
UpstreamTrackChecker                   INFO   03_velo+UT_P>5GeV                        :   21732 from    29341 [ 74.07 %]    356 clones [ 1.61 %], purity: 99.53 %, hitEff: 88.71 %
UpstreamTrackChecker                   INFO   04_velo+notLong                          :    6135 from    28074 [ 21.85 %]     88 clones [ 1.41 %], purity: 99.24 %, hitEff: 85.04 %
UpstreamTrackChecker                   INFO   05_velo+UT+notLong                       :    6078 from    19887 [ 30.56 %]     88 clones [ 1.43 %], purity: 99.33 %, hitEff: 85.01 %
UpstreamTrackChecker                   INFO   06_velo+UT+notLong_P>5GeV                :    3337 from     5322 [ 62.70 %]     61 clones [ 1.80 %], purity: 99.45 %, hitEff: 89.29 %
UpstreamTrackChecker                   INFO   07_long                                  :   24219 from    37787 [ 64.09 %]    393 clones [ 1.60 %], purity: 99.52 %, hitEff: 87.66 %
UpstreamTrackChecker                   INFO   07_long_strange                          :    1312 from     2576 [ 50.93 %]     21 clones [ 1.58 %], purity: 99.13 %, hitEff: 87.73 %
UpstreamTrackChecker                   INFO   08_long_P>5GeV                           :   18404 from    24385 [ 75.47 %]    295 clones [ 1.58 %], purity: 99.54 %, hitEff: 88.60 %
UpstreamTrackChecker                   INFO   08_long_strange_P>5GeV                   :     969 from     1417 [ 68.38 %]     13 clones [ 1.32 %], purity: 99.12 %, hitEff: 89.04 %
UpstreamTrackChecker                   INFO   09_long_fromB                            :     273 from      377 [ 72.41 %]      2 clones [ 0.73 %], purity: 99.42 %, hitEff: 87.15 %
UpstreamTrackChecker                   INFO   09_long_fromD                            :    1850 from     2565 [ 72.12 %]     26 clones [ 1.39 %], purity: 99.43 %, hitEff: 87.80 %
UpstreamTrackChecker                   INFO   10_long_fromB_P>5GeV                     :     229 from      271 [ 84.50 %]      1 clones [ 0.43 %], purity: 99.49 %, hitEff: 87.71 %
UpstreamTrackChecker                   INFO   10_long_fromD_P>5GeV                     :    1596 from     1882 [ 84.80 %]     20 clones [ 1.24 %], purity: 99.44 %, hitEff: 88.24 %
UpstreamTrackChecker                   INFO   11_long_electrons                        :     557 from     2842 [ 19.60 %]     19 clones [ 3.30 %], purity: 98.25 %, hitEff: 86.06 %
UpstreamTrackChecker                   INFO   12_long_fromB_electrons                  :      13 from       39 [ 33.33 %]      1 clones [ 7.14 %], purity: 97.82 %, hitEff: 90.12 %
UpstreamTrackChecker                   INFO   13_long_fromB_electrons_P>5GeV           :      11 from       24 [ 45.83 %]      1 clones [ 8.33 %], purity: 97.46 %, hitEff: 88.47 %
UpstreamTrackChecker                   INFO   14_long_fromB_P>3GeV_Pt>0.5GeV           :     211 from      221 [ 95.48 %]      0 clones [ 0.00 %], purity: 99.54 %, hitEff: 86.82 %
UpstreamTrackChecker                   INFO   14_long_fromB_electrons_P>3GeV_Pt>0.5GeV :      10 from       15 [ 66.67 %]      1 clones [ 9.09 %], purity: 97.88 %, hitEff: 92.88 %
UpstreamTrackChecker                   INFO   14_long_fromD_P>3GeV_Pt>0.5GeV           :    1474 from     1538 [ 95.84 %]     18 clones [ 1.21 %], purity: 99.52 %, hitEff: 88.06 %
UpstreamTrackChecker                   INFO   14_long_strange_P>3GeV_Pt>0.5GeV         :     755 from      810 [ 93.21 %]     13 clones [ 1.69 %], purity: 99.25 %, hitEff: 88.59 %
UpstreamTrackChecker                   INFO   15_UT_long_fromB_P>3GeV_Pt>0.5GeV        :     211 from      221 [ 95.48 %]      0 clones [ 0.00 %], purity: 99.54 %, hitEff: 86.82 %
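
The percentages in square brackets in the two tables above are simply the ratio of the two counts in each row. As a rough way to judge the size of the changes, the sketch below recomputes an efficiency and a naive binomial uncertainty from those counts. The function names are made up for this example and are not part of the checker.

```cpp
#include <cassert>
#include <cmath>

// Efficiency as printed in the tables: reconstructed / reconstructible.
double efficiency(long n_reco, long n_total)
{
  assert(n_total > 0 && n_reco >= 0 && n_reco <= n_total);
  return static_cast<double>(n_reco) / static_cast<double>(n_total);
}

// Naive binomial uncertainty: sqrt(eff * (1 - eff) / n_total).
double binomial_error(long n_reco, long n_total)
{
  const double eff = efficiency(n_reco, n_total);
  return std::sqrt(eff * (1.0 - eff) / static_cast<double>(n_total));
}
```

For `07_long`, `efficiency(23885, 37787)` and `efficiency(24219, 37787)` reproduce the quoted 63.21 % and 64.09 %, with a naive uncertainty of roughly 0.25 percentage points each. Both runs use the same events, so the two numbers are strongly correlated and the naive error overestimates the uncertainty on their difference.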

Track resolution checks have been carried out as well. There are no significant changes:
current master

TrackResChecker                        INFO      ************************************
TrackResChecker                        INFO ALL/x pull     :  mean =  0.008 +/- 0.007, RMS = 0.981 +/- 0.007
TrackResChecker                        INFO ALL/y pull     :  mean =  0.006 +/- 0.007, RMS = 0.973 +/- 0.006
TrackResChecker                        INFO ALL/tx pull    :  mean =  -0.010 +/- 0.007, RMS = 1.023 +/- 0.007
TrackResChecker                        INFO ALL/ty pull    :  mean =  -0.006 +/- 0.007, RMS = 0.977 +/- 0.007
TrackResChecker                        INFO ALL/p pull     :  mean =  0.006 +/- 0.001, RMS = 0.147 +/- 0.007
TrackResChecker                        INFO ALL/probChi2   :  mean =  0.537 +/- 0.002, RMS = 0.320 +/- 0.001
TrackResChecker                        INFO ALL/x resolution / mm:  RMS =  43.809 +/- 0.551 micron
TrackResChecker                        INFO ALL/y resolution / mm:  RMS =  44.321 +/- 0.588 micron
TrackResChecker                        INFO ALL/dp/p:  mean =  -0.0009 +/- 0.0001, RMS =  0.0083 +/- 0.0001

this MR

TrackResChecker                        INFO      ************************************
TrackResChecker                        INFO ALL/x pull     :  mean =  0.007 +/- 0.007, RMS = 0.982 +/- 0.006
TrackResChecker                        INFO ALL/y pull     :  mean =  0.006 +/- 0.007, RMS = 0.976 +/- 0.006
TrackResChecker                        INFO ALL/tx pull    :  mean =  -0.009 +/- 0.007, RMS = 1.024 +/- 0.007
TrackResChecker                        INFO ALL/ty pull    :  mean =  -0.005 +/- 0.007, RMS = 0.980 +/- 0.007
TrackResChecker                        INFO ALL/p pull     :  mean =  0.006 +/- 0.001, RMS = 0.148 +/- 0.007
TrackResChecker                        INFO ALL/probChi2   :  mean =  0.537 +/- 0.002, RMS = 0.320 +/- 0.001
TrackResChecker                        INFO ALL/x resolution / mm:  RMS =  43.773 +/- 0.545 micron
TrackResChecker                        INFO ALL/y resolution / mm:  RMS =  44.602 +/- 0.586 micron
TrackResChecker                        INFO ALL/dp/p:  mean =  -0.0009 +/- 0.0001, RMS =  0.0083 +/- 0.0001

Throughput tests

master https://mattermost.web.cern.ch/lhcb/pl/qfcgykko47fz5fr4bt3rawrpge

Throughput of branch master (98884cfe), sequence hlt1_pp_scifi_v6 over dataset upgrade-magdown-sim10-up08-30000000-digi_01:
NVIDIA A10            │██████████████████████████████████████████████   155.09 kHz (1.00x)
GeForce RTX 3080      │███████████████████████████████████████████      145.58 kHz (1.00x)
GeForce RTX 3090      │███████████████████████████████████████████      145.52 kHz (1.00x)
A40                   │███████████████████████████████████████████      144.50 kHz (0.98x)
RTX A6000             │██████████████████████████████████████████       140.33 kHz (0.98x)
Quadro RTX 6000       │███████████████████████████████████████          132.44 kHz (1.00x)
MI100                 │███████████████████████████████████████          131.09 kHz (0.99x)
GeForce RTX 2080 Ti   │██████████████████████████████████████           128.51 kHz (0.99x)
Tesla V100-PCIE-32GB  │████████████████████████████████████             122.64 kHz (1.00x)
AMD EPYC 7502 32-Core │██████                                           20.81 kHz (0.99x)
Intel Xeon E5-2630 v4 │█                                                4.37 kHz (1.01x)
                      ┼──┴──┼──┴──┼──┴──┼──┴──┼──┴──┼──┴──┼──┴──┼──┴──┼
                      0     20    40    60    80   100   120   140   160  

this MR https://mattermost.web.cern.ch/lhcb/pl/63e1nqwd9bb99kymumyx3domjo

Throughput of branch mstahl_ut_cluster (bea5d981), sequence hlt1_pp_scifi_v6 over dataset upgrade-magdown-sim10-up08-30000000-digi_01:
NVIDIA A10            │████████████████████████████████     160.45 kHz (1.03x)
A40                   │█████████████████████████████        149.56 kHz (1.03x)
GeForce RTX 3090      │████████████████████████████         143.62 kHz (0.99x)
RTX A6000             │███████████████████████████          138.38 kHz (0.99x)
GeForce RTX 3080      │███████████████████████████          138.07 kHz (0.95x)
MI100                 │███████████████████████████          137.81 kHz (1.05x)
Quadro RTX 6000       │██████████████████████████           133.70 kHz (1.01x)
GeForce RTX 2080 Ti   │██████████████████████████           130.20 kHz (1.01x)
Tesla V100-PCIE-32GB  │████████████████████████             122.48 kHz (1.00x)
AMD EPYC 7502 32-Core │████                                 21.04 kHz (1.01x)
Intel Xeon E5-2630 v4 │▌                                    4.46 kHz (1.02x)
                      ┼─┴─┼─┴─┼─┴─┼─┴─┼─┴─┼─┴─┼─┴─┼─┴─┼─┴─┼
                      0   20  40  60  80 100 120 140 160 180 

[1] The tests have been carried out with all 1046 events (only 100 for the plot) in /eos/lhcb/grid/prod/lhcb/MC/Upgrade/XDIGI/00129676/0000/00129676_00000010_1.xdigi (Event type 25103102, Lc -> Lambda pi), of which 619 (66) events passed the GEC. Note that the GEC still takes single-strip hits, not clusters.

Edited by Marian Stahl

Closed by Marian Stahl 1 year ago (Jan 15, 2024 9:34pm UTC)

Merge details

  • The changes were not merged into master.

Activity

  • FYI: Rebasing on master should fix the Allen CI pipeline. Now that !685 (merged) is merged, the full functionality of the CI, including the dataset required for this MR, is back in place.

  • Dorothea Vom Bruch removed review request for @dovombru

  • Marian Stahl added 147 commits

  • Dorothea Vom Bruch resolved all threads

  • mentioned in issue Moore#347 (closed)

  • Roel Aaij
    • Looks good to me once all the builds have been fixed.

    • Author Developer

      It looks to me as if this cannot work any longer as intended.
      The clustering was relying on updating/re-writing raw banks. I'm not sure how this has worked in the first place with the raw data being const :laughing:

      I need to think a bit more about this. My first reflex is to add a new member to UTRawBank that contains the clustering information that I tried to write into the few bits reserved for the ADC value.
      Let me know if this makes sense.

    • Author Developer

      Adding a member variable sounded easier than it was. It would have resulted in a 256 times 6 matrix with the clustering information that would have had to be passed between functions, or re-generated whenever UTRaw is called (i.e. the clustering would have run at least 3 times).

      The solution now casts away constness. I'm not sure if this is the way to go in the end.

    • Hi @raaij, can you take a look at this? If it sounds good to you I can test it.

    • Do I understand correctly that the clustering will be re-run in HLT2? I.e. the raw banks produced here in HLT1 are not expected to be persisted and passed on to HLT2? In that case I think over-writing the raw banks is fine.

    • Author Developer

      I have to admit that I'm not 100% sure what this will look like during data taking. In case it matters: the banks are only overwritten in a device function.

    • In any case the way the Allen persistence is set up at the moment, the raw banks obtained from the event building are stored in a buffer until HLT1 is finished on the GPU and the raw banks for selected events are then copied to a different buffer for processing with HLT2. Only the newly created raw banks (SelReports, DecReports, routing bits) are added to the original raw banks. So if the clustered UT hits were supposed to be persisted, that would have to be added in the implementation.

      For now, we can probably assume that the clusters are only used for HLT1, so the over-writing should be fine.

      @cmarinbe @decianm @dcampora @jonrob do you know what the plan is for the UT clusters created in HLT1?

    • Aside from the RETINA clusters, which are special, the plan is to redo all clustering in HLT2. Saving clusters AND raw banks limits the maximum output rate of HLT1 depending on how much it increases the HLT1 event size because of the physical limit at which the disks can be written to, so is not a favoured solution. If the proposal is to cluster in HLT1 and then throw away the raw information this can maybe work, but is not what I assumed as the baseline solution.

    • Thanks for the clarification @gligorov.

      Indeed, @mstahl implemented the clustering such that the information overwrites the original raw banks. But these new raw banks are not persisted for now. So I suggest that we merge this MR as is (with the over-writing on the device only) and in the future we can still decide to persist these raw banks and use them as input for HLT2.
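
The const-cast workaround discussed in this thread can be illustrated with the following minimal sketch. It is not the code of this MR: `clear_low_bits` is a hypothetical helper, and the 16-bit word type is an assumption. The point it shows is that writing through a `const_cast` pointer is well defined only when the underlying storage was not itself declared `const`, which is the case for a buffer that is merely exposed through a const view.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: clears the lowest n_bits bits of every word in a buffer
// that the caller only exposes through a pointer-to-const.
// Writing through the cast pointer is undefined behaviour if the underlying
// object was declared const; it is fine if only the view is const.
void clear_low_bits(const std::uint16_t* words, std::size_t n, unsigned n_bits)
{
  assert(n_bits < 16);
  auto* writable = const_cast<std::uint16_t*>(words); // cast away constness
  const auto keep = static_cast<std::uint16_t>(0xFFFFu << n_bits);
  for (std::size_t i = 0; i < n; ++i) {
    writable[i] = static_cast<std::uint16_t>(writable[i] & keep);
  }
}
```

A cleaner alternative, where the interfaces allow it, is to pass a non-const pointer to the one function that is meant to modify the banks, so that the mutation is visible in the signature.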

  • assigned to @ascarabo and unassigned @jedavies

  • assigned to @mstahl and unassigned @ascarabo

  • Marian Stahl added 34 commits

    • ac7ae771...946241a6 - 32 commits from branch master
    • 3c8cb252 - [UT decoding] clustering for v5 (TELL40) decoding
    • 7e039774 - UT clustering with const raw banks. Broken due to race condition?

  • Marian Stahl added 1 commit

    • 2038462f - cast away constness of raw bank data for clustering to work after rebase on master

  • Marian Stahl added 1 commit

    • 08ea0de1 - less hardcoding; move UT decoding constants

  • assigned to @lpica

  • added RTA label

  • mentioned in issue Moore#356 (closed)

  • unassigned @lpica

  • Marian Stahl marked this merge request as draft

  • Marian Stahl added 65 commits

    • 08ea0de1...278ea59d - 62 commits from branch master
    • 23f0e1e9 - [UT decoding] clustering for v5 (TELL40) decoding
    • c4e4410d - cast away constness of raw bank data for clustering to work after rebase on master
    • ff705b9e - less hardcoding; move UT decoding constants

  • Marian Stahl added 1 commit

    • 99bae3df - less hardcoding; move UT decoding constants

  • Marian Stahl added 290 commits

    • 99bae3df...0cb11f7c - 287 commits from branch master
    • 6228fd61 - [UT decoding] clustering for v5 (TELL40) decoding
    • f4924ebf - cast away constness of raw bank data for clustering to work after rebase on master
    • 31bfc5dd - less hardcoding; move UT decoding constants

  • Rosen Matev mentioned in issue LHCb#202

  • Vava Gligorov assigned to @cagapopo and unassigned @mstahl

  • closed

  • mentioned in merge request LHCb!4269 (merged)

  • mentioned in issue #467 (closed)
