RICH Sequence in DD4HEP builds is insanely slow
I ran into this whilst working on the issues discussed in !731 (merged) and the cleanups in !719 (merged)
It turns out the RICH sequence, more specifically the RichPDFullResponse is insanely slow in the DD4HEP builds (which uses a different algorithm implementation to DetDesc).
Re-enabling the timing table in the run3 test (disabled by default for some odd reason given how useful it is) I see for DD4HEP
TimingAuditor.T... INFO Digi | 13171.000 | 13242.733 | 7249.371 19092.0 3524.31 | 10 | 132.427 |
TimingAuditor.T... INFO DigiVPSeq | 178.000 | 178.434 | 100.926 273.1 49.22 | 10 | 1.784 |
TimingAuditor.T... INFO VPEnsureBanks | 1.000 | 0.019 | 0.013 0.0 0.01 | 10 | 0.000 |
TimingAuditor.T... INFO VPDepositCreator | 164.000 | 166.046 | 93.674 254.2 45.93 | 10 | 1.660 |
TimingAuditor.T... INFO VPDigitCreator | 4.000 | 3.467 | 1.896 5.7 1.06 | 10 | 0.035 |
TimingAuditor.T... INFO VPSuperPixelBankEncoder | 0.000 | 0.288 | 0.159 0.4 0.08 | 10 | 0.003 |
TimingAuditor.T... INFO LHCb::UnpackRawEvent | 0.000 | 0.005 | 0.004 0.0 0.00 | 10 | 0.000 |
TimingAuditor.T... INFO VPRetinaClusterCreator | 9.000 | 8.572 | 5.150 12.7 2.19 | 10 | 0.086 |
TimingAuditor.T... INFO RawEventSimpleCombiner | 0.000 | 0.010 | 0.008 0.0 0.00 | 10 | 0.000 |
TimingAuditor.T... INFO DigiUTSeq | 31.000 | 41.562 | 16.982 173.0 46.86 | 10 | 0.416 |
TimingAuditor.T... INFO MCUTDepositCreator | 19.000 | 31.162 | 9.907 164.6 47.24 | 10 | 0.312 |
TimingAuditor.T... INFO MCUTDigitCreator | 1.000 | 0.894 | 0.541 1.6 0.33 | 10 | 0.009 |
TimingAuditor.T... INFO UTDigitCreator | 3.000 | 5.535 | 3.983 7.9 1.49 | 10 | 0.055 |
TimingAuditor.T... INFO UTDigitsToRawBankAlg | 8.000 | 3.953 | 2.537 6.8 1.49 | 10 | 0.040 |
TimingAuditor.T... INFO DigiFTSeq | 135.000 | 156.247 | 85.803 327.7 69.13 | 10 | 1.562 |
TimingAuditor.T... INFO FTMCHitSpillMerger | 12.000 | 28.869 | 0.170 238.9 75.30 | 10 | 0.289 |
TimingAuditor.T... INFO MCFTDepositCreator | 96.000 | 97.842 | 67.527 139.0 25.19 | 10 | 0.978 |
TimingAuditor.T... INFO MCFTDigitCreator | 15.000 | 17.142 | 11.090 27.1 5.12 | 10 | 0.171 |
TimingAuditor.T... INFO FTClusterCreator | 9.000 | 8.793 | 4.412 16.5 4.18 | 10 | 0.088 |
TimingAuditor.T... INFO FTRawBankEncoder | 2.000 | 3.425 | 2.465 4.9 0.83 | 10 | 0.034 |
TimingAuditor.T... INFO UnpackRawEvent | 0.000 | 0.011 | 0.010 0.0 0.00 | 10 | 0.000 |
TimingAuditor.T... INFO FTRawBankDecoder | 1.000 | 0.141 | 0.109 0.2 0.03 | 10 | 0.001 |
TimingAuditor.T... INFO DigiRichSeq | 12819.000 | 12841.430 | 7039.758 18635.8 3478.21 | 10 | 128.414 |
TimingAuditor.T... INFO RichPDSignal | 15.000 | 21.444 | 0.573 138.9 46.34 | 10 | 0.214 |
TimingAuditor.T... INFO RichSummedDeposits | 3.000 | 1.659 | 0.880 2.8 0.54 | 10 | 0.017 |
TimingAuditor.T... INFO RichPDFullResponse | 12800.000 | 12817.916 | 7038.120 18629.5 3493.18 | 10 | 128.179 |
TimingAuditor.T... INFO RichFillRawBuffer | 1.000 | 0.322 | 0.163 1.0 0.26 | 10 | 0.003 |
TimingAuditor.T... INFO DigiCaloSeq | 7.000 | 22.646 | 2.302 172.0 53.13 | 10 | 0.226 |
TimingAuditor.T... INFO EcalSignal | 5.000 | 20.594 | 0.804 170.2 53.02 | 10 | 0.206 |
TimingAuditor.T... INFO HcalSignal | 2.000 | 0.667 | 0.154 4.5 1.35 | 10 | 0.007 |
TimingAuditor.T... INFO EcalDigit | 0.000 | 1.004 | 0.922 1.2 0.10 | 10 | 0.010 |
TimingAuditor.T... INFO HcalDigit | 0.000 | 0.246 | 0.226 0.4 0.04 | 10 | 0.002 |
TimingAuditor.T... INFO EcalFillRawBuffer | 0.000 | 0.097 | 0.086 0.1 0.01 | 10 | 0.001 |
TimingAuditor.T... INFO HcalFillRawBuffer | 0.000 | 0.018 | 0.018 0.0 0.00 | 10 | 0.000 |
TimingAuditor.T... INFO DigiMuonSeq | 1.000 | 2.401 | 1.672 4.0 0.83 | 10 | 0.024 |
TimingAuditor.T... INFO MuonDigitization | 0.000 | 2.259 | 1.574 3.7 0.79 | 10 | 0.023 |
TimingAuditor.T... INFO MuonDigitToTell40RawBuffer| 1.000 | 0.135 | 0.092 0.3 0.05 | 10 | 0.001 |
whilst the similar section for DetDesc is
TimingAuditor.T... INFO Digi | 463.000 | 496.626 | 290.811 704.0 131.41 | 10 | 4.966 |
TimingAuditor.T... INFO DigiVPSeq | 168.000 | 166.348 | 93.320 256.2 46.45 | 10 | 1.663 |
TimingAuditor.T... INFO VPEnsureBanks | 0.000 | 0.020 | 0.013 0.0 0.01 | 10 | 0.000 |
TimingAuditor.T... INFO VPDepositCreator | 156.000 | 153.956 | 86.083 238.0 43.33 | 10 | 1.540 |
TimingAuditor.T... INFO VPDigitCreator | 1.000 | 3.366 | 1.928 5.1 0.95 | 10 | 0.034 |
TimingAuditor.T... INFO VPSuperPixelBankEncoder | 0.000 | 0.279 | 0.155 0.4 0.07 | 10 | 0.003 |
TimingAuditor.T... INFO LHCb::UnpackRawEvent | 0.000 | 0.005 | 0.004 0.0 0.00 | 10 | 0.000 |
TimingAuditor.T... INFO VPRetinaClusterCreator | 11.000 | 8.685 | 5.104 12.6 2.15 | 10 | 0.087 |
TimingAuditor.T... INFO RawEventSimpleCombiner | 0.000 | 0.010 | 0.009 0.0 0.00 | 10 | 0.000 |
TimingAuditor.T... INFO DigiUTSeq | 104.000 | 116.546 | 76.242 214.6 43.06 | 10 | 1.165 |
TimingAuditor.T... INFO MCUTDepositCreator | 98.000 | 110.222 | 71.375 209.6 42.55 | 10 | 1.102 |
TimingAuditor.T... INFO MCUTDigitCreator | 1.000 | 0.895 | 0.528 1.5 0.31 | 10 | 0.009 |
TimingAuditor.T... INFO UTDigitCreator | 2.000 | 3.346 | 2.491 4.6 0.76 | 10 | 0.033 |
TimingAuditor.T... INFO UTDigitsToRawBankAlg | 3.000 | 2.065 | 1.582 3.0 0.50 | 10 | 0.021 |
TimingAuditor.T... INFO DigiFTSeq | 129.000 | 143.564 | 82.543 278.1 56.81 | 10 | 1.436 |
TimingAuditor.T... INFO FTMCHitSpillMerger | 11.000 | 24.545 | 0.190 195.7 61.97 | 10 | 0.245 |
TimingAuditor.T... INFO MCFTDepositCreator | 93.000 | 94.008 | 65.527 133.5 23.95 | 10 | 0.940 |
TimingAuditor.T... INFO MCFTDigitCreator | 13.000 | 12.296 | 8.420 19.5 3.71 | 10 | 0.123 |
TimingAuditor.T... INFO FTClusterCreator | 8.000 | 8.855 | 4.990 17.6 4.39 | 10 | 0.089 |
TimingAuditor.T... INFO FTRawBankEncoder | 4.000 | 3.666 | 2.792 5.3 0.87 | 10 | 0.037 |
TimingAuditor.T... INFO UnpackRawEvent | 0.000 | 0.011 | 0.011 0.0 0.00 | 10 | 0.000 |
TimingAuditor.T... INFO FTRawBankDecoder | 0.000 | 0.152 | 0.126 0.2 0.02 | 10 | 0.002 |
TimingAuditor.T... INFO DigiRichSeq | 52.000 | 57.042 | 34.906 162.1 43.19 | 10 | 0.570 |
TimingAuditor.T... INFO RichPDSignal | 15.000 | 20.347 | 0.501 127.9 43.33 | 10 | 0.203 |
TimingAuditor.T... INFO RichSummedDeposits | 3.000 | 1.736 | 1.052 2.9 0.54 | 10 | 0.017 |
TimingAuditor.T... INFO RichPDResponse | 34.000 | 34.672 | 32.345 38.5 1.99 | 10 | 0.347 |
TimingAuditor.T... INFO RichFillRawBuffer | 0.000 | 0.270 | 0.167 0.4 0.06 | 10 | 0.003 |
TimingAuditor.T... INFO DigiCaloSeq | 8.000 | 10.858 | 2.177 58.9 18.78 | 10 | 0.109 |
TimingAuditor.T... INFO EcalSignal | 6.000 | 8.878 | 0.776 57.1 18.30 | 10 | 0.089 |
TimingAuditor.T... INFO HcalSignal | 0.000 | 0.650 | 0.155 4.5 1.34 | 10 | 0.007 |
TimingAuditor.T... INFO EcalDigit | 1.000 | 0.983 | 0.911 1.2 0.07 | 10 | 0.010 |
TimingAuditor.T... INFO HcalDigit | 0.000 | 0.229 | 0.224 0.2 0.01 | 10 | 0.002 |
TimingAuditor.T... INFO EcalFillRawBuffer | 0.000 | 0.076 | 0.070 0.1 0.01 | 10 | 0.001 |
TimingAuditor.T... INFO HcalFillRawBuffer | 1.000 | 0.018 | 0.017 0.0 0.00 | 10 | 0.000 |
TimingAuditor.T... INFO DigiMuonSeq | 2.000 | 2.252 | 1.609 4.0 0.73 | 10 | 0.023 |
TimingAuditor.T... INFO MuonDigitization | 1.000 | 1.806 | 1.351 3.1 0.54 | 10 | 0.018 |
TimingAuditor.T... INFO MuonDigitToTell40RawBuffer| 1.000 | 0.438 | 0.250 0.9 0.19 | 10 | 0.004 |
This needs looking into as having Boole take multiple seconds per event in the RICH is just not acceptable.
Looking at the code I think its clear many really dumb things are currently done during the event processing loop that should be looked into. But first I will try and get some proper profiling results using valgrind.