Reduce differences in selection rates between CPU and GPU compilations
!414 (merged) has caused percent-level differences in the selection rates between the CPU and GPU compilations. This needs to be understood and reduced.
- Flo Reiss mentioned in merge request !474 (merged)
- Flo Reiss added selections label
@dcampora are there any obvious changes which could explain why the selection rates diverged in !414 (merged)?
- Maintainer
I read the description again, the only thing that comes to mind is:
- CPU uses float backend for half_t with no transformation, which introduces a slight divergence wrt GPU results.
Note that this is still configurable at compile time, you could test compiling with ALWAYS_DISPATCH_TO_DEFAULT=ON and CPU_USE_REAL_HALF=ON, or just with ALWAYS_DISPATCH_TO_DEFAULT=ON.
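The divergence described above can be illustrated with a minimal sketch, assuming a quantity is rounded to IEEE half precision on one platform and kept as float on the other. The helper names, the cut value and the input value are hypothetical and chosen only to show the mechanism; this is not Allen code, and which quantities are actually stored as half_t is not stated in this thread.

```python
import struct


def to_float32(x: float) -> float:
    """Round x to IEEE 754 binary32 and return it as a Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_half_and_back(x: float) -> float:
    """Round x to IEEE 754 binary16 and return it as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Hypothetical cut and input, chosen so the value sits just above the cut.
CUT = to_float32(7.4)
value = 7.4002

# "float backend": the value keeps its full single-precision mantissa.
passes_float = to_float32(value) > CUT

# "real half": the value is rounded to 10 mantissa bits before the comparison.
passes_half = to_half_and_back(value) > CUT

print(to_half_and_back(value), passes_float, passes_half)
```

A candidate sitting this close to a threshold passes in one configuration and fails in the other, which is the kind of event-level flip that can add up to a rate difference.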
- Developer
I tried out this suggestion on 5k Bs2PhiPhi events. It looks like these options result in percent level differences.
======================================================================
Default
======================================================================
HLT1 rates:
Hlt1TrackMVA:         1011/ 5000, ( 6066.00 +/- 170.40) kHz
Hlt1TwoTrackMVA:      1802/ 5000, (10812.00 +/- 203.70) kHz
Hlt1NoBeam:              3/ 5000, (   18.00 +/-  10.39) kHz
Hlt1BeamOne:             0/ 5000, (    0.00 +/-   0.00) kHz
Hlt1BeamTwo:             0/ 5000, (    0.00 +/-   0.00) kHz
Hlt1BothBeams:           0/ 5000, (    0.00 +/-   0.00) kHz
Hlt1VeloMicroBias:       6/ 5000, (   36.00 +/-  14.69) kHz
Hlt1ODINLumi:            0/ 5000, (    0.00 +/-   0.00) kHz
Hlt1ODINNoBias:          0/ 5000, (    0.00 +/-   0.00) kHz
Hlt1SingleHighPtMuon:   11/ 5000, (   66.00 +/-  19.88) kHz
Hlt1LowPtMuon:         564/ 5000, ( 3384.00 +/- 134.22) kHz
Hlt1D2KK:              118/ 5000, (  708.00 +/-  64.40) kHz
Hlt1D2KPi:             176/ 5000, ( 1056.00 +/-  78.19) kHz
Hlt1D2PiPi:            156/ 5000, (  936.00 +/-  73.76) kHz
Hlt1DiMuonHighMass:     62/ 5000, (  372.00 +/-  46.95) kHz
Hlt1DiMuonLowMass:     111/ 5000, (  666.00 +/-  62.51) kHz
Hlt1DiMuonSoft:          6/ 5000, (   36.00 +/-  14.69) kHz
Hlt1LowPtDiMuon:       210/ 5000, ( 1260.00 +/-  85.10) kHz
Hlt1TrackMuonMVA:       55/ 5000, (  330.00 +/-  44.25) kHz
Hlt1GECPassthrough:   4170/ 5000, (25020.00 +/- 157.86) kHz
Hlt1Passthrough:      5000/ 5000, (30000.00 +/-   0.00) kHz
Inclusive:            5000/ 5000, (30000.00 +/-   0.00) kHz
======================================================================
ALWAYS_DISPATCH_TO_DEFAULT=ON
======================================================================
HLT1 rates:
Hlt1TrackMVA:         1011/ 5000, ( 6066.00 +/- 170.40) kHz
Hlt1TwoTrackMVA:      1802/ 5000, (10812.00 +/- 203.70) kHz
Hlt1NoBeam:              3/ 5000, (   18.00 +/-  10.39) kHz
Hlt1BeamOne:             0/ 5000, (    0.00 +/-   0.00) kHz
Hlt1BeamTwo:             0/ 5000, (    0.00 +/-   0.00) kHz
Hlt1BothBeams:           0/ 5000, (    0.00 +/-   0.00) kHz
Hlt1VeloMicroBias:       6/ 5000, (   36.00 +/-  14.69) kHz
Hlt1ODINLumi:            0/ 5000, (    0.00 +/-   0.00) kHz
Hlt1ODINNoBias:          0/ 5000, (    0.00 +/-   0.00) kHz
Hlt1SingleHighPtMuon:   11/ 5000, (   66.00 +/-  19.88) kHz
Hlt1LowPtMuon:         564/ 5000, ( 3384.00 +/- 134.22) kHz
Hlt1D2KK:              118/ 5000, (  708.00 +/-  64.40) kHz
Hlt1D2KPi:             176/ 5000, ( 1056.00 +/-  78.19) kHz
Hlt1D2PiPi:            157/ 5000, (  942.00 +/-  73.99) kHz
Hlt1DiMuonHighMass:     62/ 5000, (  372.00 +/-  46.95) kHz
Hlt1DiMuonLowMass:     111/ 5000, (  666.00 +/-  62.51) kHz
Hlt1DiMuonSoft:          6/ 5000, (   36.00 +/-  14.69) kHz
Hlt1LowPtDiMuon:       210/ 5000, ( 1260.00 +/-  85.10) kHz
Hlt1TrackMuonMVA:       55/ 5000, (  330.00 +/-  44.25) kHz
Hlt1GECPassthrough:   4170/ 5000, (25020.00 +/- 157.86) kHz
Hlt1Passthrough:      5000/ 5000, (30000.00 +/-   0.00) kHz
Inclusive:            5000/ 5000, (30000.00 +/-   0.00) kHz
======================================================================
ALWAYS_DISPATCH_TO_DEFAULT=ON CPU_USE_REAL_HALF=ON
======================================================================
HLT1 rates:
Hlt1TrackMVA:         1003/ 5000, ( 6018.00 +/- 169.90) kHz
Hlt1TwoTrackMVA:      1803/ 5000, (10818.00 +/- 203.72) kHz
Hlt1NoBeam:              3/ 5000, (   18.00 +/-  10.39) kHz
Hlt1BeamOne:             0/ 5000, (    0.00 +/-   0.00) kHz
Hlt1BeamTwo:             0/ 5000, (    0.00 +/-   0.00) kHz
Hlt1BothBeams:           0/ 5000, (    0.00 +/-   0.00) kHz
Hlt1VeloMicroBias:       6/ 5000, (   36.00 +/-  14.69) kHz
Hlt1ODINLumi:            0/ 5000, (    0.00 +/-   0.00) kHz
Hlt1ODINNoBias:          0/ 5000, (    0.00 +/-   0.00) kHz
Hlt1SingleHighPtMuon:   11/ 5000, (   66.00 +/-  19.88) kHz
Hlt1LowPtMuon:         566/ 5000, ( 3396.00 +/- 134.42) kHz
Hlt1D2KK:              114/ 5000, (  684.00 +/-  63.33) kHz
Hlt1D2KPi:             169/ 5000, ( 1014.00 +/-  76.67) kHz
Hlt1D2PiPi:            152/ 5000, (  912.00 +/-  72.84) kHz
Hlt1DiMuonHighMass:     61/ 5000, (  366.00 +/-  46.57) kHz
Hlt1DiMuonLowMass:     117/ 5000, (  702.00 +/-  64.14) kHz
Hlt1DiMuonSoft:          6/ 5000, (   36.00 +/-  14.69) kHz
Hlt1LowPtDiMuon:       207/ 5000, ( 1242.00 +/-  84.52) kHz
Hlt1TrackMuonMVA:       56/ 5000, (  336.00 +/-  44.65) kHz
Hlt1GECPassthrough:   4170/ 5000, (25020.00 +/- 157.86) kHz
Hlt1Passthrough:      5000/ 5000, (30000.00 +/-   0.00) kHz
Inclusive:            5000/ 5000, (30000.00 +/-   0.00) kHz
======================================================================
GPU
======================================================================
HLT1 rates:
Hlt1TrackMVA:         1003/ 5000, ( 6018.00 +/- 169.90) kHz
Hlt1TwoTrackMVA:      1805/ 5000, (10830.00 +/- 203.77) kHz
Hlt1NoBeam:              3/ 5000, (   18.00 +/-  10.39) kHz
Hlt1BeamOne:             0/ 5000, (    0.00 +/-   0.00) kHz
Hlt1BeamTwo:             0/ 5000, (    0.00 +/-   0.00) kHz
Hlt1BothBeams:           0/ 5000, (    0.00 +/-   0.00) kHz
Hlt1VeloMicroBias:       6/ 5000, (   36.00 +/-  14.69) kHz
Hlt1ODINLumi:            0/ 5000, (    0.00 +/-   0.00) kHz
Hlt1ODINNoBias:          0/ 5000, (    0.00 +/-   0.00) kHz
Hlt1SingleHighPtMuon:   11/ 5000, (   66.00 +/-  19.88) kHz
Hlt1LowPtMuon:         569/ 5000, ( 3414.00 +/- 134.73) kHz
Hlt1D2KK:              113/ 5000, (  678.00 +/-  63.06) kHz
Hlt1D2KPi:             168/ 5000, ( 1008.00 +/-  76.45) kHz
Hlt1D2PiPi:            153/ 5000, (  918.00 +/-  73.07) kHz
Hlt1DiMuonHighMass:     61/ 5000, (  366.00 +/-  46.57) kHz
Hlt1DiMuonLowMass:     121/ 5000, (  726.00 +/-  65.20) kHz
Hlt1DiMuonSoft:          6/ 5000, (   36.00 +/-  14.69) kHz
Hlt1LowPtDiMuon:       214/ 5000, ( 1284.00 +/-  85.87) kHz
Hlt1TrackMuonMVA:       56/ 5000, (  336.00 +/-  44.65) kHz
Hlt1GECPassthrough:   4170/ 5000, (25020.00 +/- 157.86) kHz
Hlt1Passthrough:      5000/ 5000, (30000.00 +/-   0.00) kHz
Inclusive:            5000/ 5000, (30000.00 +/-   0.00) kHz
- Maintainer
Thanks. How does this compare to the rates observed on GPU?
There are intermediate solutions to CPU_USE_REAL_HALF=ON, such as using float but zeroing some of the bits in the mantissa upon load.
- Maintainer
Nice work! D2KPi is still a bit bigger than what I'd like... can you test with a range of signal samples? We really need a way to make these tests more sensitive.
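The intermediate option mentioned earlier in this thread (keep float, but zero some mantissa bits on load) could look roughly like the sketch below. The function name and the choice of 13 bits are assumptions for illustration: a float has 23 explicit mantissa bits and a half has 10, so clearing the lowest 13 leaves half-like precision while keeping the float exponent range.

```python
import struct


def zero_low_mantissa_bits(x: float, n_bits: int = 13) -> float:
    """Clear the n_bits least significant mantissa bits of x seen as a float32."""
    if not 0 <= n_bits <= 23:
        raise ValueError("a float32 has 23 explicit mantissa bits")
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Keep sign, exponent and the upper mantissa bits; zero the rest.
    mask = 0xFFFFFFFF & ~((1 << n_bits) - 1)
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]


print(zero_low_mantissa_bits(7.4002))  # same result as rounding to half here
print(zero_low_mantissa_bits(7.4019))  # truncated down, where half would round up
```

Masking truncates towards zero, whereas a conversion to a real half rounds to nearest, so this approach would be expected to narrow the gap to the GPU results rather than reproduce them bit for bit.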
- Developer
I'm working on a new machine and having trouble with the GPU build, but from the reference file, on the GPU we have:
HLT1 rates:
Hlt1TrackMVA:          211/ 1000, ( 6330.00 +/- 387.08) kHz
Hlt1TwoTrackMVA:       384/ 1000, (11520.00 +/- 461.40) kHz
Hlt1NoBeam:              0/ 1000, (    0.00 +/-   0.00) kHz
Hlt1BeamOne:             0/ 1000, (    0.00 +/-   0.00) kHz
Hlt1BeamTwo:             0/ 1000, (    0.00 +/-   0.00) kHz
Hlt1BothBeams:           2/ 1000, (   60.00 +/-  42.38) kHz
Hlt1VeloMicroBias:       0/ 1000, (    0.00 +/-   0.00) kHz
Hlt1ODINLumi:            0/ 1000, (    0.00 +/-   0.00) kHz
Hlt1ODINNoBias:          0/ 1000, (    0.00 +/-   0.00) kHz
Hlt1SingleHighPtMuon:    4/ 1000, (  120.00 +/-  59.88) kHz
Hlt1LowPtMuon:         121/ 1000, ( 3630.00 +/- 309.39) kHz
Hlt1D2KK:               15/ 1000, (  450.00 +/- 115.31) kHz
Hlt1D2KPi:              29/ 1000, (  870.00 +/- 159.20) kHz
Hlt1D2PiPi:             27/ 1000, (  810.00 +/- 153.77) kHz
Hlt1DiMuonHighMass:      9/ 1000, (  270.00 +/-  89.59) kHz
Hlt1DiMuonLowMass:      23/ 1000, (  690.00 +/- 142.21) kHz
Hlt1DiMuonSoft:          1/ 1000, (   30.00 +/-  29.98) kHz
Hlt1LowPtDiMuon:        37/ 1000, ( 1110.00 +/- 179.08) kHz
Hlt1TrackMuonMVA:        5/ 1000, (  150.00 +/-  66.91) kHz
Hlt1GECPassthrough:    856/ 1000, (25680.00 +/- 333.07) kHz
Hlt1Passthrough:      1000/ 1000, (30000.00 +/-   0.00) kHz
Inclusive:            1000/ 1000, (30000.00 +/-   0.00) kHz
For the CPU build with CPU_USE_REAL_HALF=ON over the same events:
HLT1 rates:
Hlt1TrackMVA:          210/ 1000, ( 6300.00 +/- 386.41) kHz
Hlt1TwoTrackMVA:       379/ 1000, (11370.00 +/- 460.24) kHz
Hlt1NoBeam:              0/ 1000, (    0.00 +/-   0.00) kHz
Hlt1BeamOne:             0/ 1000, (    0.00 +/-   0.00) kHz
Hlt1BeamTwo:             0/ 1000, (    0.00 +/-   0.00) kHz
Hlt1BothBeams:           0/ 1000, (    0.00 +/-   0.00) kHz
Hlt1VeloMicroBias:       1/ 1000, (   30.00 +/-  29.98) kHz
Hlt1ODINLumi:            0/ 1000, (    0.00 +/-   0.00) kHz
Hlt1ODINNoBias:          0/ 1000, (    0.00 +/-   0.00) kHz
Hlt1SingleHighPtMuon:    4/ 1000, (  120.00 +/-  59.88) kHz
Hlt1LowPtMuon:         121/ 1000, ( 3630.00 +/- 309.39) kHz
Hlt1D2KK:               16/ 1000, (  480.00 +/- 119.04) kHz
Hlt1D2KPi:              29/ 1000, (  870.00 +/- 159.20) kHz
Hlt1D2PiPi:             26/ 1000, (  780.00 +/- 150.97) kHz
Hlt1DiMuonHighMass:      9/ 1000, (  270.00 +/-  89.59) kHz
Hlt1DiMuonLowMass:      22/ 1000, (  660.00 +/- 139.16) kHz
Hlt1DiMuonSoft:          1/ 1000, (   30.00 +/-  29.98) kHz
Hlt1LowPtDiMuon:        37/ 1000, ( 1110.00 +/- 179.08) kHz
Hlt1TrackMuonMVA:        5/ 1000, (  150.00 +/-  66.91) kHz
Hlt1GECPassthrough:    856/ 1000, (25680.00 +/- 333.07) kHz
Hlt1Passthrough:      1000/ 1000, (30000.00 +/-   0.00) kHz
Inclusive:            1000/ 1000, (30000.00 +/-   0.00) kHz
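For reading these tables, the printed rates and uncertainties are consistent with scaling the selected fraction by a 30 MHz input rate and using a binomial uncertainty. That is inferred from the numbers quoted in this thread (for example 1000/1000 giving 30000 kHz), not taken from the Allen source, and the function name below is hypothetical.

```python
import math

# Assumed input rate, inferred from "1000/1000 -> 30000.00 kHz" in the tables.
INPUT_RATE_KHZ = 30000.0


def rate_with_uncertainty(n_selected: int, n_events: int,
                          input_rate_khz: float = INPUT_RATE_KHZ):
    """Return (rate, binomial uncertainty) in kHz for n_selected of n_events."""
    p = n_selected / n_events
    sigma_p = math.sqrt(p * (1.0 - p) / n_events)
    return p * input_rate_khz, sigma_p * input_rate_khz


rate, err = rate_with_uncertainty(1011, 5000)
print(f"{rate:8.2f} +/- {err:6.2f} kHz")
```

Because both builds process the same events, the two counts for a given line are strongly correlated, so these per-build error bars overstate the uncertainty on a CPU-GPU difference. A one-event discrepancy is invisible next to them, which is why an event-by-event comparison is more sensitive than comparing rates.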
- Maintainer
Ah I see, I get it now. OK, I'll wait for the bigger GPU vs. HALF=ON CPU test.
- Developer
I added GPU results above. It looks like there are still major differences in DiMuonLowMass and LowPtDiMuon. I'll take a look at some other modes.
- Maintainer
Very good, and good to see the other differences are smaller. Interesting that it's the muons again... cc @mfontana since she's looking at the muon parts anyway for other things.
- Maintainer
Hello folks, just bumping the priority on this a bit. I also cc @ascarabo given the VeloUT MR which will lead to a significant update of the reference files independently of anything else.
I will try to find some time at the end of the week to also look into this. By the way, is it possible to get a list of event numbers passing a certain line? It would make debugging easier and also help to understand the actual overlap of selected events between GPU and CPU compilations
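Once per-line event lists are available, the overlap check asked for above could be as simple as the following sketch. The event numbers are made up for illustration and the function name is hypothetical; how the lists are extracted from Allen is not covered here.

```python
def compare_selected(events_a, events_b):
    """Split two lists of selected event numbers into common and exclusive sets."""
    a, b = set(events_a), set(events_b)
    return {
        "both": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


# Made-up event numbers passing one line in each build.
gpu_events = [3, 17, 42, 108, 256]
cpu_events = [3, 17, 42, 99, 256]

overlap = compare_selected(gpu_events, cpu_events)
print(overlap)
```

In this toy case both builds select five events, so the rates agree exactly even though one event differs on each side. Identical counts therefore do not imply identical selections.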
- Maintainer
I think @dcampora will be able to describe how to best get this information. It is certainly available, since this is also needed for the DecReports.
- Maintainer
@freiss please look into https://gitlab.cern.ch/lhcb/Allen/-/blob/master/device/selections/Hlt1/src/DecReporter.cu#L51 for the specifics, I think the code is self-explanatory but feel free to ask me any questions.
What is the issue, though? In the above results I see no difference for LowPtDiMuon, and there is a one-event difference for DiMuonLowMass (22 vs 23 events). I would suggest running with much higher statistics before looking at what the source of the difference is.
- Maintainer
I see, good luck then :)
- Maintainer
- Vava Gligorov mentioned in merge request !410 (merged)
- Developer
Hi. This seems to be particularly bad for Hlt1DisplacedDielectron. I set up with the following:
git clone ssh://git@gitlab.cern.ch:7999/lhcb/Allen.git
cd Allen; git submodule update --init --recursive
git rebase origin/thboettc_no_ip_dielectrons
sed -i 's/m_MinIPChi2 {this, 7.4f};/m_MinIPChi2 {this, 0.0f};/' device/selections/lines/electron/include/DisplacedDielectronLine.cuh # change IPChi2 cut to 0 for an existing dielectron line
mkdir build; cd build; source /cvmfs/sft.cern.ch/lcg/views/setupViews.sh LCG_101 x86_64-centos7-clang12-opt
cmake -DSTANDALONE=ON -DTARGET_DEVICE=CUDA -DCUDA_ARCH=80 -DSEQUENCES=hlt1_pp_ecal -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang-12 ..
make -j128
./Allen --mdf /data/djohnson/HLT1_2021/upgrade_mc_minbias_scifi_v5_000.mdf --sequence=hlt1_pp_ecal
then compile for CPU:
cmake -DSTANDALONE=ON -DSEQUENCES=hlt1_pp_ecal -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang-12 ..
In the first case the selection yield is 2555 and in the second it's 2516, a 1.5% effect. On the GPU the number selected varies from 2555 to 2557, simply between successive executions.
The effect for the electron line seems a bit bigger than the GPU-CPU discrepancies that I also see in Hlt1LowPtMuon (0.7%), so I'm tagging @maxime in case there's something extra coming in through the electron ID here. @dovombru and @dcampora suggest this may be connected to #283 (comment 5002560)
- Developer
Yes, I guess it's the same problem, I'm working on it.
- Developer
That's great! Is there a gitlab issue containing the discussion, @maxime ?
- Contributor
No it was by email. I'll forward it to you.
- Developer
- Daniel Johnson mentioned in merge request !774 (merged)
- Vava Gligorov closed