Nondeterminism in Moore (found in BW Tests)
First noticed in !3795 (comment 8577217).
The number of events passing HLT2 in the [2024-patches] nightly build jobs fluctuates by roughly ±120, seemingly at random.
The results of the nondeterminism can be seen even in very simple algorithms.
The cause remains undiagnosed after some investigation.
Possibly related to the non-determinism in the Throughput Tests: MooreOnline#89.
Possibly related to the non-determinism previously reported for Moore in #750, although the discussion there makes this seem somewhat unlikely.
This is difficult to test because it affects only about 1 in 1000 events, and not consistently.
This is important because it makes rate (and thus BW) evaluations nondeterministic and unclear.
cc @msaur @rjhunter
Feel free to add clarifications or more information. I've attempted to summarise the main information and the concerns I have.
Example of a simple algorithm differing
Consider muon_filter_for_Z (ParticleRangeFilter/ParticleRangeFilter_9b23c503). This is long_muon_with_ismuon plus a min_PT cut of 3 GeV.
2024-patches.161_Moore_hlt2_and_spruce_bandwidth, i.e. 'high fluctuation'
LAZY_AND: Hlt2QEE_ZToMuMuFullDecisionWithOutput #=100000 Sum=0 Eff=|( 0.000000 +- 0.00000 )%|
LAZY_AND: Hlt2QEE_ZToMuMuFull #=100000 Sum=0 Eff=|( 0.000000 +- 0.00000 )%|
DeterministicPrescaler/Hlt2QEE_ZToMuMuFull_Prescaler #=100000 Sum=100000 Eff=|( 100.0000 +- 0.00000 )%|
VoidFilter/Default_Hlt1Filter #=100000 Sum=95878 Eff=|( 95.87800 +- 0.0628657)%|
ParticleRangeFilter/ParticleRangeFilter_9b23c503 #=95878 Sum=2180 Eff=|( 2.273723 +- 0.0481410)%|
TwoBodyCombiner/TwoBodyCombiner_17deea39 #=2180 Sum=0 Eff=|( 0.000000 +- 0.00000 )%|
2024-patches.162, i.e. 'low fluctuation'
LAZY_AND: Hlt2QEE_ZToMuMuFullDecisionWithOutput #=100000 Sum=0 Eff=|( 0.000000 +- 0.00000 )%|
LAZY_AND: Hlt2QEE_ZToMuMuFull #=100000 Sum=0 Eff=|( 0.000000 +- 0.00000 )%|
DeterministicPrescaler/Hlt2QEE_ZToMuMuFull_Prescaler #=100000 Sum=100000 Eff=|( 100.0000 +- 0.00000 )%|
VoidFilter/Default_Hlt1Filter #=100000 Sum=95878 Eff=|( 95.87800 +- 0.0628657)%|
ParticleRangeFilter/ParticleRangeFilter_9b23c503 #=95878 Sum=2171 Eff=|( 2.264336 +- 0.0480438)%|
TwoBodyCombiner/TwoBodyCombiner_17deea39 #=2171 Sum=0 Eff=|( 0.000000 +- 0.00000 )%|
- Is the configuration the same? The configuration of the BW Tests does not change between nightlies.
- Is the input the same? It seems so, because the PV and Hlt1 filter pass counts are unchanged (statistically unlikely if the input differed by ~500 events).
- Is the long_muon_with_ismuon container the same? Unclear
- Is the pt distribution of the container the same? Unclear
Clearly neither of these last two points should be non-deterministic...
We also see this affecting non-muon containers. I just chose to highlight this line out of familiarity with the line builder.
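The comparison above was read off the two counter dumps by eye. A minimal sketch of doing it mechanically is given below, assuming the counter lines keep the `#=... Sum=... Eff=` layout shown in the logs; the helper names `parse_counters` and `diff_counters` are hypothetical and not part of Moore or the BW test tooling.

```python
import re

# Matches counter lines in the layout shown in the logs above, e.g.
#   ParticleRangeFilter/ParticleRangeFilter_9b23c503 #=95878 Sum=2180 Eff=|( ... )%|
COUNTER_RE = re.compile(r"^\s*(?P<name>.+?)\s+#=(?P<n>\d+)\s+Sum=(?P<sum>\d+)\s+Eff=")


def parse_counters(log_text):
    """Return {algorithm name: (entries, passed)} for every counter line."""
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_RE.match(line)
        if match:
            counters[match["name"]] = (int(match["n"]), int(match["sum"]))
    return counters


def diff_counters(a, b):
    """Return {name: (value in a, value in b)} for shared names whose counts differ."""
    return {name: (a[name], b[name]) for name in a if name in b and a[name] != b[name]}


# Counter lines copied from the two builds quoted above.
BUILD_161 = """\
LAZY_AND: Hlt2QEE_ZToMuMuFullDecisionWithOutput #=100000 Sum=0 Eff=|( 0.000000 +- 0.00000 )%|
LAZY_AND: Hlt2QEE_ZToMuMuFull #=100000 Sum=0 Eff=|( 0.000000 +- 0.00000 )%|
DeterministicPrescaler/Hlt2QEE_ZToMuMuFull_Prescaler #=100000 Sum=100000 Eff=|( 100.0000 +- 0.00000 )%|
VoidFilter/Default_Hlt1Filter #=100000 Sum=95878 Eff=|( 95.87800 +- 0.0628657)%|
ParticleRangeFilter/ParticleRangeFilter_9b23c503 #=95878 Sum=2180 Eff=|( 2.273723 +- 0.0481410)%|
TwoBodyCombiner/TwoBodyCombiner_17deea39 #=2180 Sum=0 Eff=|( 0.000000 +- 0.00000 )%|
"""
BUILD_162 = """\
LAZY_AND: Hlt2QEE_ZToMuMuFullDecisionWithOutput #=100000 Sum=0 Eff=|( 0.000000 +- 0.00000 )%|
LAZY_AND: Hlt2QEE_ZToMuMuFull #=100000 Sum=0 Eff=|( 0.000000 +- 0.00000 )%|
DeterministicPrescaler/Hlt2QEE_ZToMuMuFull_Prescaler #=100000 Sum=100000 Eff=|( 100.0000 +- 0.00000 )%|
VoidFilter/Default_Hlt1Filter #=100000 Sum=95878 Eff=|( 95.87800 +- 0.0628657)%|
ParticleRangeFilter/ParticleRangeFilter_9b23c503 #=95878 Sum=2171 Eff=|( 2.264336 +- 0.0480438)%|
TwoBodyCombiner/TwoBodyCombiner_17deea39 #=2171 Sum=0 Eff=|( 0.000000 +- 0.00000 )%|
"""

differences = diff_counters(parse_counters(BUILD_161), parse_counters(BUILD_162))
for name, (in_161, in_162) in differences.items():
    print(f"{name}: .161 (entries, passed)={in_161}  .162 (entries, passed)={in_162}")
```

On these two excerpts the first counter that differs is the ParticleRangeFilter (2180 vs 2171 passed from an identical 95878 entries); the TwoBodyCombiner differs only because it inherits that count as its input.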
More information about the issue
The number of events passing HLT2 in the [2024-patches] nightly build jobs fluctuates by roughly ±120, seemingly at random:
- build.161 : 17154
- build.160, .159 : 17036
- build.158 : 17153
- build.157, .156 : 17035
- build.155, .154 : 17032
- build.153 : 17150
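The two-level pattern in the counts listed above can be made explicit with a few lines of arithmetic. This is only an illustrative sketch: the midpoint threshold used to split the builds into a 'high' and a 'low' group is an assumption made here, not something taken from the BW test tooling.

```python
from statistics import mean

# Events passing HLT2 per nightly build, copied from the list above.
passed = {
    161: 17154,
    160: 17036, 159: 17036,
    158: 17153,
    157: 17035, 156: 17035,
    155: 17032, 154: 17032,
    153: 17150,
}

# Assumption: split at the midpoint between the smallest and largest count.
threshold = (min(passed.values()) + max(passed.values())) / 2
high = {build: n for build, n in passed.items() if n > threshold}
low = {build: n for build, n in passed.items() if n <= threshold}

high_spread = max(high.values()) - min(high.values())
low_spread = max(low.values()) - min(low.values())
gap = min(high.values()) - max(low.values())
mean_shift = mean(high.values()) - mean(low.values())

print("high builds:", sorted(high), "spread:", high_spread)
print("low builds: ", sorted(low), "spread:", low_spread)
print("gap between groups:", gap, "mean shift:", round(mean_shift, 1))
```

With these numbers the builds fall into two groups (.153, .158, .161 high; the rest low). Each group spans only 4 events, while the groups are separated by at least 114 events and their means differ by about 118, which is the 'binary' behaviour described in this issue.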
Some points investigated in !3795 (comment 8592484):
- This is repetitive and shows a 'binary' nature, i.e. it always seems to vary by the same amount up or down when examining the nightlies.
  -> To me this points to it not being a multithreading issue/race condition, where I'd instead expect a distribution of variations.
- It affects lines that have no prescale.
- The require_pvs and require_hlt1_filter_code checks don't change.
  -> It seems unlikely that ~500 different events would be processed while the percentage of events with PVs that pass HLT1 stays identical.
  -> Implies it is unrelated to the input files.
- The selected event numbers are different sets. (Neither lhcb-2024-patches.161 nor lhcb-2024-patches.165 is a subset of the other, i.e. each contains events that were not selected in the other.)
- The results of the nondeterminism can be seen even in very simple algorithms.
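The subset check on the selected event numbers can be sketched as a plain set comparison. The event numbers below are made up purely for illustration, and `compare_selections` is a hypothetical helper; how the selected event numbers are actually extracted from each build's output is not covered here.

```python
def compare_selections(selected_a, selected_b):
    """Split two collections of selected event numbers into shared and exclusive parts."""
    a, b = set(selected_a), set(selected_b)
    return {"common": a & b, "only_a": a - b, "only_b": b - a}


# Hypothetical event numbers, NOT taken from any nightly build.
build_a = [1001, 1004, 1009, 1017, 1023]
build_b = [1001, 1009, 1012, 1023, 1030]

result = compare_selections(build_a, build_b)
neither_is_subset = bool(result["only_a"]) and bool(result["only_b"])

print("selected in both:", sorted(result["common"]))
print("only in A:", sorted(result["only_a"]))
print("only in B:", sorted(result["only_b"]))
print("neither is a subset of the other:", neither_is_subset)
```

If one build merely dropped events relative to the other, one of the two exclusive sets would be empty. Both being non-empty corresponds to the situation reported for lhcb-2024-patches.161 and lhcb-2024-patches.165.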