There appears to be a new instability in some of the Gaudi Allen tests in Moore, where some counters in the x86_64_v3 slots are fluctuating from build to build.
I briefly took a look at this a month ago and I couldn't reproduce the fluctuations locally with any of the affected build tags when running on the same machine. Unfortunately I didn't have the time to dig deeper, but it may be interesting to check whether these fluctuations are machine dependent (if I'm not mistaken, there are different types of machines used for the tests and the job scheduling is random, right?)
The logs should tell you which machines the tests run on, I guess. You could indeed see if the fluctuations correlate with specific nodes. If that's the case, then we will need to investigate what specifically is different between the nodes in question. Can you look into this, or if not, find someone else from Allen to do it?
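In case it helps whoever picks this up, this is roughly the kind of correlation check I have in mind. It is a sketch only: the log directory layout, the "running on" wording and the counter format are assumptions about the nightly logs rather than the real format, so the regexes would need adjusting.

```python
# Sketch: check whether the fluctuating PV counter correlates with the build node.
# The directory layout, the "running on <host>" line and the counter format are
# assumptions about the nightly logs, not the real format -- adjust the regexes.
import re
from collections import defaultdict
from pathlib import Path

HOST_RE = re.compile(r"running on (\S+)")                      # hypothetical host line
FALSE_PV_RE = re.compile(r"00 all .*false (\d+) from reco\.")  # "00 all" counter line

values_per_host = defaultdict(set)
for log in Path("nightly_logs").glob("**/allen_gaudi_pv_with_mcchecking*.log"):
    text = log.read_text(errors="ignore")
    host, counter = HOST_RE.search(text), FALSE_PV_RE.search(text)
    if host and counter:
        values_per_host[host.group(1)].add(int(counter.group(1)))

for host, values in sorted(values_per_host.items()):
    status = "fluctuates" if len(values) > 1 else "stable"
    print(f"{host}: {sorted(values)} ({status})")
```

If the same node shows more than one value, the fluctuation is not purely machine dependent; if each node is internally stable but the nodes disagree, it points to hardware or instruction-set differences between them.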
By the way, even if it is just small differences due to the different hardware the tests run on, some strategy to mitigate it w.r.t. the test results needs to be found, as having the results fluctuate as they currently do is not acceptable from the RTA maintainer perspective.
Adding the WP2 coordinators (@mveghel @dovombru), as I suspect this is an HLT1 reco issue and so falls in their remit.
Can you please apply pressure here for someone to take a look? Having these tests fluctuate is a real hindrance for MR merging, not to mention the potential impact on physics, whatever the underlying reason for the problem is.
If no one has time to look at it and it is not the most critical thing right now for data taking, we can add some tolerance in the testing here, no? Of course shoving it under the carpet is not great, but it would be an intermediate solution to this hassle for the maintainers.
I don't know if a single Gaudi counter can have its tolerance changed without affecting others. The author of the algorithm with the affected counter would have to follow this up. It's not just counters: there is also printout from a monitor that would need to have its tolerance changed, for example:
-PrimaryVertexChecker_a611350f INFO 00 all : 283 from 528 ( 772-244 ) [ 53.60 %], false 96 from reco. 379 ( 283+96 ) [ 25.33 %]
+PrimaryVertexChecker_a611350f INFO 00 all : 283 from 528 ( 772-244 ) [ 53.60 %], false 95 from reco. 378 ( 283+95 ) [ 25.13 %]
 PrimaryVertexChecker_a611350f INFO 01 isolated : 172 from 300 ( 436-136 ) [ 57.33 %], false 0 from reco. 172 ( 172+0 ) [ 0.00 %]
-PrimaryVertexChecker_a611350f INFO 02 close : 111 from 228 ( 336-108 ) [ 48.68 %], false 96 from reco. 207 ( 111+96 ) [ 46.38 %]
-PrimaryVertexChecker_a611350f INFO 03 ntracks<10 : 13 from 55 ( 55-0 ) [ 23.64 %], false 96 from reco. 109 ( 13+96 ) [ 88.07 %]
+PrimaryVertexChecker_a611350f INFO 02 close : 111 from 228 ( 336-108 ) [ 48.68 %], false 95 from reco. 206 ( 111+95 ) [ 46.12 %]
+PrimaryVertexChecker_a611350f INFO 03 ntracks<10 : 13 from 55 ( 55-0 ) [ 23.64 %], false 95 from reco. 108 ( 13+95 ) [ 87.96 %]
 PrimaryVertexChecker_a611350f INFO 04 ntracks>=10 : 270 from 473 ( 473-0 ) [ 57.08 %], false 0 from reco. 270 ( 270+0 ) [ 0.00 %]
 PrimaryVertexChecker_a611350f INFO 05 z<-50.0 : 71 from 110 ( 171-61 ) [ 64.55 %], false 18 from reco. 89 ( 71+18 ) [ 20.22 %]
-PrimaryVertexChecker_a611350f INFO 06 z in (-50.0, 50.0) : 151 from 305 ( 432-127 ) [ 49.51 %], false 59 from reco. 210 ( 151+59 ) [ 28.10 %]
+PrimaryVertexChecker_a611350f INFO 06 z in (-50.0, 50.0) : 151 from 305 ( 432-127 ) [ 49.51 %], false 58 from reco. 209 ( 151+58 ) [ 27.75 %]
 PrimaryVertexChecker_a611350f INFO 07 z >=50.0 : 61 from 113 ( 169-56 ) [ 53.98 %], false 19 from reco. 80 ( 61+19 ) [ 23.75 %]
-PrimaryVertexChecker_a611350f INFO 08 decayBeauty : 2 from 4 ( 4-0 ) [ 50.00 %], false 1 from reco. 98 ( 97+1 ) [ 1.02 %]
-PrimaryVertexChecker_a611350f INFO 09 decayCharm : 50 from 79 ( 79-0 ) [ 63.29 %], false 22 from reco. 146 ( 124+22 ) [ 15.07 %]
-PrimaryVertexChecker_a611350f INFO 10 decayStrange : 283 from 525 ( 567-42 ) [ 53.90 %], false 95 from reco. 379 ( 284+95 ) [ 25.07 %]
-PrimaryVertexChecker_a611350f INFO 11 other : 0 from 3 ( 205-202 ) [ 0.00 %], false 1 from reco. 96 ( 95+1 ) [ 1.04 %]
+PrimaryVertexChecker_a611350f INFO 08 decayBeauty : 2 from 4 ( 4-0 ) [ 50.00 %], false 1 from reco. 97 ( 96+1 ) [ 1.03 %]
+PrimaryVertexChecker_a611350f INFO 09 decayCharm : 50 from 79 ( 79-0 ) [ 63.29 %], false 22 from reco. 145 ( 123+22 ) [ 15.17 %]
+PrimaryVertexChecker_a611350f INFO 10 decayStrange : 283 from 525 ( 567-42 ) [ 53.90 %], false 94 from reco. 378 ( 284+94 ) [ 24.87 %]
+PrimaryVertexChecker_a611350f INFO 11 other : 0 from 3 ( 205-202 ) [ 0.00 %], false 1 from reco. 95 ( 94+1 ) [ 1.05 %]
 PrimaryVertexChecker_a611350f INFO 12 1MCPV : 66 from 100 ( 100-0 ) [ 66.00 %], false 35 from reco. 101 ( 66+35 ) [ 34.65 %]
@@ -22,3 +22,3 @@
 PrimaryVertexChecker_a611350f INFO 14 3MCPV : 54 from 94 ( 100-6 ) [ 57.45 %], false 15 from reco. 69 ( 54+15 ) [ 21.74 %]
-PrimaryVertexChecker_a611350f INFO 15 4MCPV : 35 from 76 ( 99-23 ) [ 46.05 %], false 15 from reco. 50 ( 35+15 ) [ 30.00 %]
+PrimaryVertexChecker_a611350f INFO 15 4MCPV : 35 from 76 ( 99-23 ) [ 46.05 %], false 14 from reco. 49 ( 35+14 ) [ 28.57 %]
 PrimaryVertexChecker_a611350f INFO 16 5MCPV : 30 from 61 ( 90-29 ) [ 49.18 %], false 3 from reco. 33 ( 30+3 ) [ 9.09 %]
@@ -46,9 +46,9 @@
 PrimaryVertexChecker_a611350f INFO 3_res_ntracks(10,30) : x: +0.092, y: +0.087, z: +0.251
-PrimaryVertexChecker_a611350f INFO 4_res_ntracks>30 : x: +0.089, y: +0.090, z: +0.232
+PrimaryVertexChecker_a611350f INFO 4_res_ntracks>30 : x: +0.088, y: +0.090, z: +0.233
 PrimaryVertexChecker_a611350f INFO 5_res_z<-50 : x: +0.092, y: +0.088, z: +0.221
-PrimaryVertexChecker_a611350f INFO 6_res_z(-50,50) : x: +0.093, y: +0.090, z: +0.252
-PrimaryVertexChecker_a611350f INFO 7_res_z>50 : x: +0.084, y: +0.088, z: +0.203
+PrimaryVertexChecker_a611350f INFO 6_res_z(-50,50) : x: +0.092, y: +0.090, z: +0.253
+PrimaryVertexChecker_a611350f INFO 7_res_z>50 : x: +0.085, y: +0.089, z: +0.203
 PrimaryVertexChecker_a611350f INFO
-PrimaryVertexChecker_a611350f INFO 1_pull_width_all : x: +2.639, y: +2.786, z: +2.472
-PrimaryVertexChecker_a611350f INFO 1_pull_mean_all : x: +0.107, y: -0.002, z: +0.975
+PrimaryVertexChecker_a611350f INFO 1_pull_width_all : x: +2.631, y: +2.794, z: +2.473
+PrimaryVertexChecker_a611350f INFO 1_pull_mean_all : x: +0.096, y: -0.031, z: +0.975
Again, whoever is responsible for PrimaryVertexChecker would have to look into what is feasible here.
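Just to illustrate what "adding tolerance" would actually amount to: something would have to mask the fluctuating numbers before the stdout/reference comparison. A minimal standalone sketch follows, which is not the actual Moore/GaudiTesting validator API, and the regex only covers the false-PV counter lines shown above:

```python
# Standalone sketch of masking the fluctuating PrimaryVertexChecker numbers before
# the stdout/reference comparison; not the actual Moore/GaudiTesting validator API.
import re

# Matches the "false N from reco. M ( X+N ) [ P %]" tail of the counter lines above.
FLUCTUATING = re.compile(
    r"^(PrimaryVertexChecker_\w+ +INFO .*?false) +\d+ from reco\. +\d+ "
    r"\( *\d+\+\d+ *\) \[ *[\d.]+ %\]$"
)

def mask_counters(stdout: str) -> str:
    """Replace the fluctuating false-PV numbers with a placeholder before diffing."""
    return "\n".join(FLUCTUATING.sub(r"\1 <masked>", line) for line in stdout.splitlines())
```

Whether such a mask can be hooked into the existing test validator without hiding real regressions is exactly what would need checking.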
I know it might seem petty to complain so much about a small diff like this, but until someone has done the RTA maintainer shift it's difficult to appreciate how much randomly changing tests like this interfere with the process. Firstly, every time it happens in a CI test you have to go in and double-check that it is indeed this issue and not something new. Then, when making the reference updates, these diffs can cause some havoc there as well.
There is currently an awful lot of MRs scheduled in the milestone for the June TS release, so @msaur, and the maintainers after him, are going to have their work cut out to get through it all, so anything that smooths the process is a real benefit.
I've traced back the nightlies of the last two weeks for x86_64_v3-el9-gcc13+detdesc-opt+g. In 15 nightly builds, 8 had passing tests, 5 had the failures described above and 2 did not run for 2024-patches (which is the branch I checked).
In the 5 failures, allen_gaudi_velo_with_mcchecking always fails in addition to allen_gaudi_pv_with_mcchecking. The velo tracking failure is always caused by a different hit efficiency, and most of the time it also finds a different number of tracks, which are ghosts. In the PV finding, the mean and width of the pull distribution are different. We first have to understand the issue in the velo tracking, which is most likely causing the differences in the PV finding. The forward and seed_and_match tests also fail, because of the PV counters changing.
This is probably linked to #607 (closed), so I will follow up there.
Allen!1483 (merged) changed the way online counters are registered in Allen, making them available also in Allen-via-Moore (and all the corresponding tests). That means the counters may already have been fluctuating before that MR, but they were simply not being printed by the test, so the fluctuations would have been hidden until now.
@msaur if the fluctuations are making maintenance too difficult (especially in this heavy period), perhaps we could consider skipping comparisons for the affected counters (and efficiency results in some cases) for now? @dovombru wdyt?
TBH ignoring the counters again just sounds to me like a recipe for sweeping the issue under the carpet and forgetting about it. @msaur should indeed comment, but personally I would prefer to see some sort of a more detailed investigation before resorting to that.
I would agree with @jonrob that ignoring this problem (exclusions) is surely not the way to go. The exclusion list is already rather long, and too often no one follows up on the various issues once a warning/error is on the exclusion list.
Some solution is needed, and given the available resources it will hardly come anytime soon. But the pattern in which this is happening is quite well described, so I think it could be somewhat bearable for now, as the majority of the MRs aiming for the June TS should be selection-related MRs, which in general should not require any reference update (maybe I am too naive at this point).
If that turns out not to work, then I would call for some solution.
From the technical point of view, my understanding is that if a reference update is needed and these failing tests would be included, then the relevant reference files should be dropped from the update, i.e. those references should not be changed.
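Concretely, that would just mean reverting the affected reference files before committing the update; a rough sketch of what I mean (the file paths below are illustrative, not the actual locations):

```python
# Illustration only: keep the fluctuating Allen tests out of a reference update
# by restoring their .ref files before committing. Paths are hypothetical examples.
import subprocess

fluctuating_refs = [
    "Hlt/RecoConf/tests/refs/allen_gaudi_pv_with_mcchecking.ref",
    "Hlt/RecoConf/tests/refs/allen_gaudi_velo_with_mcchecking.ref",
]

# Discard any local changes to these references so the update MR does not touch them.
subprocess.run(["git", "checkout", "--", *fluctuating_refs], check=True)
```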
RecoConf_allen_gaudi_seed_and_match_with_ut_with_mcchecking is now fluctuating as well (seen for example in lhcb-2024-patches-mr), which is probably expected as it is very similar to the other fluctuating tests. To be tested together with Allen!1678 (merged).