Efficiency tests in master failing

mentioned in merge request !696 (merged)

Just for info, I see the same problem in the pipeline here: !551 (merged) (https://gitlab.cern.ch/lhcb/Allen/-/jobs/17446629)

This was there before !685 (merged). I believe the error was introduced with !561 (merged), where other data was introduced as part of the test (for instance, the penultimate test in that MR's pipeline tab showed it: https://gitlab.cern.ch/lhcb/Allen/-/jobs/15914972). I would suggest updating the description of the issue.

Since then, the physics efficiency tests have been failing intermittently, both for the basic pipeline and the full pipeline. I thought an issue already existed, but to my surprise that is not the case.

(CC @dovombru @raaij).

One thing that I noticed looking at different pipelines is that we have 2 effects. As you say, we sometimes have that 0.01% difference in the hit efficiency, either in the basic or the full pipeline. But now I also see this difference in the Hlt1TwoTrackMVA_Restricted, Hlt1TwoTrackMVA_Non_Restricted and Inclusive (see here https://gitlab.cern.ch/lhcb/Allen/-/jobs/17508160 or https://gitlab.cern.ch/lhcb/Allen/-/jobs/17446629). Is this second problem also coming from !561 (merged)? Or does it actually come from !685 (merged)?

I have gone through the pipelines of the MRs between !561 (merged) and !685 (merged) to better understand what might have originated these issues.

The difference in Hlt1TwoTrackMVA_Restricted, Hlt1TwoTrackMVA_Non_Restricted and Inclusive is also observed in:

https://gitlab.cern.ch/lhcb/Allen/-/jobs/16436421 (from MR !666 (merged)).

Which means that !685 (merged) did not introduce this issue. Also please note that !666 (merged) definitely did not introduce the issue either considering the nature of that MR, so it must have existed from before. The hit purity issue is also observed in:

The hit purity issue is observed more often, although this might be because it's seen in the basic pipeline, whereas the other is only observed in the full pipeline, which is triggered less frequently. If we are lucky, these two come from a single issue.

The differences in Hlt1TwoTrackMVA efficiency are huge and I'm having trouble finding pipelines that pass this test, so this looks to me like the complex sequence validation reference files are just incorrect. Maybe they weren't updated as part of the change from a cut-based selection to the NN-based selection? Or maybe they were updated, but then were somehow reverted during the confusion around the CI pipeline? @dovombru @nnolte does this sound possible?

It does sound possible, but are they reproducibly wrong every time? Or are there examples where this is not the case? E.g. https://gitlab.cern.ch/lhcb/Allen/-/jobs/17089991 (from MR !672 (merged)) was run after https://gitlab.cern.ch/lhcb/Allen/-/jobs/16436421 (from MR !666 (merged)) and it didn't fail, so I fear this is intermittent unless something has changed since.

I did update the refs at the time, but I remember that some numbers changed vastly after another MR went in; I don't quite remember which one that was. (It used to have a higher efficiency than the catboost variant.)

I think this somehow didn't get merged. Look at the histories of the default and complex reference files:

https://gitlab.cern.ch/lhcb/Allen/-/commits/master/test/reference/Upgrade_BsPhiPhi_MD_FTv4_DIGI_1k_hlt1_pp_validation_geforcertx2080ti.txt

https://gitlab.cern.ch/lhcb/Allen/-/commits/master/test/reference/Upgrade_BsPhiPhi_MD_FTv4_DIGI_1k_hlt1_complex_validation_geforcertx2080ti.txt

Anyway, I can test this and create an MR with an update. @dcampora the reference file in my local copy of !672 (merged) has numbers that agree with the failing MRs.

The fact that it is intermittent makes me doubtful, but please go ahead and try in an MR; let's see if this still happens.

mentioned in merge request !703 (merged)

It's possible it's due to !561 (merged), but it was not reproducible, as there are many pipelines on master after the merge of !561 (merged) which pass, e.g. https://gitlab.cern.ch/lhcb/Allen/-/pipelines/2958106. I don't mind if you update the description.

In any case, do we agree we should understand this before merging other MRs?

Ideally yes, but in that case we should find someone who can look into this asap, to avoid blocking the MRs for too long.

changed the description

mentioned in merge request !700 (merged)

mentioned in issue #200 (closed)

Hi everyone. This could be explaining an effect I'm seeing for electron lines. I set things up as per:

git clone ssh://git@gitlab.cern.ch:7999/lhcb/Allen.git
cd Allen; git submodule update --init --recursive
git rebase origin/thboettc_no_ip_dielectrons
sed -i 's/m_MinIPChi2 {this, 7.4f};/m_MinIPChi2 {this, 0.0f};/' device/selections/lines/electron/include/DisplacedDielectronLine.cuh # change IPChi2 cut to 0 for an existing dielectron line
mkdir build; cd build; source /cvmfs/sft.cern.ch/lcg/views/setupViews.sh LCG_101 x86_64-centos7-clang12-opt
cmake -DSTANDALONE=ON -DTARGET_DEVICE=CUDA -DCUDA_ARCH=80 -DSEQUENCES=hlt1_pp_ecal -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang-12 ..
make -j128
./Allen --mdf /data/djohnson/HLT1_2021/upgrade_mc_minbias_scifi_v5_000.mdf --sequence=hlt1_pp_ecal

If I repeat the last command multiple times, I see the accept yield change between 2555 and 2557 events.

One possible explanation for the hit purity counters being wrong is how these counters are calculated, which is by re-weighting the previous calculation and adding the new track each time. Since floating-point arithmetic is not associative, this could lead to different results when the tracks are processed in a different order. !710 (merged) improves the stability of this calculation.
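To illustrate the order dependence described here, below is a minimal Python sketch. It is not the Allen code and not necessarily what !710 (merged) does: the function names, the per-track purity values and the use of `math.fsum` are all assumptions made for illustration. It compares an average that is re-weighted track by track with one that sums all values first and divides once.

```python
import math
import random


def running_mean(values):
    """Re-weight the previous mean and fold in one new value at a time."""
    mean = 0.0
    for n, x in enumerate(values):
        # n values are already in the mean; add the (n + 1)-th one.
        mean = (mean * n + x) / (n + 1)
    return mean


def accumulated_mean(values):
    """Sum all values first (math.fsum is exactly rounded), divide once."""
    return math.fsum(values) / len(values)


def distinct_results(mean_fn, values, n_shuffles=200, seed=1):
    """Collect the distinct means obtained over shuffled input orders."""
    rng = random.Random(seed)
    work = list(values)
    results = set()
    for _ in range(n_shuffles):
        rng.shuffle(work)
        results.add(mean_fn(work))
    return results


# Hypothetical per-track hit purities (matched hits / total hits).
_rng = random.Random(0)
purities = [_rng.randint(5, 12) / 12 for _ in range(1000)]

if __name__ == "__main__":
    print("running mean, distinct values:",
          len(distinct_results(running_mean, purities)))
    print("accumulated mean, distinct values:",
          len(distinct_results(accumulated_mean, purities)))
```

The accumulated version returns the same value for every ordering, whereas the re-weighted version may differ in the last bits from one ordering to the next, which would be enough to flip a printed digit in a reference file comparison.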

@thboettc from the first attempts it looks like !710 (merged) may solve the hit purity issue. I just updated the reference files there (this includes an update to the output of the complex sequence). Feel free to take over this MR in case it is useful.

mentioned in merge request !710 (merged)

mentioned in merge request !711 (merged)

mentioned in issue #284 (closed)

closed with merge request !710 (merged)

mentioned in merge request !774 (merged)
