Document/Test/Fix the "stability" of find index of minimum impl (!67962) · Merge requests · atlas / athena

Christos Anastopoulos requested to merge ATLAS-EGamma/athena:test_find_minimum into main Dec 20, 2023

Intro

We are using a vectorized implementation . This is quite faster than the STL. Micro-Benchmarking can be found in the github repo mentioned in !67937 (merged). But things can be "really faster".

The issue

There is one issue that in theory if we have the same entry 2 times .
The "STL"/"C" serial will return the 1st index. The Vec implementations as we have it will return the index of any of the 2 depending on how they "fall".

To be clear . This is not due to !67937 (merged). But a pre-existing issue (from the 1st implementation)I figured is still there when I was transfering code from my private repo to ATLAS.

For the !67937 (merged). I left it alone. This is the fix I mentioned to the above MR (but was not part of it)

Plan

The plan is to do commits with the STL C stable vec and a fixed version and establish they change the same exact 1 entry. All of them will do .

Let me ping @nstyles @jcatmore @jchapman @jdandoy . There will be one entry ouput changes that we might need to have. As is much better to have "stable output"

Timings

InputRDOFile="/cvmfs/atlas-nightlies.cern.ch/repo/data/data-art/CampaignInputs/mc20/RDO/mc20_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.recon.AOD.e6337_s3681_r13145/100events.RDO.pool.root"
export ATHENA_CORE_NUMBER=1
Reco_tf.py \
  --CA \
  --inputRDOFile $InputRDOFile \
  --outputAODFile myAOD.pool.root \
  --preInclude 'egammaConfig.egammaOnlyFromRawFlags.egammaOnlyFromRaw' \
  --autoConfiguration 'everything' \
  --maxEvents '100' \
  --perfmon 'fullmonmt'

For context this part uses to cost a significant amount . Currenlty using the optimised implementation is 8% of the total time or lets say 5440 ms out of the ~ 6800 ms it takes for the 100 ttbar

The micro - benchmarking (https://github.com/AnChristos/FindIdxOfMinimum/) gives

Benchmark	Time	CPU	Iterations
findMinimumIndexC/2048.	2687 ns	2678 ns	260374
findMinimumIndexSTL/2048	6111 ns	6093 ns	114311
findMinimumIndexVecBlend/2048	266 ns	265 ns	2634353
findMinimumIndexVecUnordered/2048	111 ns	111 ns	6320537.
findMinimumIndexVecBlend/2048 (stabilized)	296 ns	296 ns	2634353

There are some subtle details as we benchamark "random" . The unordered get a bit of extra boost here and actually is a bit slower than Blend < 256 elements. In production the number of elements decreases as we merge (we start from 2556 but we gradually after 60 iterations we have 78).

So what matter more is the numbers below which are total times for the EMBremCollection Builder (slowest egamma one)

Impl.	Total times for EMBremcollectionBuilder Time 99 ttbar ms	Ratio	Notes
VecBlend (current)	6856.06	1	Will not always return the index of 1st minima if 2 same values
STL	13299.11	1.9	Will return the 1st. Changes 1 entry in the CI
C	9518.07	1.45	Will return the 1st. Changes 1 entry in the CI
VecUnordered	6851.68	1	Will return the 1st. Changes 1 entry in the CI
VBlend stabilized	6855.76	1	Will return the 1st. Changes 1 entry in the CI

Suggestion

I would consider the "instability" a bug in the sense it will not allows us to change impl in case things improve in the STL side in the feature. So I would go with the stabilizing the vecBlend. And also make the test being able to catch potentially this kind of "issue".

Edited Dec 21, 2023 by Christos Anastopoulos

Document/Test/Fix the "stability" of find index of minimum impl

Intro

The issue

Plan

Timings

Suggestion

Merge request reports