Skip to content

Document/Test/Fix the "stability" of find index of minimum impl

Intro

We are using a vectorized implementation . This is quite faster than the STL. Micro-Benchmarking can be found in the github repo mentioned in !67937 (merged). But things can be "really faster".

The issue

There is one issue that in theory if we have the same entry 2 times .
The "STL"/"C" serial will return the 1st index. The Vec implementations as we have it will return the index of any of the 2 depending on how they "fall".

To be clear . This is not due to !67937 (merged). But a pre-existing issue (from the 1st implementation)I figured is still there when I was transfering code from my private repo to ATLAS.

For the !67937 (merged). I left it alone. This is the fix I mentioned to the above MR (but was not part of it)

Plan

The plan is to do commits with the STL C stable vec and a fixed version and establish they change the same exact 1 entry. All of them will do .

Let me ping @nstyles @jcatmore @jchapman @jdandoy . There will be one entry ouput changes that we might need to have. As is much better to have "stable output"

Timings

InputRDOFile="/cvmfs/atlas-nightlies.cern.ch/repo/data/data-art/CampaignInputs/mc20/RDO/mc20_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.recon.AOD.e6337_s3681_r13145/100events.RDO.pool.root"
export ATHENA_CORE_NUMBER=1
Reco_tf.py \
  --CA \
  --inputRDOFile $InputRDOFile \
  --outputAODFile myAOD.pool.root \
  --preInclude 'egammaConfig.egammaOnlyFromRawFlags.egammaOnlyFromRaw' \
  --autoConfiguration 'everything' \
  --maxEvents '100' \
  --perfmon 'fullmonmt' 

For context this part uses to cost a significant amount . Currenlty using the optimised implementation is 8% of the total time or lets say 5440 ms out of the ~ 6800 ms it takes for the 100 ttbar

The micro - benchmarking (https://github.com/AnChristos/FindIdxOfMinimum/) gives

Benchmark Time CPU Iterations
findMinimumIndexC/2048. 2687 ns 2678 ns 260374
findMinimumIndexSTL/2048 6111 ns 6093 ns 114311
findMinimumIndexVecBlend/2048 266 ns 265 ns 2634353
findMinimumIndexVecUnordered/2048 111 ns 111 ns 6320537.
findMinimumIndexVecBlend/2048 (stabilized) 296 ns 296 ns 2634353

There are some subtle details as we benchamark "random" . The unordered get a bit of extra boost here and actually is a bit slower than Blend < 256 elements. In production the number of elements decreases as we merge (we start from 2556 but we gradually after 60 iterations we have 78).

So what matter more is the numbers below which are total times for the EMBremCollection Builder (slowest egamma one)

Impl. Total times for EMBremcollectionBuilder Time 99 ttbar ms Ratio Notes
VecBlend (current) 6856.06 1 Will not always return the index of 1st minima if 2 same values
STL 13299.11 1.9 Will return the 1st. Changes 1 entry in the CI
C 9518.07 1.45 Will return the 1st. Changes 1 entry in the CI
VecUnordered 6851.68 1 Will return the 1st. Changes 1 entry in the CI
VBlend stabilized 6855.76 1 Will return the 1st. Changes 1 entry in the CI

Suggestion

I would consider the "instability" a bug in the sense it will not allows us to change impl in case things improve in the STL side in the feature. So I would go with the stabilizing the vecBlend. And also make the test being able to catch potentially this kind of "issue".

Edited by Christos Anastopoulos

Merge request reports