
Draft: Use mutexes instead of atomic ops on shared_ptr in CachedParticlePtr

Beojan Stanislaus requested to merge bstanisl/athena:cachedparticleptr-mtx into main

This is a study of using mutexes instead of shared_ptr atomic operations in CachedParticlePtr, aiming to reduce the high CPU usage caused by libstdc++'s very suboptimal implementation of those atomic operations.
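For illustration, the pattern being replaced looks roughly like the following sketch (hypothetical names, not the actual Athena code): the free-function `std::atomic_load` / `std::atomic_store` overloads for `shared_ptr`, which libstdc++ backs with a small global pool of mutexes that can become a contention hot spot.

```cpp
// Hypothetical sketch of the shared_ptr-atomics pattern under discussion.
// "Particle" is a placeholder payload type, not an Athena class.
#include <memory>

struct Particle;

class CachedParticlePtrAtomic {
public:
  std::shared_ptr<const Particle> get() const {
    return std::atomic_load(&m_ptr);          // locks a pool mutex in libstdc++
  }
  void set(std::shared_ptr<const Particle> p) {
    std::atomic_store(&m_ptr, std::move(p));  // hits the same pool mutex again
  }

private:
  std::shared_ptr<const Particle> m_ptr;
};
```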

Using tbb::spin_rw_mutex, the additional memory required is 8 bytes per CachedParticlePtr (versus 40 bytes with std::mutex). The mutex implementation shows much lower CPU usage than the atomics implementation, though still higher than I would expect.
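A minimal sketch of the mutex-guarded variant, assuming a per-object `tbb::spin_rw_mutex` protecting the cached `shared_ptr` (again with placeholder names, not the actual CachedParticlePtr code):

```cpp
// Hypothetical sketch: per-object tbb::spin_rw_mutex instead of shared_ptr atomics.
// Readers take a shared lock, writers an exclusive lock.
#include <memory>
#include <tbb/spin_rw_mutex.h>

struct Particle;  // placeholder payload type

class CachedParticlePtrMutex {
public:
  std::shared_ptr<const Particle> get() const {
    tbb::spin_rw_mutex::scoped_lock lock(m_mutex, /*write=*/false);  // shared lock
    return m_ptr;
  }
  void set(std::shared_ptr<const Particle> p) {
    tbb::spin_rw_mutex::scoped_lock lock(m_mutex, /*write=*/true);   // exclusive lock
    m_ptr = std::move(p);
  }

private:
  mutable tbb::spin_rw_mutex m_mutex;  // one extra word per object
  std::shared_ptr<const Particle> m_ptr;
};
```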

Surprisingly, about 13% of CPU time during the execute phase of MT pileup digitization is still spent waiting on these mutexes. This comes almost entirely from the pixel digitization tool (about 21.4% of its time) and the ITk strips digitization tool (about 14.5% of its time).


Pinging @jchapman @ssnyder @tadej

JIRA ticket: ATLASSIM-4814
