Draft: DataObject Parallel Delete, master branch (2021.06.01.)
With many-threaded (>32 threads) ATLAS jobs, the event loop manager starts spending a significant amount of time just deleting objects from the event/data store, in the main thread of the application. In fact, the event loop manager thread is dominated by the freeing of memory.
This is something that I wrote up in ATEAM-748.
One could think of solving this bottleneck in a number of different ways, but I thought this would be the one requiring the least amount of work: changing DataObject::release() to push the deletion of individual objects into separate TBB tasks. This is not a great solution by any means, but it did still improve my profiled ATLAS reconstruction job immensely.
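To make the idea concrete for the discussion, here is a minimal, self-contained sketch of a release() that hands the final delete to a background task instead of running it on the calling thread. It is not the code in this MR: to stay free of external dependencies it uses std::async as a stand-in for a TBB task group, and the names DeleteTasks, RefCounted, releaseMany and g_destroyed are all hypothetical. The shape of release() (decrement, then delete on zero) is also an assumption about how such a reference-counted object behaves.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <future>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a TBB task group, built on std::async.
// Assumption: the real change would hand the work to TBB instead.
class DeleteTasks {
public:
  // Start f asynchronously and remember its future.
  template <typename F>
  void run(F&& f) {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_pending.push_back(std::async(std::launch::async, std::forward<F>(f)));
  }
  // Block until every task started so far has finished.
  void wait() {
    std::vector<std::future<void>> pending;
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      pending.swap(m_pending);
    }
    for (auto& f : pending) f.get();
  }

private:
  std::mutex m_mutex;
  std::vector<std::future<void>> m_pending;
};

// Counts destructor calls so the effect can be checked.
std::atomic<int> g_destroyed{0};

// Hypothetical reference-counted object, loosely modelled on the idea above.
class RefCounted {
public:
  virtual ~RefCounted() { ++g_destroyed; }
  unsigned long addRef() { return ++m_refCount; }
  // When the count drops to zero, the delete runs in a background task
  // rather than on the thread that called release().
  unsigned long release(DeleteTasks& tasks) {
    const unsigned long cnt = --m_refCount;
    if (cnt == 0) tasks.run([this] { delete this; });
    return cnt;
  }

private:
  std::atomic<unsigned long> m_refCount{0};
};

// Create n objects, release them all, wait for the background deletions,
// and return how many destructors ran.
int releaseMany(int n) {
  assert(n >= 0);
  g_destroyed = 0;
  DeleteTasks tasks;
  std::vector<RefCounted*> objects;
  objects.reserve(static_cast<std::size_t>(n));
  for (int i = 0; i < n; ++i) {
    auto* obj = new RefCounted;
    obj->addRef();
    objects.push_back(obj);
  }
  for (auto* obj : objects) obj->release(tasks);
  tasks.wait();
  return g_destroyed.load();
}
```

Calling releaseMany(100) should report that all 100 destructors ran once wait() returns. Note that std::async with std::launch::async starts a thread per deletion, which is far heavier than a TBB task; the sketch only illustrates where the delete moves to, and even with TBB there is a per-object scheduling cost left on the releasing thread.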
The scheduling of the TBB tasks still dominates the event loop manager's thread, but the thread is no longer pegged at 100%. This allowed the CPU usage of my test job to go from:
To:
It may not look like much, but the event processing rate of my test job more than doubled with this few-line change. I do recognise, though, that this update could negatively impact experiments/jobs that don't struggle with how much time it takes to remove their reconstructed events from memory. So let's discuss whether we want to do anything like this, or whether we would rather do something a bit more elaborate specifically in the ATLAS code, just for our experiment. (@ssnyder is working on such a solution at the moment...)
Also pinging @fwinkl, @leggett, @rbielski, @goetz, @bwynne, @christos.