AthenaPoolCnvSvc+RecExCommon+SimuJobTransforms: Tune and sync AutoFlush settings in various data formats
This MR is a continuation of !37678 (merged). It ended up being a bit more involved than what I originally intended but here we go. In this MR, we're updating a number of ROOT storage settings for various output formats. Things can be summarized as follows:
- We're dropping the large basket sizes for the `POOLContainer(DataHeader)` and `POOLContainer(DataHeaderForm)`
- We're explicitly configuring `AutoFlush` settings for our 3 primary trees: `CollectionTree`, `POOLContainer`, and `POOLContainerForm`
  - The values are 1 for randomly read HITS (10 for sequentially read HITS), 1 for RDO, 10 for ESD, and 100 for AOD files
- We're introducing a helper python module that streamlines the formatting of strings to be passed to `PoolAttributes`
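
To illustrate the kind of string formatting such a helper takes care of, here is a minimal sketch. It is not the module added in this MR: the function names, the dictionary keys, the file name, and the exact attribute string layout (`DatabaseName`/`ContainerName`/`TREE_AUTO_FLUSH`) are assumptions made for illustration. The numbers are the ones listed above, interpreted as event counts following ROOT's convention that positive `AutoFlush` values count entries.

```python
# Hypothetical sketch: names and string layout are assumptions,
# not the actual helper module introduced by this MR.

# AutoFlush values quoted in the MR description, keyed by output format.
AUTO_FLUSH = {
    "HITS_random": 1,       # randomly read HITS
    "HITS_sequential": 10,  # sequentially read HITS
    "RDO": 1,
    "ESD": 10,
    "AOD": 100,
}

# The three primary trees named in the MR description.
PRIMARY_TREES = ("CollectionTree", "POOLContainer", "POOLContainerForm")


def format_tree_auto_flush(file_name, tree_name, auto_flush):
    """Build one attribute string for a given file, tree, and AutoFlush value."""
    return (
        f"DatabaseName = '{file_name}'; "
        f"ContainerName = 'TTree={tree_name}'; "
        f"TREE_AUTO_FLUSH = '{auto_flush}'"
    )


def auto_flush_attributes(file_name, data_format):
    """Return one attribute string per primary tree, all with the same AutoFlush."""
    value = AUTO_FLUSH[data_format]
    return [format_tree_auto_flush(file_name, tree, value) for tree in PRIMARY_TREES]


if __name__ == "__main__":
    for line in auto_flush_attributes("myAOD.pool.root", "AOD"):
        print(line)
```

Keeping the per-format values in one table and generating the strings from it is what keeps the three trees in sync; in a job configuration the resulting list would be appended to `PoolAttributes`.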
The main motivation is the discussion in ATLASSIM-4274. We no longer need large baskets for `DataHeaderForm` since we switched to `DataHeader_p6`, which gets written only when the data header changes, as opposed to `DataHeader_p5`, which was written out on every event. So far, we've been using the default ROOT `AutoFlush` setting of `30 MB` for everything in the HITS files, and for all the `POOLContainer` and `POOLContainerForm` trees in all the other formats. These are now adjusted to match the `CollectionTree` and configured for the respective use cases.
After these changes, the memory allocation decreases visibly, especially in `HITtoRDO`, where we have multiple in-memory files.
For the remaining output formats, I ran `RunTier0Tests.py` for 150 events (so that AODs will be flushed to disk at least once) and I haven't observed any surprises. In any case, once we merge we'll see the effect in the SPOT tests.
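
The choice of 150 events can be checked with a bit of arithmetic. The sketch below assumes that a positive `AutoFlush` value N means a flush every N events and ignores the final write when the file is closed; the function name is hypothetical.

```python
def count_auto_flushes(n_events, auto_flush):
    """Count how many times the AutoFlush threshold is reached for n_events.

    Assumes auto_flush is a positive event count and ignores the
    final write that happens when the file is closed.
    """
    if auto_flush <= 0:
        raise ValueError("this sketch only handles positive, event-count values")
    return n_events // auto_flush


if __name__ == "__main__":
    # AutoFlush values from the MR description, for a 150-event test job.
    for fmt, auto_flush in (("RDO", 1), ("ESD", 10), ("AOD", 100)):
        print(fmt, count_auto_flushes(150, auto_flush))
```

Under these assumptions, an AOD written with an `AutoFlush` of 100 reaches the threshold exactly once in 150 events, whereas a job with fewer than 100 events would never trigger it.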
There are two main things that we don't implement here, which will be done in separate MRs:
- Sync the settings in the new configuration setup
- Sync the settings for the DAODs
I believe we're good to go here. Therefore, I'm un-`WIP`ing the MR. Please let me know, many thanks.
cc: @gemmeren, @mnowak, @jchapman
Closes ATLASSIM-4274