Memory leak with dd4hep condition updates
Whilst working on the RICH refractive index calibration task, which requires testing across multiple runs, I observed what appears to be a sizeable memory leak when running a single job over many runs, and thus through multiple run changes and condition updates.
I've since investigated and it appears to have nothing to do with the calibration task itself; rather, it looks (to me) like a leak somewhere in the dd4hep condition updates.
To illustrate the issue with a simple task, I have pushed some minor updates to !3928 (merged).
If you use the branch there and run the test by hand with

```
gaudirun.py /path/to/LHCb/Rich/RichDetectors/tests/qmtest/test-decode-and-spacepoints-2022-data.qmt
```

then edit the options file and flip the line `useOneRun = True` between `True` and `False`, you switch the input data the test uses from a single run to a special set of MDF files I prepared for the ref. index calibration testing, containing 10k events from a large number of runs. The idea is that, using these files, the task sees a number of run changes and thus condition updates.
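For clarity, the toggle in the options file is along these lines (a minimal sketch; only the `useOneRun` flag comes from the actual test options, the other names are placeholders):

```python
# Minimal sketch of the toggle in the test options.
# Only useOneRun is taken from the real options file; the file lists
# below are placeholders for the actual MDF inputs.
useOneRun = True  # set to False to use the multi-run sample

if useOneRun:
    # Single run: one set of conditions, so only one update at the start.
    input_files = ["single_run.mdf"]
else:
    # Many runs (10k events from a large number of runs): every run
    # boundary triggers a run change and thus a condition update.
    input_files = ["run1.mdf", "run2.mdf", "run3.mdf"]  # etc.
```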
Here are the logs for the two jobs, one running over a single run and one over multiple runs; otherwise the tasks are identically configured.
Both are running over 1M events. Both have the MemoryAuditor enabled.
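(For reference, the MemoryAuditor is the standard Gaudi one; enabling it looks roughly like the sketch below, which is not necessarily the exact configuration used here.)

```python
from Configurables import ApplicationMgr, AuditorSvc

# Standard Gaudi pattern for per-algorithm memory printouts:
# register the MemoryAuditor with the AuditorSvc and switch on
# algorithm auditing.
AuditorSvc().Auditors += ["MemoryAuditor"]
ApplicationMgr().ExtSvc += ["AuditorSvc"]
ApplicationMgr().AuditAlgorithms = True
```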
For the task running over a single run, the memory usage hits 830 MB after the first condition update and pretty much stays there for the entire job.
For the task running over many runs, you can see the memory usage jumps significantly after each condition update; by the end of the job it is using 6.3 GB.
I've looked into things from the RICH side, i.e. the derived condition objects I use, and I do not see anything there that would directly cause this. So my suspicion (unproven as yet) is that the issue lies more on the framework side.
@clemenci @bcouturi What exactly happens on condition updates? Are the previous condition objects kept forever, or are they eventually purged?