EventLoop Leak Check, 21.2 branch (2019.01.07.) (!20190) · Merge requests · atlas / athena

Attila Krasznahorkay requested to merge akraszna/athena:EventLoopLeakCheck-21.2-20190107 into 21.2 Jan 07, 2019

This is of course mainly for @krumnack to review...

As we discussed in the last S&C Week in December, we should introduce some basic memory leak detection into EventLoop, so that analysers can catch coding errors more easily in local test jobs. That way memory leaks would not only be detected once they try running on a large set of files on the grid (or on some other batch system).

For this I simply used TSystem::GetProcInfo to record the amount of resident/virtual memory used by the analysis process after the initialisation and after the finalisation of the job. I then taught EL::DirectWorker and EL::BatchWorker to each handle this information in their own way.
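
A minimal sketch of the per-event leak calculation, assuming the leak is taken as the memory growth between the two snapshots divided by the number of processed events. The names `MemSnapshot`, `LeakPerEvent` and `computeLeakPerEvent` are hypothetical and not the actual EventLoop code. In ROOT the numbers would come from `gSystem->GetProcInfo(&info)`, whose `ProcInfo_t` reports `fMemResident` and `fMemVirtual` in kB; the ROOT call is left out so the sketch builds on its own.

```cpp
#include <cassert>

// Hypothetical stand-in for the two fields of ROOT's ProcInfo_t that matter
// here (fMemResident / fMemVirtual, both in kB).
struct MemSnapshot {
  long residentKB;
  long virtualKB;
};

// Memory growth per processed event, in kB/event.
struct LeakPerEvent {
  double residentKB;
  double virtualKB;
};

// Hypothetical helper: compare the snapshot taken after initialisation with
// the one taken after finalisation and spread the difference over the events.
LeakPerEvent computeLeakPerEvent(const MemSnapshot& afterInit,
                                 const MemSnapshot& afterFinal,
                                 long long nEvents) {
  assert(nEvents >= 0 && "event count cannot be negative");
  LeakPerEvent leak{0.0, 0.0};
  if (nEvents == 0) {
    return leak;  // no events processed: nothing meaningful to report
  }
  const double n = static_cast<double>(nEvents);
  leak.residentKB =
      static_cast<double>(afterFinal.residentKB - afterInit.residentKB) / n;
  leak.virtualKB =
      static_cast<double>(afterFinal.virtualKB - afterInit.virtualKB) / n;
  return leak;
}
```

For example, growing from 100000 kB to 120000 kB of resident memory over 1000 events gives 20 kB/event.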

As a first step I've set up the code to treat leaks larger than 10 kB/event in local jobs as errors. Everything else (local leaks smaller than this, and leaks of any size in batch jobs) is not treated as an error by default. Unfortunately I already know that even this generous setting makes a few of our existing unit tests fail. So I'm looking for a suggestion: should we put the code in like this, accepting that it will take a while until all unit tests are fixed, or should I also disable failures by default for DirectDriver jobs for now?
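
The default policy described above can be sketched as follows. `RunMode`, `LeakCheckConfig`, `defaultLeakCheckConfig` and `checkLeak` are hypothetical names; only the behaviour (fail local jobs above 10 kB/event, never fail batch jobs by default, always print the measured value) is taken from this description. Which memory measure (resident or virtual) the threshold is applied to is left to the caller here, as an assumption.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical distinction between DirectWorker-style (local) and
// BatchWorker-style jobs.
enum class RunMode { Local, Batch };

struct LeakCheckConfig {
  double maxLeakKBPerEvent;  // leaks strictly above this are candidates for failure
  bool failLocalJobs;        // treat large leaks as errors in local jobs
  bool failBatchJobs;        // treat large leaks as errors in batch jobs
};

// Defaults as described in the MR: 10 kB/event, local jobs only.
LeakCheckConfig defaultLeakCheckConfig() {
  return LeakCheckConfig{10.0, true, false};
}

// Always prints the measured leak, then reports whether the job should be
// flagged as failed in the given run mode.
bool checkLeak(double leakKBPerEvent, RunMode mode, const LeakCheckConfig& cfg) {
  assert(cfg.maxLeakKBPerEvent >= 0.0 && "threshold must be non-negative");
  std::printf("Memory leak: %.2f kB/event (%s job)\n", leakKBPerEvent,
              mode == RunMode::Local ? "local" : "batch");
  const bool failInThisMode =
      (mode == RunMode::Local) ? cfg.failLocalJobs : cfg.failBatchJobs;
  return failInThisMode && leakKBPerEvent > cfg.maxLeakKBPerEvent;
}
```

With this layout, the alternative raised above (not failing DirectDriver jobs either, for now) would amount to flipping `failLocalJobs` to `false` in the defaults, without touching the measurement itself.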

I was also wondering whether to always print the memory leak values at the end of the jobs or not. For now I decided to always print them.

Of course, coding suggestions are also appreciated.

Edited Jan 10, 2019 by Nils Erik Krumnack
