Skip to content
Snippets Groups Projects
Commit fa1ca8ca authored by Tadej Novak's avatar Tadej Novak
Browse files

Merge branch 'fix_issue_38' into 'main'

applying fix to address issue 38

Closes #38

See merge request !102
parents a7969af0 644690ef
No related branches found
No related tags found
1 merge request!102applying fix to address issue 38
Pipeline #11718414 passed with warnings
......@@ -28,6 +28,7 @@ where the value inside the `==` is the `pid` of your program. The rest of the me
If you're using `memcheck`, then you will get lots of messages at the end, relating to memory leaks. The following information (more info at [FAQ](https://valgrind.org/docs/manual/faq.html)) might be useful:
The errors reported can be:
| Message | Explanation |
| --------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| definitely lost | your program is leaking memory -- fix it! |
......@@ -79,64 +80,58 @@ In this example, we will step through finding and solving a memory leak.
I have run on opt, and got the following message in the job output:
```
==29199== 300 bytes in 15 blocks are definitely lost in loss record 1006 of 1301
==29199== at 0x3414B6D6: operator new(unsigned) (vg_replace_malloc.c:133)
==29199== by 0x3F79D4F0: Trk::TrackSummaryTool::createSummary(Trk::Track const&) (in /afs/cern.ch/atlas/software/
dist/nightlies/rel/atlrel_1/Tracking/TrkTools/TrkTrackSummaryTool/TrkTrackSummaryTool-00-11-01/i686-slc3-gcc323-
opt/libTrkTrackSummaryToolLib.so)
==29199== by 0x3F7BBDA0: InDet::PriVxTopAlg::m_preselect(Trk::Track const\*) (in /afs/cern.ch/atlas/software/dist
/nightlies/rel/atlrel_1/InnerDetector/InDetRecAlgs/InDetPriVxFinder/InDetPriVxFinder-01-00-01/i686-slc3-gcc323-
opt/libInDetPriVxFinder.so)
==29199== by 0x3F7C0780: InDet::VxPrimary::execute() (in /afs/cern.ch/atlas/software/dist/nightlies/rel/atlrel_1
/InnerDetector/InDetRecAlgs/InDetPriVxFinder/InDetPriVxFinder-01-00-01/i686-slc3-gcc323-opt/libInDetPriVxFinder.so)
==29199== by 0x342850DF: Algorithm::sysExecute() (in /afs/cern.ch/atlas/offline/external/Gaudi/0.14.6.14-root4/
GaudiKernel/v15r7p4/i686-slc3-gcc323-opt/libGaudiKernel.so)
==29199== by 0x341760B1: AthenaEventLoopMgr::executeAlgorithms() (in /afs/cern.ch/atlas/software/dist/nightlies
/rel/atlrel_1/Control/AthenaServices/AthenaServices-01-05-16/i686-slc3-gcc323-opt/libAthenaServices.so)
==29199== by 0x341764D5: AthenaEventLoopMgr::executeEvent(void\*) (in /afs/cern.ch/atlas/software/dist/nightlies
/rel/atlrel_1/Control/AthenaServices/AthenaServices-01-05-16/i686-slc3-gcc323-opt/libAthenaServices.so)
==29199== by 0x34176CC8: AthenaEventLoopMgr::nextEvent(int) (in /afs/cern.ch/atlas/software/dist/nightlies/rel
/atlrel_1/Control/AthenaServices/AthenaServices-01-05-16/i686-slc3-gcc323-opt/libAthenaServices.so)
==3208576==
==3208576== 268,656 (126,816 direct, 141,840 indirect) bytes in 3,963 blocks are definitely lost in loss record 82,643 of 82,898
==3208576== at 0x4848EE1: operator new(unsigned long) (vg_replace_malloc.c:487)
==3208576== by 0x7C24CA5D: CaloTopoClusterMaker::execute(EventContext const&, DataVector<xAOD::CaloCluster_v1, DataVector<xAOD::IParticle, DataModel_detail::NoBase> >*) const (CaloTopoClusterMaker.cxx:659)
==3208576== by 0x7C1FFBB7: CaloClusterMaker::execute(EventContext const&) const (CaloClusterMaker.cxx:164)
==3208576== by 0x3DE924AF: Gaudi::Algorithm::sysExecute(EventContext const&) (in /cvmfs/atlas-nightlies.cern.ch/repo/sw/main--dev4LCG_Athena_x86_64-el9-gcc13-opt/2025-01-29T0815/AthenaExternals/25.0.25/InstallArea/x86_64-el9-gcc13-opt/lib/libGaudiKernel.so)
==3208576== by 0x7A2507EE: AthSequencer::executeAlgorithm(Gaudi::Algorithm*, EventContext const&) const (in /cvmfs/atlas-nightlies.cern.ch/repo/sw/main--dev4LCG_Athena_x86_64-el9-gcc13-opt/2025-01-29T0815/Athena/25.0.25/InstallArea/x86_64-el9-gcc13-opt/lib/libGaudiSequencer.so)
==3208576== by 0x7A250BE5: AthSequencer::execute(EventContext const&) const (in /cvmfs/atlas-nightlies.cern.ch/repo/sw/main--dev4LCG_Athena_x86_64-el9-gcc13-opt/2025-01-29T0815/Athena/25.0.25/InstallArea/x86_64-el9-gcc13-opt/lib/libGaudiSequencer.so)
==3208576== by 0x3DE924AF: Gaudi::Algorithm::sysExecute(EventContext const&) (in /cvmfs/atlas-nightlies.cern.ch/repo/sw/main--dev4LCG_Athena_x86_64-el9-gcc13-opt/2025-01-29T0815/AthenaExternals/25.0.25/InstallArea/x86_64-el9-gcc13-opt/lib/libGaudiKernel.so)
==3208576== by 0x7A2507EE: AthSequencer::executeAlgorithm(Gaudi::Algorithm*, EventContext const&) const (in /cvmfs/atlas-nightlies.cern.ch/repo/sw/main--dev4LCG_Athena_x86_64-el9-gcc13-opt/2025-01-29T0815/Athena/25.0.25/InstallArea/x86_64-el9-gcc13-opt/lib/libGaudiSequencer.so)
==3208576== by 0x7A250BE5: AthSequencer::execute(EventContext const&) const (in /cvmfs/atlas-nightlies.cern.ch/repo/sw/main--dev4LCG_Athena_x86_64-el9-gcc13-opt/2025-01-29T0815/Athena/25.0.25/InstallArea/x86_64-el9-gcc13-opt/lib/libGaudiSequencer.so)
==3208576== by 0x3DE924AF: Gaudi::Algorithm::sysExecute(EventContext const&) (in /cvmfs/atlas-nightlies.cern.ch/repo/sw/main--dev4LCG_Athena_x86_64-el9-gcc13-opt/2025-01-29T0815/AthenaExternals/25.0.25/InstallArea/x86_64-el9-gcc13-opt/lib/libGaudiKernel.so)
==3208576== by 0x7A2507EE: AthSequencer::executeAlgorithm(Gaudi::Algorithm*, EventContext const&) const (in /cvmfs/atlas-nightlies.cern.ch/repo/sw/main--dev4LCG_Athena_x86_64-el9-gcc13-opt/2025-01-29T0815/Athena/25.0.25/InstallArea/x86_64-el9-gcc13-opt/lib/libGaudiSequencer.so)
==3208576== by 0x7A250BE5: AthSequencer::execute(EventContext const&) const (in /cvmfs/atlas-nightlies.cern.ch/repo/sw/main--dev4LCG_Athena_x86_64-el9-gcc13-opt/2025-01-29T0815/Athena/25.0.25/InstallArea/x86_64-el9-gcc13-opt/lib/libGaudiSequencer.so)
==3208576== by 0x3DE924AF: Gaudi::Algorithm::sysExecute(EventContext const&) (in /cvmfs/atlas-nightlies.cern.ch/repo/sw/main--dev4LCG_Athena_x86_64-el9-gcc13-opt/2025-01-29T0815/AthenaExternals/25.0.25/InstallArea/x86_64-el9-gcc13-opt/lib/libGaudiKernel.so)
==3208576== by 0x7A2507EE: AthSequencer::executeAlgorithm(Gaudi::Algorithm*, EventContext const&) const (in /cvmfs/atlas-nightlies.cern.ch/repo/sw/main--dev4LCG_Athena_x86_64-el9-gcc13-opt/2025-01-29T0815/Athena/25.0.25/InstallArea/x86_64-el9-gcc13-opt/lib/libGaudiSequencer.so)
==3208576== by 0x7A250BE5: AthSequencer::execute(EventContext const&) const (in /cvmfs/atlas-nightlies.cern.ch/repo/sw/main--dev4LCG_Athena_x86_64-el9-gcc13-opt/2025-01-29T0815/Athena/25.0.25/InstallArea/x86_64-el9-gcc13-opt/lib/libGaudiSequencer.so)
==3208576== by 0x3DE924AF: Gaudi::Algorithm::sysExecute(EventContext const&) (in /cvmfs/atlas-nightlies.cern.ch/repo/sw/main--dev4LCG_Athena_x86_64-el9-gcc13-opt/2025-01-29T0815/AthenaExternals/25.0.25/InstallArea/x86_64-el9-gcc13-opt/lib/libGaudiKernel.so)
==3208576== by 0x40D7D5BE: AthenaEventLoopMgr::executeAlgorithms(EventContext const&) (in /cvmfs/atlas-nightlies.cern.ch/repo/sw/main--dev4LCG_Athena_x86_64-el9-gcc13-opt/2025-01-29T0815/Athena/25.0.25/InstallArea/x86_64-el9-gcc13-opt/lib/libAthenaServices.so)
==3208576== by 0x40D85406: AthenaEventLoopMgr::executeEvent(EventContext&&) (in /cvmfs/atlas-nightlies.cern.ch/repo/sw/main--dev4LCG_Athena_x86_64-el9-gcc13-opt/2025-01-29T0815/Athena/25.0.25/InstallArea/x86_64-el9-gcc13-opt/lib/libAthenaServices.so)
==3208576== by 0x40D83151: AthenaEventLoopMgr::nextEvent(int) (in /cvmfs/atlas-nightlies.cern.ch/repo/sw/main--dev4LCG_Athena_x86_64-el9-gcc13-opt/2025-01-29T0815/Athena/25.0.25/InstallArea/x86_64-el9-gcc13-opt/lib/libAthenaServices.so)
==3208576== by 0x40D7E7DC: AthenaEventLoopMgr::executeRun(int) (in /cvmfs/atlas-nightlies.cern.ch/repo/sw/main--dev4LCG_Athena_x86_64-el9-gcc13-opt/2025-01-29T0815/Athena/25.0.25/InstallArea/x86_64-el9-gcc13-opt/lib/libAthenaServices.so)
==3208576== by 0x40820304: ApplicationMgr::executeRun(int) (in /cvmfs/atlas-nightlies.cern.ch/repo/sw/main--dev4LCG_Athena_x86_64-el9-gcc13-opt/2025-01-29T0815/AthenaExternals/25.0.25/InstallArea/x86_64-el9-gcc13-opt/lib/libGaudiCoreSvc.so)
==3208576== by 0x3DEC4CF5: py_bootstrap_app_run (in /cvmfs/atlas-nightlies.cern.ch/repo/sw/main--dev4LCG_Athena_x86_64-el9-gcc13-opt/2025-01-29T0815/AthenaExternals/25.0.25/InstallArea/x86_64-el9-gcc13-opt/lib/libGaudiKernel.so)
==3208576== by 0x105B7051: ffi_call_unix64 (in /cvmfs/sft.cern.ch/lcg/releases/libffi/3.4.2-a3607/x86_64-el9-gcc13-opt/lib64/libffi.so.8.1.0)
==3208576== by 0x105B5C78: ffi_call_int (in /cvmfs/sft.cern.ch/lcg/releases/libffi/3.4.2-a3607/x86_64-el9-gcc13-opt/lib64/libffi.so.8.1.0)
==3208576== by 0x1059E0B0: _call_function_pointer (callproc.c:923)
==3208576== by 0x1059E0B0: _ctypes_callproc (callproc.c:1262)
==3208576== by 0x10598D76: PyCFuncPtr_call (_ctypes.c:4201)
==3208576== by 0x49B855C: _PyObject_MakeTpCall (call.c:214)
==3208576== by 0x495D86E: _PyEval_EvalFrameDefault (ceval.c:4769)
==3208576== by 0x4AAF753: _PyEval_EvalFrame (pycore_ceval.h:73)
==3208576== by 0x4AAF753: _PyEval_Vector (ceval.c:6434)
==3208576== by 0x4AAF753: PyEval_EvalCode (ceval.c:1148)
==3208576== by 0x4AF7DF8: run_eval_code_obj (pythonrun.c:1741)
==3208576== by 0x4AF7DF8: run_mod (pythonrun.c:1762)
```
This tells us that there is a 300 byte leak coming from `Trk::TrackSummaryTool::createSummary(Trk::Track const&)`, which was called by `InDet::PriVxTopAlg::m_preselect(Trk::Track const\*)` etc. (i.e. the last/latest call is at the top, and first at the bottom).
It might be useful to have some more detailed output - such as line numbers, for that last method. So we can check out and build just that package with debug info (see [here](https://twiki.cern.ch/twiki/bin/view/AtlasComputing/UsingDebugBuiltPackagesWithOptBuild#Rebuild_the_packages) for details on how to do this).
Now re-run ... the new output is:
This indicates a 268,656-byte memory leak originating from the function `CaloTopoClusterMaker::execute(EventContext const&, DataVector<xAOD::CaloCluster_v1, ...>*)`, which was called by `Gaudi::Algorithm::sysExecute(EventContext const&)`, and so on.
Note that the most recent call appears at the top, and the earliest at the bottom of the trace.
The default Athena build includes debug symbols, so the Valgrind output contains source line references (e.g., `CaloClusterMaker.cxx:164`).
In the example, a memory leak is caused by a `CaloProtoCluster` allocation introduced in `CaloTopoClusterMaker` but never freed:
```
CaloProtoCluster* myCluster = new CaloProtoCluster(cellCollLink);
```
==8807== 20 bytes in 1 blocks are definitely lost in loss record 242 of 2341
==8807== at 0x3414C6D6: operator new(unsigned) (vg_replace_malloc.c:133)
==8807== by 0x3EB1894B: Trk::TrackSummaryTool::createSummary(Trk::Track const&) (TrackSummaryTool.cxx:171)
==8807== by 0x3F91A769: Trk::TrackScoringTool::score(Trk::Track const&) (in /afs/cern.ch/user/e/emoyse/
public/Development/Tracking/TrkTools/TrkAmbiguityProcessor/TrkAmbiguityProcessor-00-00-01/i686-slc3-gcc323-opt/
libTrkAmbiguityProcessorLib.so)
==8807== by 0x3F918E3F: Trk::TrackAmbiguityProcessorTool::process(DataVector<Trk::Track> const\*) (in
/afs/cern.ch/user/e/emoyse/public/Development/Tracking/TrkTools/TrkAmbiguityProcessor/
TrkAmbiguityProcessor-00-00-01/i686-slc3-gcc323-opt/libTrkAmbiguityProcessorLib.so)
==8807== by 0x3F92CF0E: InDetAmbiguitySolver::resolveTracks() (in /afs/cern.ch/user/e/emoyse/
public/Development/InnerDetector/InDetRecAlgs/InDetAmbiguitySolver/InDetAmbiguitySolver-00-08-00/
i686-slc3-gcc323-opt/libInDetAmbiguitySolver.so)
==8807== by 0x3F92CA48: InDetAmbiguitySolver::execute() (in /afs/cern.ch/user/e/emoyse/public/
Development/InnerDetector/InDetRecAlgs/InDetAmbiguitySolver/InDetAmbiguitySolver-00-08-00/
i686-slc3-gcc323-opt/libInDetAmbiguitySolver.so)
==8807== by 0x342860DF: Algorithm::sysExecute() (in /afs/cern.ch/atlas/offline/external/Gaudi/
0.14.6.14-root4/GaudiKernel/v15r7p4/i686-slc3-gcc323-opt/libGaudiKernel.so)
==8807== by 0x341770B1: AthenaEventLoopMgr::executeAlgorithms() (in /afs/cern.ch/atlas/
software/dist/nightlies/rel/atlrel_1/Control/AthenaServices/AthenaServices-01-05-16/i686-slc3-gcc323-opt/
libAthenaServices.so)
Following the C++11 coding guidelines that recommend avoiding raw pointers, this line should be replaced with:
```
So now we see that there is a problem in `TrackSummaryTool.cxx`, line 171... which narrows it down to here:
```
const std::vector< const Trk::TrackParameters* >* extrapParameters = 0;
if (m_doHoles)
{
extrapParameters = extrapolate(track);
[...]
}
auto myCluster = std::make_unique<CaloProtoCluster>(cellCollLink);
```
The answer is now clear -`extrapParameters` was never deleted. Fixing this and checking again with valgrind confirms that the bug is gone.
After applying the fix and re-running the check with Valgrind, the memory leak is confirmed to be resolved.
### Special issue with object factory design in Tracking
Especially in the tracking realm, many tools employ a cascaded object factory design pattern. That means they create an object and instead of deleting it before the method reaches the end, they give it back to the calling client, hereby passing the ownership (and responsibility to delete it or add it to StoreGate) onto the calling client, and even further on (cascade). The TrackSummaryTool in the example above is such a factory, and if it shows up in a valgrind leak report it needs some investigation to find out if the tool is 'leaking' objects internally or if the tool is working correctly but the calling client failed to take over the ownership. This investigation is complicated by the fact that often the factory tools have an internal structure with such ownership transfer, and the clients can themselves again employ a factory design to add the (potentially leaking) object onto the next bigger object. For instance, a track summary is first made (by factory, the TrackSummaryTool) and then added to a track, but the track is leaked. In such a case the developer of the TrackSummaryTool will not enjoy getting a bug report filed, for which the tool is not responsible.
......
......@@ -27,7 +27,7 @@ There are other possibilities, such as
- `Addrcheck` - a lightweight (faster) version of `memcheck`
- `Cachegrind` - is a cache profiler
- `Callgrind` - a call graph profiler (extended version of `Cachegrind`)
* You can use this to profile only your algorithm, using [Valkyrie](https://atlas-sw-doxygen.web.cern.ch/atlas-sw-doxygen/atlas_22.0.X-DOX/docs//html/da/dcc/Valkyrie_page.html)
* You can use this to profile only your algorithm, using [Valkyrie](https://atlas-sw-doxygen.web.cern.ch/atlas-sw-doxygen/atlas_main--Doxygen/docs/html/da/dcc/Valkyrie_page.html)
- `Massif` - a heap memory profiler
- `Helgrind` - data races in multithreaded programs
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment