
Fix for slow finalize in ATLFAST3MT jobs with large numbers of threads

When running ATLFAST3 simulation jobs in AthenaMT with high thread counts, a significant slowdown was observed during finalise. The example below is from a 5000-event, 90-thread (63 events in parallel) ATLFAST3 simulation job running on VEGA. Only one CPU is used for approximately 89 minutes at the end of the job:

[image]

Inspecting the log file shows that this period corresponds to the finalise stage of the job.
Due to the differing threading models in Athena and Geant4 10.6, the main simulation algorithm (SimKernelMT) has to be cloned per thread rather than being re-entrant. The slowdown seems to be in the calls to finalize() of the private tools owned by SimKernelMT:

[image]

The other interesting factor was that the slowdown scales linearly with the number of instances of these private tools remaining to be finalised:

[image]

The slowdown is not observed for FullSim jobs, so the cause could be narrowed down to a tool used only in ATLFAST3 jobs. After some debugging it was traced to a call to TFile::Close() in ISF::PunchThroughTool::finalize(). Each instance of the PunchThroughTool opened a file during initialize() and built a map referring to histograms in that file; the file was then closed during finalize(). Having 63 instances of the same file open seemed to cause the slowdown. Modifying the code to make local copies of the histograms and close the file at the end of initialize() seems to cure the slowdown.

[Figure: Time required for TFile::Close() vs. number of remaining instances]

[Figure: Total finalize time vs. number of PunchThroughTool instances]
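The ownership change described above can be illustrated with a minimal, self-contained C++ sketch. It does not use ROOT or the real PunchThroughTool code: `InputFile`, `Histogram`, `ToolKeepsFileOpen`, `ToolCopiesHistograms`, `openFilesAfterInitialize`, the histogram names and the file name are all hypothetical stand-ins. The sketch only models how many file handles are still open once every clone has run initialize(), which is the quantity that differs between the old and new behaviour. It does not reproduce the cost of TFile::Close() itself.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a histogram (the real tool uses ROOT histograms).
struct Histogram {
  std::string name;
  std::vector<double> contents;
};

// Hypothetical histogram names read by every tool instance.
const std::vector<std::string> kHistogramNames = {"energy", "eta"};

// Hypothetical stand-in for an input file. The static counter records how
// many handles are open at the same time across all instances.
class InputFile {
 public:
  inline static int openCount = 0;

  explicit InputFile(std::string path) : path_(std::move(path)) {
    ++openCount;
    // Pretend the file contains these histograms.
    store_["energy"] = Histogram{"energy", {1.0, 2.0, 3.0}};
    store_["eta"] = Histogram{"eta", {0.5, 0.5}};
  }
  ~InputFile() { close(); }

  // Non-owning access: the pointer is only meaningful while the file is open.
  const Histogram* get(const std::string& name) const {
    assert(open_ && "histogram requested from a closed file");
    auto it = store_.find(name);
    return it == store_.end() ? nullptr : &it->second;
  }

  // Idempotent close, so an explicit close plus the destructor count once.
  void close() {
    if (open_) {
      open_ = false;
      --openCount;
    }
  }

 private:
  std::string path_;
  bool open_ = true;
  std::map<std::string, Histogram> store_;
};

// Old pattern: each instance keeps its file open and refers into it
// until finalize().
class ToolKeepsFileOpen {
 public:
  void initialize(const std::string& path) {
    file_ = std::make_unique<InputFile>(path);
    for (const auto& name : kHistogramNames) hists_[name] = file_->get(name);
  }
  void finalize() {
    if (file_) file_->close();  // one close per instance, all at the end of the job
  }

 private:
  std::unique_ptr<InputFile> file_;
  std::map<std::string, const Histogram*> hists_;
};

// New pattern: copy the histograms locally, then close the file before
// initialize() returns, so finalize() has no file left to close.
class ToolCopiesHistograms {
 public:
  void initialize(const std::string& path) {
    InputFile file(path);
    for (const auto& name : kHistogramNames) hists_[name] = *file.get(name);
    file.close();
  }
  void finalize() { hists_.clear(); }

 private:
  std::map<std::string, Histogram> hists_;
};

// Initializes nInstances tool clones, returns how many input files are still
// open after all of them have initialized, then finalizes them.
template <typename Tool>
int openFilesAfterInitialize(int nInstances) {
  std::vector<Tool> tools(static_cast<std::size_t>(nInstances));
  for (auto& tool : tools) tool.initialize("punchthrough_inputs.root");
  const int stillOpen = InputFile::openCount;
  for (auto& tool : tools) tool.finalize();
  return stillOpen;
}
```

With 63 instances, `openFilesAfterInitialize<ToolKeepsFileOpen>(63)` returns 63, while `openFilesAfterInitialize<ToolCopiesHistograms>(63)` returns 0. The design choice is a trade of a little memory per clone, for the local histogram copies, against never holding many handles to the same file until the end of the job.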

There are of course further improvements to this code that could be made, but this small change is a minimal fix.

Tagging @cyoung, @zhangr, @mbandier, @schaarsc

Edited by John Derek Chapman
