ensure thread termination tools only run in initialized threads
No matter how many threads you configure a TBB thread pool to have, TBB appears to create NCORES-1 threads, where NCORES is the number of available cores on the system. (You can artificially limit this by using taskset -c.) Of these threads, TBB only activates the number you have requested for the thread pool at any one time. This means that jobs with thread-local initialization need to proceed carefully. We check for thread initialization via the thread-local variable Gaudi::Concurrency::ThreadInitDone, which is set during the thread init at the start of the job and also the first time AlgoExecutionTask::execute() is called for a newly activated thread. However, sometimes one of the TBB threads is activated only after the last event has been processed, in which case Gaudi::Concurrency::ThreadInitDone is false. In that case IThreadInitTool::terminateThread() of the ThreadInitTools should NOT be called, as the initialization method has not been called (and it would be a waste of resources to initialize the thread just to terminate it immediately afterwards).
This MR also removes the forced thread initialization during ThreadPoolSvc::initialize and lets it happen on demand when the AlgoExecutionTask sees a new thread.
It also uses the new tbb::global_control class to limit concurrency (which seems to work better in recent versions of TBB than task_scheduler_init).
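As an illustration of the guard described above, here is a minimal sketch using only the C++ standard library. It is not the Gaudi implementation: threadInitDone, initThread, executeTask, terminateTask and runDemo are hypothetical stand-ins for Gaudi::Concurrency::ThreadInitDone, the thread init tools, AlgoExecutionTask::execute() and the termination task, and plain std::thread workers replace the TBB pool.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for Gaudi::Concurrency::ThreadInitDone:
// every thread has its own copy, initially false.
thread_local bool threadInitDone = false;

// Counters used only to observe what the sketch did.
std::atomic<int> initCalls{0};
std::atomic<int> terminateCalls{0};

// Hypothetical analogue of running the thread init tools once per thread.
void initThread() {
  ++initCalls;
  threadInitDone = true;
}

// Hypothetical analogue of AlgoExecutionTask::execute():
// initialize on demand the first time a thread does real work.
void executeTask() {
  if (!threadInitDone) initThread();
  // ... event processing would go here ...
}

// Hypothetical analogue of the termination task:
// threads that never initialized have nothing to tear down.
void terminateTask() {
  if (!threadInitDone) return;
  ++terminateCalls;
  threadInitDone = false;
}

// Starts nThreads workers; only the first nActive ever execute work,
// but every worker runs the termination task before exiting.
void runDemo(int nThreads, int nActive) {
  assert(nActive <= nThreads);
  initCalls = 0;
  terminateCalls = 0;
  std::vector<std::thread> workers;
  for (int i = 0; i < nThreads; ++i) {
    workers.emplace_back([i, nActive] {
      if (i < nActive) {
        executeTask();
        executeTask();  // second call must not re-initialize
      }
      terminateTask();
    });
  }
  for (auto& w : workers) w.join();
}
```

Calling runDemo(4, 2) models a pool where two threads were activated only after the last event: they reach terminateTask() with the flag still false and return without tearing anything down, while the two active threads initialize once and terminate once.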
Activity
added 1 commit
- c388c5e7 - ensure thread termination tools only run in initialized threads
- [2019-04-16 08:27] Validation started with lhcb-gaudi-merge#720
- [2019-04-17 00:04] Validation started with lhcb-lcg-dev4#872
- [2019-04-17 00:04] Validation started with lhcb-soa-track#50
- [2019-04-17 00:04] Validation started with lhcb-sanitizers#218
- [2019-04-17 00:07] Validation started with lhcb-dd4hep-95#71
- [2019-04-17 00:07] Validation started with lhcb-lcg-dev3#865
- [2019-04-17 00:09] Validation started with lhcb-tdr-test#517
- [2019-04-17 00:10] Validation started with lhcb-gaudi-head#2223
- [2019-04-18 00:04] Validation started with lhcb-lcg-dev4#873
- [2019-04-18 00:04] Validation started with lhcb-dd4hep-95#72
- [2019-04-18 00:04] Validation started with lhcb-soa-track#51
- [2019-04-18 00:05] Validation started with lhcb-lcg-dev3#866
- [2019-04-18 00:06] Validation started with lhcb-sanitizers#219
- [2019-04-18 00:09] Validation started with lhcb-tdr-test#518
- [2019-04-18 00:11] Validation started with lhcb-gaudi-head#2224
- [2019-04-19 00:03] Validation started with lhcb-lcg-dev3#867
- [2019-04-19 00:04] Validation started with lhcb-sanitizers#220
- [2019-04-19 00:06] Validation started with lhcb-tdr-test#519
- [2019-04-19 00:07] Validation started with lhcb-soa-track#52
- [2019-04-19 00:07] Validation started with lhcb-lcg-dev4#874
- [2019-04-19 00:08] Validation started with lhcb-dd4hep-95#73
- [2019-04-19 00:11] Validation started with lhcb-gaudi-head#2225
- [2019-04-20 00:03] Validation started with lhcb-lcg-dev4#875
- [2019-04-20 00:03] Validation started with lhcb-sanitizers#221
- [2019-04-20 00:03] Validation started with lhcb-soa-track#53
- [2019-04-20 00:04] Validation started with lhcb-dd4hep-95#74
- [2019-04-20 00:04] Validation started with lhcb-lcg-dev3#868
- [2019-04-20 00:06] Validation started with lhcb-tdr-test#520
- [2019-04-20 00:08] Validation started with lhcb-gaudi-head#2226
- [2019-04-21 00:04] Validation started with lhcb-lcg-dev4#876
- [2019-04-21 00:04] Validation started with lhcb-lcg-dev3#869
- [2019-04-21 00:04] Validation started with lhcb-dd4hep-95#75
- [2019-04-21 00:04] Validation started with lhcb-sanitizers#222
- [2019-04-21 00:05] Validation started with lhcb-tdr-test#521
- [2019-04-21 00:05] Validation started with lhcb-gaudi-head#2227
- [2019-04-21 00:08] Validation started with lhcb-soa-track#54
- [2019-04-22 00:03] Validation started with lhcb-lcg-dev4#877
- [2019-04-22 00:03] Validation started with lhcb-lcg-dev3#870
- [2019-04-22 00:04] Validation started with lhcb-soa-track#55
- [2019-04-22 00:04] Validation started with lhcb-sanitizers#223
- [2019-04-22 00:06] Validation started with lhcb-gaudi-head#2228
- [2019-04-22 00:07] Validation started with lhcb-dd4hep-95#76
- [2019-04-22 00:08] Validation started with lhcb-tdr-test#522
- [2019-04-23 00:04] Validation started with lhcb-dd4hep-95#77
- [2019-04-23 00:04] Validation started with lhcb-lcg-dev4#878
- [2019-04-23 00:04] Validation started with lhcb-sanitizers#224
- [2019-04-23 00:06] Validation started with lhcb-gaudi-head#2229
- [2019-04-23 00:06] Validation started with lhcb-lcg-dev3#871
- [2019-04-23 00:06] Validation started with lhcb-tdr-test#523
- [2019-04-23 00:07] Validation started with lhcb-soa-track#56
- [2019-04-23 16:12] Validation started with lhcb-tdr-test#524
Edited by Software for LHCb
added 1 commit
- e2593ee4 - don't launch ThreadInit tasks at initialize, but when needed
I'm not sure if it's this MR, but something changed yesterday in the Gaudi/HEAD builds that is causing the GaudiHive build to crash with a segmentation violation:
(build/GaudiHive)$ /cvmfs/lhcb.cern.ch/lib/var/lib/LbEnv/380/stable/x86_64-slc6/bin/xenv --xml build/config/Gaudi-build.xenv build/bin/genconf.exe -o build/GaudiHive/genConf/GaudiHive -p GaudiHive --configurable-module=GaudiKernel.Proxy --configurable-default-name=Configurable.DefaultName --configurable-algorithm=ConfigurableAlgorithm --configurable-algtool=ConfigurableAlgTool --configurable-auditor=ConfigurableAuditor --configurable-service=ConfigurableService -i GaudiHive
*** Break *** segmentation violation
===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0 0x00007f348fc1589e in waitpid () from /lib64/libc.so.6
#1 0x00007f348fba74e9 in do_system () from /lib64/libc.so.6
#2 0x00007f348e3b00aa in TUnixSystem::StackTrace() () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/LCG_95/ROOT/6.16.00/x86_64-slc6-gcc8-opt/lib/libCore.so
#3 0x00007f348e3b2984 in TUnixSystem::DispatchSignals(ESignals) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/LCG_95/ROOT/6.16.00/x86_64-slc6-gcc8-opt/lib/libCore.so
#4 <signal handler called>
#5 0x00007f348f74efd1 in tbb::interface9::global_control::internal_destroy (this=0x706f6f6c20746e65) at ../../src/tbb/tbb_main.cpp:522
#6 0x00007f348cecdb41 in ThreadPoolSvc::~ThreadPoolSvc() () from build/lib/libGaudiHive.so
#7 0x00007f348cecdd39 in ThreadPoolSvc::~ThreadPoolSvc() () from build/lib/libGaudiHive.so
#8 0x00007f34915c8909 in implements<IService, IProperty, IStateful>::release() () from build/lib/libGaudiKernel.so
#9 0x000000000042f1dc in SmartIF<IProperty>::reset(IProperty*) [clone .constprop.1038] ()
#10 0x0000000000433cc1 in configGenerator::genConfig(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#11 0x000000000042e5b4 in main ()
See e.g. https://lhcb-nightlies.cern.ch/logs/build/nightly/lhcb-tdr-test/520/x86_64-centos7-gcc8-opt/Gaudi/
Edited by Marco Cattaneo
added 1 commit
- 9d2e3b08 - don't launch ThreadInit tasks at initialize, but when needed
added 1 commit
- e841aade - don't launch ThreadInit tasks at initialize, but when needed
- Resolved by Marco Clemencic
@leggett thanks, that seems to have fixed the problem
assigned to @clemenci
changed milestone to %v32r0
mentioned in issue #65 (closed)
mentioned in commit aa5e184d
  #include "boost/thread.hpp"
  #include "tbb/spin_mutex.h"
  #include "tbb/task_scheduler_init.h"
+ #define TBB_PREVIEW_GLOBAL_CONTROL 1
+ #include "tbb/global_control.h"
The TBB doc warns about using Preview features. Not sure we want to build that into our code.
mentioned in merge request atlas/athena!23054 (merged)