Application hang in HLTControlFlowMgr
I am running a simple PyConf-based task offline on some RICH commissioning data. The task just runs the RICH decoding and a few basic monitoring algorithms, and thus has high throughput.
I am seeing the task sporadically hang during the event loop. Attaching to the process with gdb, I see:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007fc131504a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) where
#0 0x00007fc131504a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fc1306be140 in __gthread_cond_wait (__mutex=<optimized out>, __cond=0x3e1be288)
at /build/dkonst/gcc-clang-2/build/contrib/gcc-11.1.0/src/gcc-11.1.0-build/x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu/bits/gthr-default.h:865
#2 std::__condvar::wait (__m=..., this=0x3e1be288) at /build/dkonst/gcc-clang-2/build/contrib/gcc-11.1.0/src/gcc-11.1.0-build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/std_mutex.h:155
#3 std::condition_variable::wait (this=this@entry=0x3e1be288, __lock=...) at /build/dkonst/gcc-clang-2/build/contrib/gcc-11.1.0/src/gcc/11.1.0/libstdc++-v3/src/c++11/condition_variable.cc:41
#4 0x00007fc10c5d0139 in std::condition_variable::wait<HLTControlFlowMgr::createEventContext()::<lambda()> > (__p=..., __lock=..., this=0x3e1be288)
at /cvmfs/lhcb.cern.ch/lib/lcg/releases/gcc/11.1.0-e80bf/x86_64-centos7/include/c++/11.1.0/condition_variable:103
#5 HLTControlFlowMgr::createEventContext (this=this@entry=0x3e1bd000) at ../Hlt/HLTScheduler/src/HLTControlFlowMgr.cpp:583
#6 0x00007fc10c5d3335 in HLTControlFlowMgr::nextEvent (this=0x3e1bd000, maxevt=-1) at ../Hlt/HLTScheduler/src/HLTControlFlowMgr.cpp:844
#7 0x00007fc10c5eaa2c in virtual thunk to HLTControlFlowMgr::executeRun(int) () at ../Hlt/HLTScheduler/src/HLTControlFlowMgr.cpp:84
#8 0x00007fc111018a16 in ApplicationMgr::executeRun (this=0x3da9a000, evtmax=-1) at ../GaudiCoreSvc/src/ApplicationMgr/ApplicationMgr.cpp:794
#9 0x00007fc1116cef1a in Gaudi::Application::run (this=0x3db1ef70) at ../GaudiKernel/src/Lib/Application.cpp:86
#10 0x00007fc1116ceab0 in _py_Gaudi__Application__run (self=<optimized out>) at ../GaudiKernel/src/Lib/Application.cpp:116
#11 0x00007fc12fd72ee6 in ffi_call_unix64 () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/libffi/3.2.1-26487/x86_64-centos7-gcc11-dbg/lib64/libffi.so.6
#12 0x00007fc12fd727a1 in ffi_call () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/libffi/3.2.1-26487/x86_64-centos7-gcc11-dbg/lib64/libffi.so.6
<snip>
So this looks like some kind of locking race condition in HLTControlFlowMgr?
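For reference, frame #4 shows the main thread blocked in a `std::condition_variable::wait` with a predicate lambda inside `createEventContext`. One common way such a wait can hang forever is when the state read by the predicate is changed without holding the associated mutex (even if that state is atomic), so the notification can land between the predicate check and the sleep and get lost. The sketch below shows the safe pattern only; `SlotPool`, `run_events` and all member names are hypothetical and are not taken from the HLTControlFlowMgr sources, which may well work differently.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a pool of event slots (NOT the HLTControlFlowMgr code).
class SlotPool {
public:
  explicit SlotPool(std::size_t n) : m_free(n) {}

  // Block until a slot is free, then claim it.
  void acquire() {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_cv.wait(lock, [this] { return m_free > 0; });
    --m_free;
  }

  // Return a slot. The counter is modified while holding the mutex, so a
  // waiter cannot check the predicate, miss this update, and then sleep
  // through the notification.
  void release() {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      ++m_free;
    }
    m_cv.notify_one();
  }

  std::size_t free_slots() {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_free;
  }

private:
  std::mutex m_mutex;
  std::condition_variable m_cv;
  std::size_t m_free;
};

// Push n_events through a pool of n_slots; returns the number of events completed.
std::size_t run_events(std::size_t n_slots, std::size_t n_events) {
  SlotPool pool(n_slots);
  std::atomic<std::size_t> done{0};
  std::vector<std::thread> workers;
  workers.reserve(n_events);
  for (std::size_t i = 0; i < n_events; ++i) {
    pool.acquire();  // the event loop blocks here when all slots are busy
    workers.emplace_back([&pool, &done] {
      ++done;          // stand-in for processing one event
      pool.release();  // hand the slot back and wake the event loop
    });
  }
  for (auto& w : workers) w.join();
  assert(pool.free_slots() == n_slots);  // every slot was returned
  return done.load();
}
```

Calling, for example, `run_events(2, 200)` should return only once all 200 events have completed. If `release()` instead updated the counter outside the lock, the same loop could in principle stall in `acquire()` in the way the backtrace suggests, though whether that is what happens here is only a guess.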
@sponce @raaij @rmatev @graven @ahennequ @nnolte just pinging you as you have all authored recent commits. Any thoughts?