Segfaults upon stopping due to multiple calls to exit()
When running the XDAQ15 pre-alpha, several XDAQ processes (among them the XaaD) crash semi-reliably upon stopping. The typical signature is a core dump due to signal 6 or 11, with a stack trace like the one below. These stack traces are not always from log4cplus (although this one is quite common), but they all have in common that they are triggered from inside one of the signal handler/exit callbacks from toolbox::Runtime.
{{{
Core was generated by `/opt/xdaq/bin/xdaq.exe -h cmstcdslab.cern.ch -p 9950 -u file.append:/var/log/tc'.
Program terminated with signal 6, Aborted.

Thread 1 (Thread 0x7f9dc57d5700 (LWP 14445)):
#0 0x00007f9dc9a9c1f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f9dc9a9da28 in __GI_abort () at abort.c:119
#2 0x00007f9dca3a2ac5 in __gnu_cxx::__verbose_terminate_handler () at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#3 0x00007f9dcc315162 in terminate_mainhandler () at /usr/src/debug/daq-toolbox-9.6.0/src/common/Runtime.cc:171
#4 0x00007f9dca3a0a36 in __cxxabiv1::__terminate (handler=) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:38
#5 0x00007f9dca39f9e9 in __cxa_call_terminate (ue_header=0x7f9d20000920) at ../../../../libstdc++-v3/libsupc++/eh_call.cc:54
#6 0x00007f9dca3a0654 in __cxxabiv1::__gxx_personality_v0 (version=, actions=, exception_class=, ue_header=, context=) at ../../../../libstdc++-v3/libsupc++/eh_personality.cc:676
#7 0x00007f9dc9e39903 in _Unwind_RaiseException_Phase2 (exc=exc@entry=0x7f9d20000920, context=context@entry=0x7f9dc57d3f80) at ../../../libgcc/unwind.inc:62
#8 0x00007f9dc9e39c9b in _Unwind_RaiseException (exc=0x7f9d20000920) at ../../../libgcc/unwind.inc:131
#9 0x00007f9dca3a0c76 in __cxxabiv1::__cxa_throw (obj=0x7f9d20000940, tinfo=0x7f9dca6319a0 , dest=0x7f9dca3f6f20 std::system_error::~system_error()) at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:82
#10 0x00007f9dca3f5eb0 in std::__throw_system_error (__i=35) at ../../../../../libstdc++-v3/src/c++11/functexcept.cc:104
#11 0x00007f9dca3f7068 in std::thread::join (this=this@entry=0x675bc0) at ../../../../../libstdc++-v3/src/c++11/thread.cc:110
#12 0x00007f9dcc07a6bc in ~ThreadPool (this=0x674170, __in_chrg=) at threadpool/ThreadPool.h:185
#13 operator() (this=, __ptr=0x674170) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/unique_ptr.h:78
#14 ~unique_ptr (this=0x674018, __in_chrg=) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/unique_ptr.h:268
#15 ~DefaultContext (this=0x673d20, __in_chrg=) at src/global-init.cxx:131
#16 log4cplus::(anonymous namespace)::destroy_default_context::~destroy_default_context (this=, __in_chrg=) at src/global-init.cxx:166
#17 0x00007f9dc9a9fdda in __cxa_finalize (d=0x7f9dcc2bcda8) at cxa_finalize.c:55
#18 0x00007f9dcc05c273 in __do_global_dtors_aux () from /opt/xdaq/lib/liblog4cplus-2.0.so.3
#19 0x00007f9dc57d4660 in ?? ()
#20 0x00007f9dcd2adb3a in _dl_fini () at dl-fini.c:253
Backtrace stopped: frame did not save the PC
}}}
It appears this is due to multiple TERM signals arriving at the same process. There is no protection against this, which leads to multiple calls to exit(). The exit() function in question is the C version from stdlib.h (but the C++ version would probably do the same thing). According to the standard, calling exit() more than once leads to undefined behaviour.
The attached patch (against the development trunk) implements a very simple protection against multiple execution of the signal handler callbacks by adding a global flag indicating that an exit procedure is already in progress. In the B14 TCDS lab the crashes upon stopping mentioned above are solved by this patch.
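For readers without access to the attachment, the kind of guard described above could look roughly like the sketch below. This is not the attached patch and not the actual toolbox::Runtime code: the names `exit_in_progress`, `begin_exit_once`, `on_term_signal` and `install_term_handler` are hypothetical, and an atomic flag is assumed here where the patch may well use a plain global.

```cpp
#include <atomic>
#include <csignal>
#include <cstdlib>

namespace {
// Hypothetical global flag: set once an exit procedure has started.
std::atomic<bool> exit_in_progress{false};
}

// Returns true only for the first caller; every later caller gets false.
bool begin_exit_once() noexcept {
    return !exit_in_progress.exchange(true);
}

// Hypothetical signal callback: only the first signal triggers exit().
extern "C" void on_term_signal(int /*signum*/) {
    if (!begin_exit_once()) {
        return;  // an exit is already running, ignore the repeated signal
    }
    std::exit(EXIT_SUCCESS);
}

// Hypothetical helper that registers the callback for SIGTERM.
void install_term_handler() {
    std::signal(SIGTERM, on_term_signal);
}
```

With such a guard a second SIGTERM returns from the handler immediately instead of re-entering exit() while static destructors (such as the log4cplus default context) are already running. Note that exit() is not async-signal-safe in the first place; the sketch only mirrors the structure described in this ticket.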
NOTE: The reason this was never seen in XDAQ14 (or earlier) but is very reproducible in XDAQ15 is probably that in XDAQ15 (with GCC 7.2) the XaaS was modified to have systemd start the XDAQ processes in a devtoolset-7 enabled environment. The scl command used to do this spawns a separate process for the XDAQ executable, and upon stopping it appears that the XDAQ process receives SIGTERM both directly and via the surrounding process. The same crashes can be reproduced by sending SIGTERM directly to the XDAQ process several times in quick succession.
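A minimal sketch of such a reproduction helper, assuming a POSIX system; the function name `send_sigterm_burst` is hypothetical and the PID of the XDAQ process has to be looked up separately:

```cpp
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

// Sends SIGTERM to `pid` `count` times back to back.
// Returns the number of kill() calls that succeeded.
int send_sigterm_burst(pid_t pid, int count) {
    int sent = 0;
    for (int i = 0; i < count; ++i) {
        if (kill(pid, SIGTERM) == 0) {
            ++sent;
        }
    }
    return sent;
}
```

Standard signals are not queued, so several pending SIGTERMs may be merged into a single delivery. A burst of a handful of signals may therefore have to be repeated before the handler is actually entered twice.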