Segfault in RelationsClonerAlg when running Moore test
When running the Moore test hlt2_thor_selections over 1k events, a segmentation violation occurs:
#0 0x00007f9299ffd549 in waitpid () from /lib64/libc.so.6
#1 0x00007f9299f7af62 in do_system () from /lib64/libc.so.6
#2 0x00007f9299f7b311 in system () from /lib64/libc.so.6
#3 0x00007f92904c400c in TUnixSystem::StackTrace() () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/v6.24.00-a725e/x86_64-centos7-gcc10-opt/lib/libCore.so
#4 0x00007f9290906cd2 in (anonymous namespace)::TExceptionHandlerImp::HandleException(int) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/v6.24.00-a725e/x86_64-centos7-gcc10-opt/lib/libcppyy_backend3_8.so
#5 0x00007f92904c1491 in TUnixSystem::DispatchSignals(ESignals) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/v6.24.00-a725e/x86_64-centos7-gcc10-opt/lib/libCore.so
#6 <signal handler called>
#7 0x00007f925e71331c in MicroDST::RelationsClonerAlg<LHCb::Relation1D<LHCb::Particle, LHCb::VertexBase> >::cloneTo(LHCb::VertexBase*) () from /home/lmeyerga/stack-220721-opt/Phys/InstallArea/x86_64_v2-centos7-gcc10-opt/lib/libMicroDSTAlgorithm.so
#8 0x00007f925e722446 in MicroDST::TableCloner<LHCb::Relation1D<LHCb::Particle, LHCb::VertexBase> >::operator()(LHCb::Relation1D<LHCb::Particle, LHCb::VertexBase> const*) () from /home/lmeyerga/stack-220721-opt/Phys/InstallArea/x86_64_v2-centos7-gcc10-opt/lib/libMicroDSTAlgorithm.so
#9 0x00007f925e71e841 in CopyParticle2PVRelationsFromLinePersistenceLocations::execute() () from /home/lmeyerga/stack-220721-opt/Phys/InstallArea/x86_64_v2-centos7-gcc10-opt/lib/libMicroDSTAlgorithm.so
#10 0x00007f9278396cb0 in Gaudi::Algorithm::sysExecute(EventContext const&) () from /home/lmeyerga/stack-220721-opt/Gaudi/InstallArea/x86_64_v2-centos7-gcc10-opt/lib/libGaudiKernel.so
#11 0x00007f9274ef6690 in GaudiAlgorithm::sysExecute(EventContext const&) () from /home/lmeyerga/stack-220721-opt/Gaudi/InstallArea/x86_64_v2-centos7-gcc10-opt/lib/libGaudiAlgLib.so
#12 0x00007f9270ccb4cf in AlgWrapper::execute(EventContext&, gsl::span<AlgState, 18446744073709551615ul>) const () from /home/lmeyerga/stack-220721-opt/LHCb/InstallArea/x86_64_v2-centos7-gcc10-opt/lib/libHLTScheduler.so
#13 0x00007f9270cbdd1b in HLTControlFlowMgr::push(EventContext&&)::{lambda(EventContext&)#1}::operator()(EventContext&) const () from /home/lmeyerga/stack-220721-opt/LHCb/InstallArea/x86_64_v2-centos7-gcc10-opt/lib/libHLTScheduler.so
#14 0x00007f9270cbe2b1 in tbb::internal::function_task<(anonymous namespace)::EventTask<HLTControlFlowMgr::push(EventContext&&)::{lambda(EventContext&)#1}> >::execute() () from /home/lmeyerga/stack-220721-opt/LHCb/InstallArea/x86_64_v2-centos7-gcc10-opt/lib/libHLTScheduler.so
#15 0x00007f9277b8d915 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::process_bypass_loop (this=this@entry=0x7f9274be7e00, context_guard=..., t=0x7f9274befc40, isolation=isolation@entry=0) at ../../src/tbb/custom_scheduler.h:474
#16 0x00007f9277b8dc3b in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0x7f9274be7e00, parent=..., child=<optimized out>) at ../../src/tbb/custom_scheduler.h:636
#17 0x00007f9277b878f7 in tbb::internal::arena::process (this=0x7f9274bf7a00, s=...) at ../../src/tbb/arena.cpp:196
#18 0x00007f9277b86090 in tbb::internal::market::process (this=0x7f9274bff580, j=...) at ../../src/tbb/market.cpp:667
#19 0x00007f9277b827dc in tbb::internal::rml::private_worker::run (this=0x7f926f17a180) at ../../src/tbb/private_server.cpp:266
#20 0x00007f9277b82a19 in tbb::internal::rml::private_worker::thread_routine (arg=<optimized out>) at ../../src/tbb/private_server.cpp:219
#21 0x00007f929aa16ea5 in start_thread () from /lib64/libpthread.so.0
#22 0x00007f929a0369fd in clone () from /lib64/libc.so.6
By default the test runs over only 10 events, so the crash is not triggered. It is caused by a null pointer being dereferenced in RelationsClonerAlg::cloneTo().
A check is included in !968 (merged), which turns the segfault into an error message. Oddly, the error occurs only once.
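For illustration, a guard of this kind could look roughly like the sketch below. This is not the code from !968: `Vertex` is a hypothetical stand-in for LHCb::VertexBase, and this free function `cloneTo` with its `log` parameter is an assumption made only to keep the example self-contained.

```cpp
#include <cassert>
#include <iostream>
#include <optional>
#include <sstream>

// Hypothetical stand-in for LHCb::VertexBase; not the real class.
struct Vertex {
  int key = 0;
};

// Hypothetical cloner (not the real RelationsClonerAlg::cloneTo).
// Returns a copy of the vertex, or std::nullopt after logging an
// error when it is handed a null pointer, instead of dereferencing it.
std::optional<Vertex> cloneTo( const Vertex* to, std::ostream& log = std::cerr ) {
  if ( !to ) {
    log << "ERROR To is nullptr. Cannot clone!\n";
    return std::nullopt;
  }
  return *to;
}
```

A guard like this only reports the symptom: the caller still receives a relation whose "to" side is null, so the question of why that pointer is null in the first place remains open.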
EDIT: Since !968 was merged, an error is reported instead of a segfault, but the underlying issue persists. Running
./Moore/run gaudirun.py Moore/Hlt/Moore/tests/options/default_input_and_conds_hlt2.py Moore/Hlt/Hlt2Conf/tests/options/hlt2_thor_selections.py
with 1k events, I see
IODataManager INFO Referring to dataset hlt2_thor_selections.dst by its file ID:E9E80A72-514E-11EC-B595-B42E99AC97BC
HLTControlFlowMgr INFO Timing started at: 21:02:48
CopyParticle2PVRelationsFromLine... ERROR To is nullptr. Cannot clone!
HLTControlFlowMgr INFO Timing stopped at: 21:26:23
HLTControlFlowMgr INFO ---> Loop over 1000 Events Finished - WSS 3422.07, timed 800 Events: 1414328 ms, Evts/s = 0.56564
EDIT 2: Running with
options.first_evt = 800
options.evt_max = 50
is enough to trigger the error. A full log is included (phys24.log).
Edited by Lucas Meyer Garcia