LoKi::VertexFitter and GeometryInfoPlus are not thread-safe (was: FunctionalDiElectronMaker segfaults when running multithreaded)
A throughput test crashed after throwing this error:
FunctionalDiElectronMaker#12.LoK... ERROR LoKi::VertexFitter:: Input set could not be vertexed StatusCode=702
I was able to reproduce this locally with the script run_moore_mdf.py and the following commands (run from the stack directory):
xrdcp root://eoslhcb.cern.ch//eos/lhcb/wg/rta/WP2/Hlt2Throughput/minbias_filtered_gec11000_1.mdf .
cd Moore
git remote update -p
git checkout charm-thor-hlt2-lines
git reset --hard origin/charm-thor-hlt2-lines
git rebase origin/master
cd ..
make fast/Moore/purge
make fast/Moore
Moore/run gaudirun.py run_moore_mdf.py
The job then crashes with:
ToolSvc.DaVinci::ParticleTransporter ERROR DaVinci::ParticleTransporter:: Invalid particle, impossible to transport StatusCode=FAILURE
*** Break *** segmentation violation
...
#7 0x00007f5854d1c6bb in LoKi::KalmanFilter::transport(LoKi::KalmanFilter::Entry&, double, IParticleTransporter*, IGeometryInfo const&) () from /data/home/mstahl/stack/Rec/InstallArea/x86_64_v2-centos7-gcc11-opt/lib/libKalmanFilter.so
#8 0x00007f5854f942c5 in LoKi::VertexFitter::_iterate_opt(unsigned long, IGeometryInfo const&) const () from /data/home/mstahl/stack/Rec/InstallArea/x86_64_v2-centos7-gcc11-opt/lib/libLoKiFitters.so
#9 0x00007f5854f96352 in LoKi::VertexFitter::fit(LHCb::Vertex&, std::vector<LHCb::Particle const*, std::allocator<LHCb::Particle const*> > const&, IGeometryInfo const&) const () from /data/home/mstahl/stack/Rec/InstallArea/x86_64_v2-centos7-gcc11-opt/lib/libLoKiFitters.so
#10 0x00007f5854f88a05 in LoKi::VertexFitter::fit(std::vector<LHCb::Particle const*, std::allocator<LHCb::Particle const*> > const&, LHCb::Vertex&, LHCb::Particle&, IGeometryInfo const&) const () from /data/home/mstahl/stack/Rec/InstallArea/x86_64_v2-centos7-gcc11-opt/lib/libLoKiFitters.so
#11 0x00007f585762afae in FunctionalDiElectronMaker::operator()( ...
The line in the options file is not specifically the one that triggers the segfault; it seems to happen sporadically for any line that uses FunctionalDiElectronMaker.
Note that the segfault only happens when running with
options.n_threads = 3
options.n_event_slots = 3
or larger (n_threads = 2, n_event_slots = 2 still runs fine). I have been running the job on x86_64_v2-centos7-gcc11-dbg+tsan (the ThreadSanitizer build) since yesterday and will attach the log later.
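The title attributes the crash to state inside LoKi::VertexFitter (and GeometryInfoPlus) that is not safe to share between threads; note that the frames in the trace are const member functions, e.g. _iterate_opt(...) const. As a hedged illustration of that general failure mode only, here is a minimal C++ sketch. The class names and fit_in_parallel are hypothetical, and this is not how LoKi is actually written. A tool whose const fit() writes to a mutable member races as soon as one instance is shared by several event slots, whereas keeping the scratch state local to each call avoids the race:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical sketch, NOT the LoKi::VertexFitter implementation.
// UnsafeFitter keeps per-call scratch data in a mutable member, so its const
// fit() still writes shared state: two threads calling fit() on the same
// instance race on m_scratch (undefined behaviour, e.g. a crash or garbage).
class UnsafeFitter {
public:
  double fit(const std::vector<double>& inputs) const {
    m_scratch.assign(inputs.begin(), inputs.end()); // write to shared state
    return std::accumulate(m_scratch.begin(), m_scratch.end(), 0.0);
  }

private:
  mutable std::vector<double> m_scratch; // shared state hidden behind const
};

// SafeFitter keeps the scratch data local to each call, so a single const
// instance can be used by any number of threads at once.
class SafeFitter {
public:
  double fit(const std::vector<double>& inputs) const {
    std::vector<double> scratch(inputs.begin(), inputs.end()); // per-call state
    return std::accumulate(scratch.begin(), scratch.end(), 0.0);
  }
};

// Calls fitter.fit() concurrently from n_threads threads on one shared
// instance, loosely mimicking a public tool used by several event slots.
// Thread i fits 100 copies of the value i, so its expected result is 100 * i.
template <typename Fitter>
std::vector<double> fit_in_parallel(const Fitter& fitter, std::size_t n_threads) {
  assert(n_threads > 0);
  std::vector<double> results(n_threads, 0.0);
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < n_threads; ++i) {
    workers.emplace_back([&fitter, &results, i] {
      const std::vector<double> inputs(100, static_cast<double>(i));
      results[i] = fitter.fit(inputs); // each thread writes its own element
    });
  }
  for (auto& w : workers) w.join();
  return results;
}
```

Calling fit_in_parallel with a SafeFitter always yields 100 * i for thread i. Calling it with an UnsafeFitter may appear to work with few threads and fail only sporadically with more, which is consistent with (but does not prove) the 2-thread versus 3-thread behaviour reported here. A ThreadSanitizer build would be expected to flag the race on m_scratch.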
@cmarinbe, can you please take a look?
Edited by Ryunosuke O'Neil