Skip to content
Snippets Groups Projects

TrackMasterFItter: Add mutex locks to operator()

Closed John Leslie Cobbledick requested to merge jcobbled-alignment-locks into master
3 unresolved threads

Temporary work around to ensure TrackMasterFitter is thread safe by adding lock in operator(). Thread-safety issues noticed when running alignment where it crashes randomly (although rarely). These issues also detected by helgrind

Edited by John Leslie Cobbledick

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
25 25 #include "TrackInterfaces/ITrackFitter.h"
26 26 #include "TrackInterfaces/ITrackKalmanFilter.h"
27 27
28 // C++ stuff
  • Sebastien Ponce approved this merge request

    approved this merge request

    • Can you please also add additional information on what the actual problem/symptom is which this 'sledge hammer' approach would solve? As it is clear that this is going to be a major bottleneck, and it would be better if any synchronization could be pushed down to a more fine-grained level -- so there should be an accompanying issue that documents what needs to happen so that these locks can be removed again.

      (and if the problem is that hellgrind complains about non-threadsafe RK::ErrorCounters, then it isn't really an observable problem)

    • @jcobbled I'd also prefer if you could provide more info as to what actually crashes and maybe even a stack trace.

      Please note, that up to the beginning of this month we were still running the hlt2_reco_baseline configuration in the throughput tests. See for example this result.
      This config runs the TrackMasterFitter, thus I'm skeptical that the Alignment sees a problem that we didn't see for years in the throughput tests.

      I guess it's more a problem of misconfiguration or a threading issue in a different component.

    • I suspect (as mentioned above) that hellgrind is complaining about data races involving these mutable data members -- none of which impact the results, just some (not?) reported statistics which no-one actually looks at which may be slightly wrong... Hence I do not think that these locks actually do anything useful other than silencing hellgrind.

    • I think this MR can be dropped for now. I will do some more tests with alignment without this lock to see if I can produce an error caused by TrackMasterFit. Helgrind reported data races such as the below which I guess are not too important.

      ==32663== Possible data race during read of size 8 at 0xD2AD6210 by thread #18
      ==32663== Locks held: none
      ==32663==    at 0xE8C53A55: RK::ErrorCode TrackRungeKuttaExtrapolator::extrapolate<RK::Scheme::ButcherTableau<6u, false>, TrackRungeKuttaExtrapolator::doNothing>(RK::Scheme::ButcherTableau<6u, false> const&, RK::details::State<double>&, double, RK::details::Jacobian<double>*, TrackRungeKuttaExtrapolator::doNothing) const (in /hepgpu10-data1/johncob/alignment/upgrade-hackathon-setup/Rec/InstallArea/x86_64_v2-centos7-gcc11-dbg/lib/libTrackExtrapolators.so)
      ==32663==    by 0xE8C370F6: auto RK::Scheme::invoke_with_tableau<TrackRungeKuttaExtrapolator::propagate(RK::details::State<double>&, double, ROOT::Math::SMatrix<double, 5u, 5u, ROOT::Math::MatRepStd<double, 5u, 5u> >*, IGeometryInfo const&, LHCb::Tr::PID) const::{lambda(auto:1 const&)#1}>(TrackRungeKuttaExtrapolator::propagate(RK::details::State<double>&, double, ROOT::Math::SMatrix<double, 5u, 5u, ROOT::Math::MatRepStd<double, 5u, 5u> >*, IGeometryInfo const&, LHCb::Tr::PID) const::{lambda(auto:1 const&)#1}, RK::Scheme::scheme_t) (in /hepgpu10-data1/johncob/alignment/upgrade-hackathon-setup/Rec/InstallArea/x86_64_v2-centos7-gcc11-dbg/lib/libTrackExtrapolators.so)
      ==32663==    by 0xE8C37892: TrackRungeKuttaExtrapolator::propagate(RK::details::State<double>&, double, ROOT::Math::SMatrix<double, 5u, 5u, ROOT::Math::MatRepStd<double, 5u, 5u> >*, IGeometryInfo const&, LHCb::Tr::PID) const (in /hepgpu10-data1/johncob/alignment/upgrade-hackathon-setup/Rec/InstallArea/x86_64_v2-centos7-gcc11-dbg/lib/libTrackExtrapolators.so)
      ==32663==    by 0xE8C37E02: TrackRungeKuttaExtrapolator::propagate(ROOT::Math::SVector<double, 5u>&, double, double, ROOT::Math::SMatrix<double, 5u, 5u, ROOT::Math::MatRepStd<double, 5u, 5u> >*, IGeometryInfo const&, LHCb::Tr::PID) const (in /hepgpu10-data1/johncob/alignment/upgrade-hackathon-setup/Rec/InstallArea/x86_64_v2-centos7-gcc11-dbg/lib/libTrackExtrapolators.so)
      ==32663==    by 0xE8C16F45: TrackMasterExtrapolator::propagate(ROOT::Math::SVector<double, 5u>&, double, double, ROOT::Math::SMatrix<double, 5u, 5u, ROOT::Math::MatRepStd<double, 5u, 5u> >*, IGeometryInfo const&, LHCb::Tr::PID) const (in /hepgpu10-data1/johncob/alignment/upgrade-hackathon-setup/Rec/InstallArea/x86_64_v2-centos7-gcc11-dbg/lib/libTrackExtrapolators.so)
      ==32663==    by 0xE98518F4: TrackMasterFitter::initializeRefStates(LHCb::Event::v1::Track&, IGeometryInfo const&, LHCb::Tr::PID) const (in /hepgpu10-data1/johncob/alignment/upgrade-hackathon-setup/Rec/InstallArea/x86_64_v2-centos7-gcc11-dbg/lib/libTrackFitter.so)
      ==32663==    by 0xE9851B23: TrackMasterFitter::makeNodes(LHCb::Event::v1::Track&, LHCb::Tr::PID, std::any&, IGeometryInfo const&) const (in /hepgpu10-data1/johncob/alignment/upgrade-hackathon-setup/Rec/InstallArea/x86_64_v2-centos7-gcc11-dbg/lib/libTrackFitter.so)
      ==32663==    by 0xE985258F: TrackMasterFitter::fit_r(LHCb::Event::v1::Track&, std::any&, IGeometryInfo const&, LHCb::Tr::PID) const (in /hepgpu10-data1/johncob/alignment/upgrade-hackathon-setup/Rec/InstallArea/x86_64_v2-centos7-gcc11-dbg/lib/libTrackFitter.so)
      ==32663==    by 0xE9853884: TrackMasterFitter::operator()(LHCb::Event::v1::Track&, IGeometryInfo const&, LHCb::Tr::PID const&) const (in /hepgpu10-data1/johncob/alignment/upgrade-hackathon-setup/Rec/InstallArea/x86_64_v2-centos7-gcc11-dbg/lib/libTrackFitter.so)
      ==32663==    by 0x4BC17DDF: TrackBestTrackCreator::fitAndUpdateCounters((anonymous namespace)::TrackData&, IGeometryInfo const&) const (in /hepgpu10-data1/johncob/alignment/upgrade-hackathon-setup/Rec/InstallArea/x86_64_v2-centos7-gcc11-dbg/lib/libTrackUtils.so)
      ==32663==    by 0x4BC185FD: TrackBestTrackCreator::fitAndSelect((anonymous namespace)::TrackData&, IGeometryInfo const&) const (in /hepgpu10-data1/johncob/alignment/upgrade-hackathon-setup/Rec/InstallArea/x86_64_v2-centos7-gcc11-dbg/lib/libTrackUtils.so)
      ==32663==    by 0x4BC18EB7: TrackBestTrackCreator::operator()(Gaudi::Functional::details::vector_of_const_<KeyedContainer<LHCb::Event::v1::Track, Containers::KeyedObjectManager<Containers::hashmap> > > const&) const::{lambda((anonymous namespace)::TrackData&)#4} std::for_each<__gnu_cxx::__normal_iterator<std::reference_wrapper<(anonymous namespace)::TrackData>*, std::vector<std::reference_wrapper<(anonymous namespace)::TrackData>, std::allocator<std::reference_wrapper<(anonymous namespace)::TrackData> > > >, TrackBestTrackCreator::operator()(Gaudi::Functional::details::vector_of_const_<KeyedContainer<LHCb::Event::v1::Track, Containers::KeyedObjectManager<Containers::hashmap> > > const&) const::{lambda((anonymous namespace)::TrackData&)#4}>(__gnu_cxx::__normal_iterator<std::reference_wrapper<(anonymous namespace)::TrackData>*, std::vector<std::reference_wrapper<(anonymous namespace)::TrackData>, std::allocator<std::reference_wrapper<(anonymous namespace)::TrackData> > > >, TrackBestTrackCreator::operator()(Gaudi::Functional::details::vector_of_const_<KeyedContainer<LHCb::Event::v1::Track, Containers::KeyedObjectManager<Containers::hashmap> > > const&) const::{lambda((anonymous namespace)::TrackData&)#4}, TrackBestTrackCreator::operator()(Gaudi::Functional::details::vector_of_const_<KeyedContainer<LHCb::Event::v1::Track, Containers::KeyedObjectManager<Containers::hashmap> > > const&) const::{lambda((anonymous namespace)::TrackData&)#4}) (in /hepgpu10-data1/johncob/alignment/upgrade-hackathon-setup/Rec/InstallArea/x86_64_v2-centos7-gcc11-dbg/lib/libTrackUtils.so)
    • If you want to make sure, then just remove all code that touches any of those four mutable data members (m_errors, m_numcalls, m_totalstats, and m_stats) from TrackRungeKuttaExtrapolator. That should be cleanly/trivially possible, and it will also show that this has, by construction, no effect on the results.

    • actually, please just try with !2807 (merged)...

    • Please register or sign in to reply
  • 151 151 //=========================================================================
    152 152 StatusCode TrackMasterFitter::operator()( LHCb::span<LHCb::Track> tracks, IGeometryInfo const& geometry,
    153 153 const LHCb::Tr::PID& pid ) const {
    154 auto cache = createCache();
    155 StatusCode sc( StatusCode::SUCCESS );
    154 // add lock to ensure thread-safety
    155 std::lock_guard<std::mutex> lock( m_operatorLock );
  • John Leslie Cobbledick changed title from Add mutex locks to operator() to TrackMasterFItter: Add mutex locks to operator()

    changed title from Add mutex locks to operator() to TrackMasterFItter: Add mutex locks to operator()

  • assigned to @lpica

  • Gerhard Raven mentioned in merge request !2807 (merged)

    mentioned in merge request !2807 (merged)

  • mentioned in issue Moore#416 (closed)

  • Lorenzo Pica assigned to @fsouzade and unassigned @lpica

    assigned to @fsouzade and unassigned @lpica

  • Please register or sign in to reply
    Loading