
Explicitly enable TGeoManager multi-threading for DD4hepSvc to stop seg-faults

Izaac Sanderswood requested to merge isanders-fix-mt-nav into 2024-patches

Required to make FTHitEfficiencyMonitor work when multi-threaded, which is urgently needed to add FTHitEfficiencyMonitor to CalibMon.

The ROOT geometry manager in Detector/DD4hep requires some explicit configuration for multi-threading. If SetMaxThreads is not called, the gGeoManager assumes it is running single-threaded and calls geometry navigation functions without any thread-safety, meaning that functions such as TGeoShapeAssembly::Contains can cause a seg-fault.

TGeoManager::SetMaxThreads(n_threads) needs to be called a single time in the main thread, and should be set to the number of threads passed to the application options.

After this is set, a TGeoNavigator can be added per thread, and function calls to the TGeoManager will provide the thread-local navigators.

See also Detector!608, which adds a check that there is a TGeoNavigator for the current thread before calling the problem function.

To do:

  • What value should max threads be set to?
  • Is it possible to run some code once per thread?
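On the second question, one possible approach in plain C++ is a function-local thread_local variable whose initialiser performs the setup, so the setup runs the first time each thread reaches it. The sketch below is standalone and does not use ROOT or Gaudi; perThreadSetup and ensureThreadInitialised are hypothetical names, and whether this fits the scheduler used here is untested.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Counts how many times the per-thread setup ran (for illustration only).
std::atomic<int> g_setupCalls{0};

// Hypothetical per-thread setup. In the real service this is where a
// thread-local TGeoNavigator could be added.
void perThreadSetup() { ++g_setupCalls; }

// Runs perThreadSetup() the first time it is called on each thread and does
// nothing on later calls from the same thread.
void ensureThreadInitialised() {
  thread_local const bool done = [] {
    perThreadSetup();
    return true;
  }();
  (void)done;  // silence unused-variable warnings
}

// Spawns nThreads workers that each call ensureThreadInitialised() several
// times, and returns how many setup calls those workers triggered in total.
int runWorkers(int nThreads, int callsPerThread) {
  const int before = g_setupCalls.load();
  std::vector<std::thread> workers;
  for (int i = 0; i < nThreads; ++i) {
    workers.emplace_back([callsPerThread] {
      for (int j = 0; j < callsPerThread; ++j) ensureThreadInitialised();
    });
  }
  for (auto& w : workers) w.join();
  return g_setupCalls.load() - before;
}
```

Calling ensureThreadInitialised() at the top of any function that navigates the geometry would then cost one thread-local read after the first call.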

What TGeoManager::SetMaxThreads(N) is doing:

Inside SetMaxThreads(N), which only works once the geometry is closed, ROOT::EnableThreadSafety() is called on the first invocation, and the navigator array already registered for the calling thread (if one exists) is re-inserted into the navigator map under the current thread id. Any existing thread-local data (i.e. from a previous SetMaxThreads call) is then removed, the flag fMultiThread is set to kTRUE, and ROOT loops over the detector volumes to create new thread-local data (fMaxThreads is set to N + 1). For this reason SetMaxThreads(N) should be called once in the main thread, before any multi-threaded code runs. It will not work if called from within the multi-threaded code, since the thread<->data maps would be repeatedly deleted rather than shared between threads.
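The reset behaviour described above can be illustrated with a small standalone model. ToyGeoManager is a stand-in written for this example, not the ROOT class, and MaxThreadsGuard is a hypothetical helper showing one way (std::call_once) to make sure the configuration happens only once.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <thread>

// Toy stand-in for TGeoManager: models only the "clear, then recreate" logic.
class ToyGeoManager {
public:
  // Like the real function, wipes existing per-thread state before
  // configuring for nthreads.
  void SetMaxThreads(int nthreads) {
    if (fMaxThreads) fThreadMap.clear();
    fMaxThreads = nthreads + 1;
    fMultiThread = fMaxThreads > 0;
  }
  // Registers the calling thread (stand-in for creating its thread data).
  void RegisterThisThread() { fThreadMap[std::this_thread::get_id()] = true; }
  std::size_t RegisteredThreads() const { return fThreadMap.size(); }

private:
  int fMaxThreads = 0;
  bool fMultiThread = false;
  std::map<std::thread::id, bool> fThreadMap;
};

// Hypothetical guard: forwards to SetMaxThreads at most once.
struct MaxThreadsGuard {
  std::once_flag flag;
  void apply(ToyGeoManager& mgr, int nthreads) {
    std::call_once(flag, [&] { mgr.SetMaxThreads(nthreads); });
  }
};

// Number of registrations that survive a second, unguarded call.
std::size_t survivorsAfterSecondCall() {
  ToyGeoManager mgr;
  mgr.SetMaxThreads(4);
  mgr.RegisterThisThread();
  mgr.SetMaxThreads(4);  // clears the map again
  return mgr.RegisteredThreads();
}

// Number of registrations that survive when the guard is used instead.
std::size_t survivorsWithGuard() {
  ToyGeoManager mgr;
  MaxThreadsGuard guard;
  guard.apply(mgr, 4);
  mgr.RegisterThisThread();
  guard.apply(mgr, 4);  // no-op: already configured
  return mgr.RegisteredThreads();
}
```

In the model, the second unguarded call leaves no registered threads, while the guarded version keeps the one that was registered.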

SetMaxThreads(N) does not create N TGeoNavigators. Therefore, before calling functions that use geometry navigation, make sure the calling thread has a thread-local TGeoNavigator, e.g.


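// Make sure the calling thread has its own navigator before navigating.
auto& manager = dd4hep::Detector::getInstance().manager();
TGeoNavigator* nav = manager.GetCurrentNavigator();
if ( !nav ) nav = manager.AddNavigator(); // none registered for this thread yet
myFunctionThatUsesGeometryNavigation();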

GetCurrentNavigator() returns a thread-local TGeoNavigator only if fMultiThread = kTRUE. If fMultiThread = kFALSE, i.e. if SetMaxThreads was not previously called, TGeoManager assumes there is only a single thread and does not search the thread<->navigator map. Every thread then gets fCurrentNavigator, the navigator of the main thread, which causes a segfault if the application is actually running with multiple threads. If fMultiThread = kTRUE and there is no TGeoNavigator for the current thread, manager.GetCurrentNavigator() returns a null pointer, which is why you have to check for this and add one.
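The check-and-add pattern can be wrapped in a small helper so that callers cannot forget it. This is a sketch only: ensureNavigator is a hypothetical name, written as a template over any type exposing GetCurrentNavigator() and AddNavigator() (as TGeoManager does), and FakeManager exists purely so the example runs without ROOT.

```cpp
#include <cassert>

// Returns the calling thread's navigator, adding one if none exists yet.
// Manager is any type with GetCurrentNavigator() and AddNavigator().
template <class Manager>
auto* ensureNavigator(Manager& manager) {
  auto* nav = manager.GetCurrentNavigator();
  if (!nav) nav = manager.AddNavigator();
  return nav;
}

// Minimal fake used only to exercise the helper without ROOT.
struct FakeNavigator {};
struct FakeManager {
  FakeNavigator* current = nullptr;
  FakeNavigator storage;
  int added = 0;
  FakeNavigator* GetCurrentNavigator() const { return current; }
  FakeNavigator* AddNavigator() {
    ++added;
    current = &storage;
    return current;
  }
};

// Calls the helper twice and reports how many navigators were added.
int addsAfterTwoCalls() {
  FakeManager m;
  ensureNavigator(m);
  ensureNavigator(m);
  return m.added;
}

// Checks that the helper never hands back a null pointer.
bool returnsNonNull() {
  FakeManager m;
  return ensureNavigator(m) != nullptr;
}
```

With the real manager the call would presumably read ensureNavigator(manager), using the manager reference obtained from dd4hep::Detector as in the snippet above.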

See the functions below

https://root.cern.ch/doc/master/classTGeoManager.html#ab5cfc0292200e4d941676d353e7308d5

void TGeoManager::SetMaxThreads(Int_t nthreads)
{
   if (!fClosed) {
      Error("SetMaxThreads", "Cannot set maximum number of threads before closing the geometry");
      return;
   }
   if (!fMultiThread) {
      ROOT::EnableThreadSafety();
      std::thread::id threadId = std::this_thread::get_id();
      NavigatorsMap_t::const_iterator it = fNavigators.find(threadId);
      if (it != fNavigators.end()) {
         TGeoNavigatorArray *array = it->second;
         fNavigators.erase(it);
         fNavigators.insert(NavigatorsMap_t::value_type(threadId, array));
      }
   }
   if (fMaxThreads) {
      ClearThreadsMap();
      ClearThreadData();
   }
   fMaxThreads = nthreads + 1;
   if (fMaxThreads > 0) {
      fMultiThread = kTRUE;
      CreateThreadData();
   }
}

https://root.cern.ch/doc/master/classTGeoManager.html#a4f37cb2eb0cdfb67ce89c5fd783c33c3

TGeoNavigator *TGeoManager::GetCurrentNavigator() const
{
   TTHREAD_TLS(TGeoNavigator *) tnav = nullptr;
   if (!fMultiThread)
      return fCurrentNavigator;
   TGeoNavigator *nav = tnav; // TTHREAD_TLS_GET(TGeoNavigator*,tnav);
   if (nav)
      return nav;
   std::thread::id threadId = std::this_thread::get_id();
   NavigatorsMap_t::const_iterator it = fNavigators.find(threadId);
   if (it == fNavigators.end())
      return nullptr;
   TGeoNavigatorArray *array = it->second;
   nav = array->GetCurrentNavigator();
   tnav = nav; // TTHREAD_TLS_SET(TGeoNavigator*,tnav,nav);
   return nav;
}
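To make the two modes of GetCurrentNavigator easier to see, here is a simplified standalone model of the lookup. ToyManager and ToyNavigator are stand-ins invented for this example, not ROOT classes. The model leaves out the TTHREAD_TLS cache and, unlike the quoted code, takes a mutex around the map lookup.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <thread>

struct ToyNavigator {};

// Toy model of the navigator lookup; NOT the ROOT implementation.
class ToyManager {
public:
  explicit ToyManager(bool multiThread) : fMultiThread(multiThread) {}

  // Single-threaded mode: every caller gets the one shared navigator.
  // Multi-threaded mode: the caller's own navigator, or nullptr if none.
  ToyNavigator* GetCurrentNavigator() {
    if (!fMultiThread) return &fShared;
    std::lock_guard<std::mutex> lock(fMutex);
    auto it = fNavigators.find(std::this_thread::get_id());
    return it == fNavigators.end() ? nullptr : &it->second;
  }

  // Creates (or returns) the navigator registered for the calling thread.
  ToyNavigator* AddNavigator() {
    std::lock_guard<std::mutex> lock(fMutex);
    return &fNavigators[std::this_thread::get_id()];
  }

private:
  bool fMultiThread;
  ToyNavigator fShared;
  std::mutex fMutex;
  std::map<std::thread::id, ToyNavigator> fNavigators;
};

// Asks a worker thread for its navigator and reports what it saw.
ToyNavigator* navigatorSeenByWorker(ToyManager& mgr, bool addFirst) {
  ToyNavigator* seen = nullptr;
  std::thread worker([&] {
    if (addFirst) mgr.AddNavigator();
    seen = mgr.GetCurrentNavigator();
  });
  worker.join();
  return seen;
}

// Without multi-threading enabled, worker and main thread share a navigator.
bool sharedWhenSingleThreaded() {
  ToyManager mgr(false);
  return navigatorSeenByWorker(mgr, false) == mgr.GetCurrentNavigator();
}

// With multi-threading enabled, a worker sees nullptr until it adds one.
bool nullUntilAdded() {
  ToyManager mgr(true);
  return navigatorSeenByWorker(mgr, false) == nullptr;
}

// After adding, the worker's navigator differs from the main thread's.
bool ownNavigatorAfterAdd() {
  ToyManager mgr(true);
  mgr.AddNavigator();  // main thread
  ToyNavigator* w = navigatorSeenByWorker(mgr, true);
  return w != nullptr && w != mgr.GetCurrentNavigator();
}
```

The shared-navigator case is the one that leads to the crash: several threads end up using the same navigator state at the same time.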

https://root.cern.ch/doc/master/classTGeoManager.html#a052ff41b02b7962e6edce8f4f0ab33a5

TGeoNavigator *TGeoManager::AddNavigator()
{
   if (fMultiThread) {
      TGeoManager::ThreadId();
      fgMutex.lock();
   }
   std::thread::id threadId = std::this_thread::get_id();
   NavigatorsMap_t::const_iterator it = fNavigators.find(threadId);
   TGeoNavigatorArray *array = nullptr;
   if (it != fNavigators.end())
      array = it->second;
   else {
      array = new TGeoNavigatorArray(this);
      fNavigators.insert(NavigatorsMap_t::value_type(threadId, array));
   }
   TGeoNavigator *nav = array->AddNavigator();
   if (fClosed)
      nav->GetCache()->BuildInfoBranch();
   if (fMultiThread)
      fgMutex.unlock();
   return nav;
}

Finally, the problematic Contains function, which dereferences the result of GetCurrentNavigator() without a null check and is therefore only thread-safe if all of the above have been called: https://root.cern/doc/master/classTGeoShapeAssembly.html#acc50f9347ef358a13e5161c2e92f97eb

Bool_t TGeoShapeAssembly::Contains(const Double_t *point) const
{
   if (!fBBoxOK)
      ((TGeoShapeAssembly *)this)->ComputeBBox();
   if (!TGeoBBox::Contains(point))
      return kFALSE;
   TGeoVoxelFinder *voxels = fVolume->GetVoxels();
   TGeoNode *node;
   TGeoShape *shape;
   Int_t *check_list = nullptr;
   Int_t ncheck, id;
   Double_t local[3];
   if (voxels) {
      // get the list of nodes passing thorough the current voxel
      TGeoNavigator *nav = gGeoManager->GetCurrentNavigator();
      TGeoStateInfo &td = *nav->GetCache()->GetInfo();
      check_list = voxels->GetCheckList(point, ncheck, td);
      if (!check_list) {
         nav->GetCache()->ReleaseInfo();
         return kFALSE;
      }
      for (id = 0; id < ncheck; id++) {
         node = fVolume->GetNode(check_list[id]);
         shape = node->GetVolume()->GetShape();
         node->MasterToLocal(point, local);
         if (shape->Contains(local)) {
            fVolume->SetCurrentNodeIndex(check_list[id]);
            fVolume->SetNextNodeIndex(check_list[id]);
            nav->GetCache()->ReleaseInfo();
            return kTRUE;
         }
      }
      nav->GetCache()->ReleaseInfo();
      return kFALSE;
   }
   Int_t nd = fVolume->GetNdaughters();
   for (id = 0; id < nd; id++) {
      node = fVolume->GetNode(id);
      shape = node->GetVolume()->GetShape();
      node->MasterToLocal(point, local);
      if (shape->Contains(local)) {
         fVolume->SetCurrentNodeIndex(id);
         fVolume->SetNextNodeIndex(id);
         return kTRUE;
      }
   }
   return kFALSE;
}
Edited by Izaac Sanderswood
