Skip to content
Snippets Groups Projects

Fix ATR-24207: Made m_strip_cache thread_local static

Merged Zvi Tarem requested to merge ztarem/athena:ztarem_nsw_trigger_strip_cache into master

StripTdsOfflineTool::m_strip_cash is used by a private tool and accumulates data per event, used by several methods. Using SG handles would work well, but may have a high overhead. I made the variable 'thread_local static' to prevent both clashes between threads, and bogus results when using more than one thread.

This fixes ATR-24207: test_trig_mc_v1Dev_slice_muonNSW_grid.py now passes.

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Hi @ztarem , not quite sure I understand why these private data members need to be turned into thread_local-s. As far as I can tell, this is a private tool of a regular algorithm NSWL1Simulation. So, the only possible scenario when the methods of this tool can be called concurrently from multiple threads is when the NSWL1Simulation is clonable, and we make multiple copies of it at runtime. However, if this is the case, then for each copy of the algorithm Athena should make a separate copy for each of its private tools. Which means we cannot have concurrent access to private members of the same Tool object from multiple threads.

    Do I miss something obvious here? Thanks.

  • Author Developer

    My understanding is different: NSWL1Simulation is adapted to a "new style" alg where the execute method can be called from different threads with different contexts, and thus all used tools, even if private, need to be thread-safe.

    As an indirect indication that my assumption is correct, the reported issue was quite reliably reproducible with 4 threads, and after the fix the test passes cleanly.

  • My understanding is different: NSWL1Simulation is adapted to a "new style" alg where the execute method can be called from different threads with different contexts, and thus all used tools, even if private, need to be thread-safe.

    Well, in this case NSWL1Simulation needs to be an AthReentrantAlgorithm, while now it is a "regular" AthAlgorithm (see here

    As an indirect indication that my assumption is correct, the reported issue was quite reliably reproducible with 4 threads, and after the fix the test passes cleanly.

    This is exactly what confuses me, and that's why I'm saying that I probably miss something obvious here.

    If this fix is urgent, we can go ahead and merge it as is. However, I'd try to understand what's really going on here. And, as a general comment, I think the code of StripTdsOfflineTool could use some cleanup/fixing/refactoring.

    Thanks.

  • Author Developer

    I totally agree with you that the code in the sTGC trigger simulation (as a whole) needs a serious face-list. It seems that I will be charged with this task, including extending the simulator beyond the strip TDS.

    If m_strip_cache is not used by multiple threads, then I cannot explain the behavior of the test before and after my change. I don't yet know enough about the trigger mechanics in AthenaMT. One can simply add a simple printout that will reveal the thread and context inside some key tool methods.

  • Vakhtang Tsulaia resolved all threads

    resolved all threads

  • OK, in the interest of not keeping this MR open forever, I'm going to merge it as is. However, I'm curious to learn more about how it became possible for this tool to be called from multiple threads concurrently. I'll try to dig around for a bit.

  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Please register or sign in to reply
    Loading