Fix ATR-24207: Made m_strip_cache thread_local static
StripTdsOfflineTool::m_strip_cash is used by a private tool and accumulates data per event, used by several methods. Using SG handles would work well, but may have a high overhead. I made the variable 'thread_local static' to prevent both clashes between threads, and bogus results when using more than one thread.
This fixes ATR-24207: test_trig_mc_v1Dev_slice_muonNSW_grid.py now passes.
Merge request reports
Activity
This merge request affects 1 package:
- Trigger/TrigT1/TrigT1NSWSimTools
This merge request affects 2 files:
- Trigger/TrigT1/TrigT1NSWSimTools/TrigT1NSWSimTools/StripTdsOfflineTool.h
- Trigger/TrigT1/TrigT1NSWSimTools/src/StripTdsOfflineTool.cxx
Adding @afaulkne as watcher
added Trigger master review-pending-level-1 labels
added 1 commit
- 6c53dbc1 - Marked a few more data members as 'thread_local static'.
This merge request affects 1 package:
- Trigger/TrigT1/TrigT1NSWSimTools
This merge request affects 2 files:
- Trigger/TrigT1/TrigT1NSWSimTools/TrigT1NSWSimTools/StripTdsOfflineTool.h
- Trigger/TrigT1/TrigT1NSWSimTools/src/StripTdsOfflineTool.cxx
Adding @afaulkne as watcher
CI Result SUCCESS (hash e0d1934b)Athena AthSimulation AthGeneration AnalysisBase AthAnalysis DetCommon externals cmake make required tests optional tests Full details available on this CI monitor view
Athena: number of compilation errors 0, warnings 0
AthSimulation: number of compilation errors 0, warnings 0
AthGeneration: number of compilation errors 0, warnings 0
AnalysisBase: number of compilation errors 0, warnings 0
AthAnalysis: number of compilation errors 0, warnings 0
DetCommon: number of compilation errors 0, warnings 0
For experts only: Jenkins output [CI-MERGE-REQUEST-CC7 40587] CI Result FAILURE (hash 6c53dbc1)Athena AthSimulation AthGeneration AnalysisBase AthAnalysis DetCommon externals cmake make required tests optional tests Full details available on this CI monitor view
Athena: number of compilation errors 0, warnings 0
AthSimulation: number of compilation errors 0, warnings 0
AthGeneration: number of compilation errors 0, warnings 0
AnalysisBase: number of compilation errors 0, warnings 0
AthAnalysis: number of compilation errors 0, warnings 0
DetCommon: number of compilation errors 0, warnings 0
For experts only: Jenkins output [CI-MERGE-REQUEST-CC7 40589]This merge request affects 1 package:
- Trigger/TrigT1/TrigT1NSWSimTools
This merge request affects 2 files:
- Trigger/TrigT1/TrigT1NSWSimTools/TrigT1NSWSimTools/StripTdsOfflineTool.h
- Trigger/TrigT1/TrigT1NSWSimTools/src/StripTdsOfflineTool.cxx
Adding @afaulkne as watcher
CI Result SUCCESS (hash 6c53dbc1)Athena AthSimulation AthGeneration AnalysisBase AthAnalysis DetCommon externals cmake make required tests optional tests Full details available on this CI monitor view
Athena: number of compilation errors 0, warnings 0
AthSimulation: number of compilation errors 0, warnings 0
AthGeneration: number of compilation errors 0, warnings 0
AnalysisBase: number of compilation errors 0, warnings 0
AthAnalysis: number of compilation errors 0, warnings 0
DetCommon: number of compilation errors 0, warnings 0
For experts only: Jenkins output [CI-MERGE-REQUEST-CC7 40614]added review-approved label and removed review-pending-level-1 label
- Resolved by Vakhtang Tsulaia
Hi @ztarem , not quite sure I understand why these private data members need to be turned into
thread_local
-s. As far as I can tell, this is a private tool of a regular algorithmNSWL1Simulation
. So, the only possible scenario when the methods of this tool can be called concurrently from multiple threads is when theNSWL1Simulation
is clonable, and we make multiple copies of it at runtime. However, if this is the case, then for each copy of the algorithm Athena should make a separate copy for each of its private tools. Which means we cannot have concurrent access to private members of the same Tool object from multiple threads.Do I miss something obvious here? Thanks.
My understanding is different: NSWL1Simulation is adapted to a "new style" alg where the execute method can be called from different threads with different contexts, and thus all used tools, even if private, need to be thread-safe.
As an indirect indication that my assumption is correct, the reported issue was quite reliably reproducible with 4 threads, and after the fix the test passes cleanly.
My understanding is different: NSWL1Simulation is adapted to a "new style" alg where the execute method can be called from different threads with different contexts, and thus all used tools, even if private, need to be thread-safe.
Well, in this case
NSWL1Simulation
needs to be anAthReentrantAlgorithm
, while now it is a "regular"AthAlgorithm
(see hereAs an indirect indication that my assumption is correct, the reported issue was quite reliably reproducible with 4 threads, and after the fix the test passes cleanly.
This is exactly what confuses me, and that's why I'm saying that I probably miss something obvious here.
If this fix is urgent, we can go ahead and merge it as is. However, I'd try to understand what's really going on here. And, as a general comment, I think the code of
StripTdsOfflineTool
could use some cleanup/fixing/refactoring.Thanks.
I totally agree with you that the code in the sTGC trigger simulation (as a whole) needs a serious face-list. It seems that I will be charged with this task, including extending the simulator beyond the strip TDS.
If
m_strip_cache
is not used by multiple threads, then I cannot explain the behavior of the test before and after my change. I don't yet know enough about the trigger mechanics in AthenaMT. One can simply add a simple printout that will reveal the thread and context inside some key tool methods.