Fix the race condition and Python property of LFTripletSeeding
Add !1735 (merged) back to 2024-patches.
Random crashes in Allen were observed at https://lblogbook.cern.ch/HLT/1574, and RTA decided to revert the MR using !1784 (merged) to temporarily solve the problem.
After the offline investigation, we realized the crash was actually due to a race condition in `AtomicAdd`. This MR addresses the race condition by adding extra protection. We tested the issue offline and confirmed that this MR resolves the problem.
Additionally, we discovered that `maximum_number_of_triplets_per_warp` was being overwritten by the Python configuration, which is also fixed in this MR.
To ensure deterministic forward tracking, the extra protection resets the seed to zero when overflow occurs. To avoid a dramatic decrease in efficiency, we also increased `max_triplets_per_thread` from 3 to 4, and set `maximum_number_of_triplets_per_warp=32*max_triplets_per_thread*2`.
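For reviewers unfamiliar with the pattern, the following is a minimal host-side sketch of this kind of overflow protection. It is not the Allen code. `TripletBuffer`, `push`, `finalize` and `fill` are hypothetical names, `std::atomic::fetch_add` stands in for the device-side atomic add, and the sketch assumes that "reset the seed to zero" means dropping all seeds of an overflowing buffer. Only the constants 32, 4 and 2 are taken from the description above.

```cpp
#include <array>
#include <atomic>

// Values named in the MR description; the identifiers around them are illustrative.
constexpr unsigned warp_size = 32;
constexpr unsigned max_triplets_per_thread = 4;
constexpr unsigned maximum_number_of_triplets_per_warp =
  warp_size * max_triplets_per_thread * 2; // 32 * 4 * 2 = 256

// Hypothetical fixed-capacity buffer filled through an atomic counter.
struct TripletBuffer {
  std::array<int, maximum_number_of_triplets_per_warp> slots {};
  std::atomic<unsigned> counter {0};

  // Reserve an index atomically, but write only if it is inside the buffer.
  // Without the bounds check, an overflowing writer would corrupt memory.
  void push(int triplet)
  {
    const unsigned index = counter.fetch_add(1, std::memory_order_relaxed);
    if (index < slots.size()) slots[index] = triplet;
  }

  // To be called once all writers are done. On overflow the set of stored
  // triplets would depend on scheduling, so the count is reset to zero.
  unsigned finalize()
  {
    const unsigned requested = counter.load();
    if (requested > slots.size()) {
      counter.store(0);
      return 0;
    }
    return requested;
  }
};

// Sequential driver that only exercises the counting logic.
unsigned fill(unsigned n_writers, unsigned triplets_per_writer)
{
  TripletBuffer buffer;
  for (unsigned w = 0; w < n_writers; ++w)
    for (unsigned i = 0; i < triplets_per_writer; ++i)
      buffer.push(static_cast<int>(w * triplets_per_writer + i));
  return buffer.finalize();
}
```

In this sketch a buffer that overflows yields no seeds at all instead of a scheduling-dependent subset, which is why a larger capacity limits the efficiency loss: fewer buffers reach the overflow branch.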