Intermittent crash in PbPb sequence
Allen occasionally crashes during the PbPb throughput test on the A5000. See the failing tests here: https://gitlab.cern.ch/lhcb/Allen/-/jobs/36493647. To debug, I reproduced the crash on the 2080 Ti on the online node, running with `-m 500 -n 350 -t 1` and the same sequence on the same dataset. Debugging with cuda-gdb, I found that Allen crashes with a "Warp Out-of-range Address" error here: https://gitlab.cern.ch/lhcb/Allen/-/blob/master/device/selections/Hlt1/src/MakeSubBanks.cu?ref_type=heads#L443. I'm still debugging.
A couple of notes:
- `hlt1_pp_default` runs successfully on the same data with the same settings.
- I can avoid the crash by removing the UPC photon lines.
- Taken together, these observations make me suspect an issue with filling the neutral particle containers when a complex GEC is used.