Segfault within Allen when running hlt1 on MC using a wrong condDB
Issue description
This issue is related to the following analysis production, in which we run the 2024 simulation with a wrong condDB to test the impact of the geometry description on physics variables: lhcb-datapkg/AnalysisProductions#521 (closed)
According to the following AP log: https://lhcb-dirac-logse.web.cern.ch/lhcb/MC/Dev/LOG/00227428/0000/00000053/Moore_00227428_00000053_1.log , the Allen step breaks when processing the file LFN:/lhcb/MC/Dev/DIGI/00225919/0000/00225919_00000175_1.digi.
@cburr noticed that it happens on x86_64_v2-el9-gcc13+detdesc-opt, but not on x86_64_v3-el9-gcc13+detdesc-opt+g or x86_64_v2-el9-gcc13+detdesc-dbg. It happens for Moore/v55r9 (the version used for the AnalysisProduction), Moore/v55r10, and the 2024-patches on 16 June 2024. Chris also saw the following stack frame when debugging with gdb:
#0 0x00007fffca704b29 in pv_beamline_peak::pv_beamline_peak(pv_beamline_peak::Parameters) () from /cvmfs/lhcb.cern.ch/lib/lhcb/ALLEN/ALLEN_v4r10/InstallArea/x86_64_v2-el9-gcc13+detdesc-opt/lib/libAllenLib.so
Reproduce the issue on lxplus
The issue can be reproduced locally on lxplus by running a single Hlt1 option file: hlt1_badfile.py . The exact commands are:
lhcb-proxy-init
(type password)
lb-set-platform x86_64_v2-el9-gcc13+detdesc-opt
lb-run Moore/v55r10 gaudirun.py hlt1_badfile.py | tee log_moore_v55r10_badfile
The job stops at the 1767th event without printing any error message. The log file is attached: log_moore_v55r10_badfile
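Since the job stops silently, the exit status of the command may help confirm that it was killed by a signal. Note that piping into tee hides the status of lb-run: the pipeline reports the status of tee instead. The following is a minimal sketch, not part of the original reproduction. The helper name describe_status is hypothetical, and it assumes that lb-run passes on the exit status of gaudirun.py, which has not been verified here. By shell convention, a status above 128 means the process was terminated by signal (status - 128), so 139 would correspond to signal 11 (SIGSEGV).

```shell
# Translate a shell exit status into a short description.
# Statuses above 128 conventionally mean "terminated by signal (status - 128)".
# describe_status is a hypothetical helper name, not part of any LHCb tool.
describe_status() {
  status="$1"
  if [ "$status" -eq 0 ]; then
    echo "exit status 0: finished normally"
  elif [ "$status" -gt 128 ]; then
    sig=$((status - 128))
    echo "exit status $status: killed by signal $sig ($(kill -l "$sig"))"
  else
    echo "exit status $status: exited with an error"
  fi
}

# Intended use in bash (not run here, since it needs the LHCb environment):
#   lb-run Moore/v55r10 gaudirun.py hlt1_badfile.py | tee log_moore_v55r10_badfile
#   describe_status "${PIPESTATUS[0]}"

# Stand-alone demonstration with the status a segfault would normally produce.
describe_status 139
```

PIPESTATUS is specific to bash. In other shells, redirecting the output to a file instead of piping into tee gives direct access to the status through $?.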
The same behavior is seen when using Moore v55r9. The way to reproduce the issue on lxplus is:
Option file: hlt1_badfile.py
Command:
lhcb-proxy-init
(type password)
lb-set-platform x86_64_v2-el9-gcc13+detdesc-opt
lb-run Moore/v55r9 gaudirun.py hlt1_badfile.py | tee log_moore_v55r9_badfile
Log file: log_moore_v55r9_badfile
The issue is not seen when using another digi file as input
The hlt1 step finishes successfully for LFN:/lhcb/MC/Dev/DIGI/00225919/0000/00225919_00000153_1.digi. To reproduce the result, do the following:
Option file: hlt1_goodfile.py
Command:
lhcb-proxy-init
(type password)
lb-set-platform x86_64_v2-el9-gcc13+detdesc-opt
lb-run Moore/v55r10 gaudirun.py hlt1_goodfile.py | tee log_moore_v55r10_goodfile
lb-run Moore/v55r9 gaudirun.py hlt1_goodfile.py | tee log_moore_v55r9_goodfile
Log files: log_moore_v55r9_goodfile, log_moore_v55r10_goodfile (a few lines of log_moore_v55r9_goodfile are missing because I interrupted the job with Ctrl+C)
The issue is not seen when using another platform
The issue cannot be reproduced when using x86_64_v2-el9-gcc13+detdesc-dbg or x86_64_v3-el9-gcc13+detdesc-opt+g.
x86_64_v2-el9-gcc13+detdesc-dbg
Option file: hlt1_badfile.py
Command:
lhcb-proxy-init
(type password)
lb-set-platform x86_64_v2-el9-gcc13+detdesc-dbg
lb-run Moore/v55r10 gaudirun.py hlt1_badfile.py | tee log_moore_v55r10_badfile
Log file: log_moore_v55r10_badfile
x86_64_v3-el9-gcc13+detdesc-opt+g
Option file: hlt1_badfile.py
Command:
lhcb-proxy-init
(type password)
lb-set-platform x86_64_v3-el9-gcc13+detdesc-opt+g
lb-run Moore/v55r10 gaudirun.py hlt1_badfile.py | tee log_moore_v55r10_badfile
Log file: log_moore_v55r10_badfile
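The commands for the three platforms all write to the same log name, log_moore_v55r10_badfile, so a later run overwrites an earlier one if they are launched from the same directory. The sketch below only prints the commands for each platform with a distinct log name, so they can be checked before being run by hand. The function name print_repro and the platform suffix on the log name are hypothetical conventions, not taken from the original runs.

```shell
# Print (without executing) the reproduction commands for one platform.
# Arguments: platform, Moore version, tag ("badfile" or "goodfile").
# The option file is assumed to be hlt1_<tag>.py, as in the commands above.
# The platform suffix on the log name is a hypothetical convention.
print_repro() {
  printf 'lb-set-platform %s\n' "$1"
  printf 'lb-run Moore/%s gaudirun.py hlt1_%s.py | tee log_moore_%s_%s_%s\n' \
    "$2" "$3" "$2" "$3" "$1"
}

# Dry run over the three platforms discussed in this issue.
for platform in \
  x86_64_v2-el9-gcc13+detdesc-opt \
  x86_64_v2-el9-gcc13+detdesc-dbg \
  x86_64_v3-el9-gcc13+detdesc-opt+g
do
  print_repro "$platform" v55r10 badfile
done
```

The output is plain text, so each pair of lines can be pasted into an lxplus session after lhcb-proxy-init.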
The issue is not seen when using other condDB settings
sim10-2024.Q1.2-v1.0-md100: the production condDB for the DIGI file, with a perfect geometry description.
Option file: hlt1_badfile.py
Command:
lhcb-proxy-init
(type password)
lb-set-platform x86_64_v2-el9-gcc13+detdesc-opt
lb-run Moore/v55r10 gaudirun.py hlt1_badfile.py | tee log_moore_v55r10_badfile
Log file: log_moore_v55r10_badfile
sim10-2024.Q1.2-v1.0-md100-unmitigated-drift: the condDB for the velo-drift study.
Option file: hlt1_badfile.py
Command:
lhcb-proxy-init
(type password)
lb-set-platform x86_64_v2-el9-gcc13+detdesc-opt
lb-run Moore/v55r10 gaudirun.py hlt1_badfile.py | tee log_moore_v55r10_badfile
Log file: log_moore_v55r10_badfile