Sprucing didn't crash when reading corrupt data
We have some files that were processed even though the replica that was read was corrupt. The scale of the problem is still being evaluated, but we already have a reproducer. I would have expected this job to crash, but it exited "successfully":
make_bandq_BmKpPipMum_detached_b... WARNING Suppressing message: 'Too many composites. Stopping the combination.'
SLB_BToD0TauNuCombiner_6ae0f76d WARNING Suppressing message: 'Too many composites. Stopping the combination.'
EventSelector SUCCESS Reading Event record 30001. Record number within stream 1: 30001
EventSelector SUCCESS Reading Event record 40001. Record number within stream 1: 40001
zstd decompression error: Corrupted block detected
EventSelector.DataStreamTool_1 INFO Cannot read more data (Uncompressed record). End-of-File reached.
HLTControlFlowMgr INFO No more events in event selection
HLTControlFlowMgr INFO ---> Loop over 47637 Events Finished - WSS 2144, timed 47627 Events: 1318125 ms, Evts/s = 36.1324
B02D0TauNuX_Tau2eNuNu_D02KPi_WS_... INFO Number of counters : 7
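While the scale is being evaluated, one way to flag possibly affected jobs is to scan their logs for the zstd message shown above. This is a minimal sketch, not an existing tool: the function name `find_corruption_lines` is hypothetical, and it assumes the marker text is always printed exactly as in this excerpt, which may not hold for other kinds of corruption.

```python
from typing import Iterable, List

# Marker copied from the log excerpt above. Assumption: other corruption
# modes may print different messages and would not be caught by this check.
CORRUPTION_MARKER = "zstd decompression error"


def find_corruption_lines(log_lines: Iterable[str],
                          marker: str = CORRUPTION_MARKER) -> List[int]:
    """Return the 1-based numbers of the log lines that contain the marker."""
    return [number
            for number, line in enumerate(log_lines, start=1)
            if marker in line]
```

An open log file can be passed directly as `log_lines`; a non-empty result means the job hit the decompression error even though it reported a normal end of input.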
The site has sent us a copy of the corrupt file for debugging. To reproduce:
- Create lbexec_options_00241952_00824132_1.yaml:
compression:
  algorithm: ZSTD
  level: 1
conditions_version: master
data_type: Upgrade
geometry_version: run3/2024.Q1.2-v00.00
input_files:
  - root://eoslhcb.cern.ch//eos/lhcb/user/c/cburr/306713_00230022_0055.cnaf.raw
input_process: Hlt2
input_raw_format: 0.5
input_type: RAW
n_threads: 1
output_file: 00241952_00824132_1.{stream}.dst
output_type: ROOT
process: Spruce
simulation: false
xml_summary_file: summaryMoore_00241952_00824132_1.xml
- Set up the environment:
lb-run --siteroot=/cvmfs/lhcb.cern.ch/lib/ -c best --use=SprucingConfig.v24r2p4 Moore/v55r12p3 bash --norc --noprofile
- Run:
lbexec SprucingConfig.Spruce24.Sprucing_production_physics_pp_Collision24c3:excl_spruce_production lbexec_options_00241952_00824132_1.yaml
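To confirm the unexpected behaviour when running the reproducer, the command can be wrapped so that the exit status and the presence of the zstd message are reported together. This is a sketch under stated assumptions: `run_and_check` is a hypothetical helper, it would have to be run inside the environment set up in the previous step, and it holds the whole job output in memory.

```python
import subprocess
from typing import List, Tuple

# Marker copied from the log excerpt in this report.
CORRUPTION_MARKER = "zstd decompression error"


def run_and_check(command: List[str],
                  marker: str = CORRUPTION_MARKER) -> Tuple[int, bool]:
    """Run the command; return (exit status, whether the marker was printed)."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so the marker is found in either stream
        text=True,
        errors="replace",          # tolerate undecodable bytes in the job output
    )
    return result.returncode, marker in result.stdout
```

Passing the `lbexec` command above as a list of arguments, a result of `(0, True)` would correspond to the situation described here: the decompression error was printed, yet the job exited with status 0.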