HLT1 crashes when events-per-slice is not a divisor of 30000
When running the Allen testbench, Allen crashes whenever events-per-slice is not a divisor of 30000. The crash lines from the log file HLT1_0.log are copied below.
AllenConfig.py file used (events_per_slice and output_batch_size set to 800).
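To make the triggering condition concrete, here is the slicing arithmetic for the configuration above. It is a sketch under one assumption that the report does not state: 30000 is the number of events delivered per input batch and is cut into consecutive slices of events_per_slice. With 800, 30000 = 37 × 800 + 400, so the last slice would hold only 400 events. The helpers `slice_sizes` and `nearest_divisors` are hypothetical illustrations, not Allen code, and this does not reproduce the crash or identify what fails inside error_bank_filter.

```python
def slice_sizes(total_events: int, events_per_slice: int) -> list[int]:
    """Split total_events into consecutive slices of at most events_per_slice.

    Hypothetical helper: assumes events are sliced sequentially, with any
    leftover events going into a final, partially filled slice.
    """
    if events_per_slice <= 0:
        raise ValueError("events_per_slice must be positive")
    full, remainder = divmod(total_events, events_per_slice)
    sizes = [events_per_slice] * full
    if remainder:
        sizes.append(remainder)  # partial slice: only exists when not a divisor
    return sizes


def nearest_divisors(total_events: int, target: int) -> tuple[int, int]:
    """Return the closest divisors of total_events at or below / at or above target."""
    divisors = [d for d in range(1, total_events + 1) if total_events % d == 0]
    below = max(d for d in divisors if d <= target)
    above = min(d for d in divisors if d >= target)
    return below, above


sizes = slice_sizes(30000, 800)
print(len(sizes), sizes[-1])          # 38 400
print(nearest_divisors(30000, 800))   # (750, 1000)
```

Under the same assumption, values such as 750 or 1000 would split 30000 evenly and leave no partial slice, which is the case the title reports as working.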
Command executed from the stack directory:
./MooreOnline/run MooreOnline/MooreScripts/scripts/testbench.py MooreOnline/MooreScripts/tests/options/HLT1/Arch.xml --working-dir testbench/slice_800/hlt1_pp_matching_no_ut/run_295336_mu3_3 --measure-throughput=300 --data-dir /scratch/allen_data/mep_input/meps_295336_mu3_3/ --hlt-type=hlt1_pp_matching_no_ut --partition TESTHLT1
[FATAL] Process: 'TESTHLT1_XXEB06_HLT1_0' (SignalHandler) RTL:Handled signal: 6 [SIGABRT] Old action:0x7fffebe04120 Mem:0x299100043af9 Code:FFFFFFFA
[INFO] Process: 'TESTHLT1_XXEB06_HLT1_0' (ExitSignalHandler) ---------------------- Backtrace ----------------------
[INFO] Process: 'TESTHLT1_XXEB06_HLT1_0' Number of elements in backtrace: 18
/scratch/dtou/cuda_stack/Online/InstallArea/x86_64_v3-el9-gcc12+cuda12_4-opt+g/lib/libOnlineBase.so.7.27.0.0(+0x13b370)[0x7fffeba82370]
/scratch/dtou/cuda_stack/Online/InstallArea/x86_64_v3-el9-gcc12+cuda12_4-opt+g/lib/libOnlineBase.so.7.27.0.0(_ZN3RTL17ExitSignalHandler7handlerEiP9siginfo_tPv+0x2a5)[0x7fffeba85a45]
/lib64/libc.so.6(+0x54d90)[0x7ffff7930d90]
/lib64/libc.so.6(+0xa154c)[0x7ffff797d54c]
/lib64/libc.so.6(raise+0x16)[0x7ffff7930ce6]
/lib64/libc.so.6(abort+0xd3)[0x7ffff79047f3]
/cvmfs/lhcb.cern.ch/lib/lcg/releases/gcc/12.1.0-2435c/x86_64-el9/lib64/libstdc++.so.6(+0xa1a89)[0x7ffff4928a89]
/cvmfs/lhcb.cern.ch/lib/lcg/releases/gcc/12.1.0-2435c/x86_64-el9/lib64/libstdc++.so.6(+0xacf0a)[0x7ffff4933f0a]
/cvmfs/lhcb.cern.ch/lib/lcg/releases/gcc/12.1.0-2435c/x86_64-el9/lib64/libstdc++.so.6(+0xacf75)[0x7ffff4933f75]
/scratch/dtou/cuda_stack/Allen/InstallArea/x86_64_v3-el9-gcc12+cuda12_4-opt+g/lib/libAllenLib.so(_ZNK17error_bank_filter19error_bank_filter_t17error_bank_filterENS_10ParametersEPK14IInputProviderjjj+0x465)[0x7fffecd4eb45]
/scratch/dtou/cuda_stack/Allen/InstallArea/x86_64_v3-el9-gcc12+cuda12_4-opt+g/lib/libAllenLib.so(+0x8a6dcd)[0x7fffecd54dcd]
/scratch/dtou/cuda_stack/Allen/InstallArea/x86_64_v3-el9-gcc12+cuda12_4-opt+g/lib/libAllenLib.so(_ZNK17error_bank_filter19error_bank_filter_tclERKN5Allen5Store8StoreRefISt5tupleIJNS_10Parameters17host_event_list_tENS5_12mep_layout_tENS5_23dev_output_event_list_tENS5_24host_output_event_list_tENS5_32host_number_of_selected_events_tENS5_18host_temp_counts_tENS5_15sd_bank_types_tENS5_17daq_error_types_tEEES4_IJS6_S7_S8_S9_SA_SB_EES5_S4_IJEEEERK14RuntimeOptionsRK9ConstantsRKNS1_7ContextE+0xf8)[0x7fffecd550e8]
/scratch/dtou/cuda_stack/Allen/InstallArea/x86_64_v3-el9-gcc12+cuda12_4-opt+g/lib/libAllenLib.so(_ZN6Stream3runEjRK14RuntimeOptions+0x5ee)[0x7fffed2af00e]
/scratch/dtou/cuda_stack/Allen/InstallArea/x86_64_v3-el9-gcc12+cuda12_4-opt+g/lib/libAllenLib.so(_Z10run_streammmiP6StreamSt10shared_ptrI14IInputProviderEP10IZeroMQSvcP14CheckerInvokerP11ROOTServicejbjb+0x67b)[0x7fffecc0961b]
/scratch/dtou/cuda_stack/Allen/InstallArea/x86_64_v3-el9-gcc12+cuda12_4-opt+g/lib/libAllenLib.so(_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJPFvmmiP6StreamSt10shared_ptrI14IInputProviderEP10IZeroMQSvcP14CheckerInvokerP11ROOTServicejbjbEjjiS4_S7_S9_SB_SD_jbjbEEEEE6_M_runEv+0x57)[0x7fffecbe2997]
/cvmfs/lhcb.cern.ch/lib/lcg/releases/gcc/12.1.0-2435c/x86_64-el9/lib64/libstdc++.so.6(+0xd8533)[0x7ffff495f533]
/lib64/libc.so.6(+0x9f802)[0x7ffff797b802]
/lib64/libc.so.6(+0x3f450)[0x7ffff791b450]
[INFO] Process: 'TESTHLT1_XXEB06_HLT1_0' (SignalHandler) 00 --> 0x7fffeba82370
[INFO] Process: 'TESTHLT1_XXEB06_HLT1_0' (SignalHandler) 01 --> 0x7fffeba85a45
[INFO] Process: 'TESTHLT1_XXEB06_HLT1_0' (SignalHandler) 02 --> 0x7ffff7930d90
[INFO] Process: 'TESTHLT1_XXEB06_HLT1_0' (SignalHandler) 03 --> 0x7ffff797d54c
[INFO] Process: 'TESTHLT1_XXEB06_HLT1_0' (SignalHandler) 04 --> 0x7ffff7930ce6
[INFO] Process: 'TESTHLT1_XXEB06_HLT1_0' (SignalHandler) 05 --> 0x7ffff79047f3
[INFO] Process: 'TESTHLT1_XXEB06_HLT1_0' (SignalHandler) 06 --> 0x7ffff4928a89
[INFO] Process: 'TESTHLT1_XXEB06_HLT1_0' (SignalHandler) 07 --> 0x7ffff4933f0a
[INFO] Process: 'TESTHLT1_XXEB06_HLT1_0' (SignalHandler) 08 --> 0x7ffff4933f75
[INFO] Process: 'TESTHLT1_XXEB06_HLT1_0' (SignalHandler) 09 --> 0x7fffecd4eb45
[INFO] Process: 'TESTHLT1_XXEB06_HLT1_0' (SignalHandler) 10 --> 0x7fffecd54dcd
[INFO] Process: 'TESTHLT1_XXEB06_HLT1_0' (SignalHandler) 11 --> 0x7fffecd550e8
[INFO] Process: 'TESTHLT1_XXEB06_HLT1_0' (SignalHandler) 12 --> 0x7fffed2af00e
[INFO] Process: 'TESTHLT1_XXEB06_HLT1_0' (SignalHandler) 13 --> 0x7fffecc0961b
[INFO] Process: 'TESTHLT1_XXEB06_HLT1_0' (SignalHandler) 14 --> 0x7fffecbe2997
[INFO] Process: 'TESTHLT1_XXEB06_HLT1_0' (SignalHandler) 15 --> 0x7ffff495f533
[INFO] Process: 'TESTHLT1_XXEB06_HLT1_0' (SignalHandler) 16 --> 0x7ffff797b802
[INFO] Process: 'TESTHLT1_XXEB06_HLT1_0' (SignalHandler) 17 --> 0x7ffff791b450
[INFO] mbm client TESTHLT1_XXEB06_HLT1_0 CRASHED - Shutdown of buffer Output_TESTHLT1 [pid:277241, part:65535]
[INFO] mbm client TESTHLT1_XXEB06_HLT1_0 CRASHED - Shutdown of buffer Output_TESTHLT1 [pid:277241, part:65535]
[INFO] mbm client TESTHLT1_XXEB06_HLT1_0.9 CRASHED - Shutdown of buffer Events_TESTHLT1 [pid:277241, part:65535]
[INFO] mbm client TESTHLT1_XXEB06_HLT1_0.8 CRASHED - Shutdown of buffer Events_TESTHLT1 [pid:277241, part:65535]
[INFO] mbm client TESTHLT1_XXEB06_HLT1_0.7 CRASHED - Shutdown of buffer Events_TESTHLT1 [pid:277241, part:65535]
[INFO] mbm client TESTHLT1_XXEB06_HLT1_0.6 CRASHED - Shutdown of buffer Events_TESTHLT1 [pid:277241, part:65535]
[INFO] mbm client TESTHLT1_XXEB06_HLT1_0.5 CRASHED - Shutdown of buffer Events_TESTHLT1 [pid:277241, part:65535]
[INFO] mbm client TESTHLT1_XXEB06_HLT1_0.4 CRASHED - Shutdown of buffer Events_TESTHLT1 [pid:277241, part:65535]
[INFO] mbm client TESTHLT1_XXEB06_HLT1_0.3 CRASHED - Shutdown of buffer Events_TESTHLT1 [pid:277241, part:65535]
[INFO] mbm client TESTHLT1_XXEB06_HLT1_0.2 CRASHED - Shutdown of buffer Events_TESTHLT1 [pid:277241, part:65535]
[INFO] mbm client TESTHLT1_XXEB06_HLT1_0.1 CRASHED - Shutdown of buffer Events_TESTHLT1 [pid:277241, part:65535]
[INFO] mbm client TESTHLT1_XXEB06_HLT1_0.0 CRASHED - Shutdown of buffer Events_TESTHLT1 [pid:277241, part:65535]
Edited by Da Yu Tou