
cta-taped drive process coredumping

Summary

During today's DB intervention, I noticed that cta-taped drive processes on a multi-drive tape server crashed in September. The stack trace does not point to a problem specific to the multi-drive setup.

Steps to reproduce

Not yet known.

What is the current bug behaviour?

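The cta-taped drive process is killed by SIGSEGV and dumps core. However, no core files are available: coredumpctl reports the COREFILE as "missing" for both crashes. This should be fixed at the operational level.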
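To check other tape servers for the same situation, the coredumpctl listing can be scanned for crashes whose core file is not available. This is a minimal sketch, assuming the default `coredumpctl` column layout shown in the logs below (TIME, PID, UID, GID, SIG, COREFILE, EXE, SIZE) and an EXE path without spaces; `parse_coredumpctl`, `missing_corefiles` and `CoredumpEntry` are hypothetical helper names, not part of CTA or systemd.

```python
from dataclasses import dataclass


@dataclass
class CoredumpEntry:
    time: str
    pid: int
    sig: str
    corefile: str  # as printed by coredumpctl, e.g. "missing" or "present"
    exe: str


def parse_coredumpctl(listing: str) -> list[CoredumpEntry]:
    """Parse the table printed by `coredumpctl` (assumed default layout)."""
    entries = []
    for line in listing.splitlines():
        tokens = line.split()
        # Skip the shell prompt, the header row and blank lines: a data row has
        # at least 8 tokens and a numeric PID in the 7th column from the right.
        if len(tokens) < 8 or not tokens[-7].isdigit():
            continue
        # TIME itself contains spaces, so read the fixed columns from the right.
        pid, _uid, _gid, sig, corefile, exe, _size = tokens[-7:]
        entries.append(
            CoredumpEntry(" ".join(tokens[:-7]), int(pid), sig, corefile, exe)
        )
    return entries


def missing_corefiles(listing: str) -> list[CoredumpEntry]:
    """Return the crashes whose COREFILE column reads "missing"."""
    return [e for e in parse_coredumpctl(listing) if e.corefile == "missing"]
```

Applied to the tpsrv608 listing below, `missing_corefiles` returns both SIGSEGV entries (PIDs 265789 and 810695).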

Relevant logs and/or screenshots

[root@tpsrv608 ~]# coredumpctl 
TIME                            PID UID GID SIG     COREFILE EXE                SIZE
Wed 2024-09-11 19:32:16 CEST 265789   0   0 SIGSEGV missing  /usr/bin/cta-taped    -
Sun 2024-09-15 15:12:14 CEST 810695   0   0 SIGSEGV missing  /usr/bin/cta-taped    -
Sep 15 15:12:14 tpsrv608.cern.ch systemd-coredump[833822]: [🡕] Process 810695 (cta-tpd-SPECTRA) of user 0 dumped core.
                                                           
                                                           Stack trace of thread 830455:
                                                           #0  0x00007fc04fa8b94c __pthread_kill_implementation (libc.so.6 + 0x8b94c)
                                                           #1  0x00007fc04fa3e646 raise (libc.so.6 + 0x3e646)
                                                           #2  0x00007fc052b0718f skgesigOSCrash (libclntsh.so.21.1 + 0x290718f)
                                                           #3  0x00007fc053206d8d kpeDbgSignalHandler (libclntsh.so.21.1 + 0x3006d8d)
                                                           #4  0x00007fc052b074a2 skgesig_sigactionHandler (libclntsh.so.21.1 + 0x29074a2)
                                                           #5  0x00007fc04fa3e6f0 __restore_rt (libc.so.6 + 0x3e6f0)
                                                           #6  0x00000000004a8a5f _ZN3cta3log5Param8setValueImEEvRKT_ (cta-taped + 0xa8a5f)
                                                           #7  0x00000000004a4fc7 _ZN3cta3log5ParamC2ImEESt17basic_string_viewIcSt11char_traitsIcEERKT_ (cta-taped + 0xa4fc7)
                                                           #8  0x00000000004a50d6 _ZN3cta3log20ScopedParamContainer3addImEERS1_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKT_ (cta-taped + 0xa50d6)
                                                           #9  0x00007fc057148554 _ZN3cta8OStoreDB12ArchiveMount22setJobBatchTransferredERNSt7__cxx114listISt10unique_ptrINS_17SchedulerDatabase10ArchiveJobESt14default_deleteIS6_EESaIS9_EEERNS_3log10LogContextE (libctascheduler.so.0 + 0x548554)
                                                           #10 0x00007fc0570592f0 _ZN3cta12ArchiveMount26reportJobsBatchTransferredERSt5queueISt10unique_ptrINS_10ArchiveJobESt14default_deleteIS3_EESt5dequeIS6_SaIS6_EEERS1_INS_9catalogue15TapeItemWrittenES7_ISD_SaISD_EEERS1_IS2_INS_17SchedulerDatabase10ArchiveJobES4_>
                                                           #11 0x000000000055825e _ZN6castor4tape10tapeserver6daemon21MigrationReportPacker11ReportFlush7executeERS3_ (cta-taped + 0x15825e)
                                                           #12 0x000000000055a0e8 _ZN6castor4tape10tapeserver6daemon21MigrationReportPacker12WorkerThread3runEv (cta-taped + 0x15a0e8)
                                                           #13 0x00007fc0551fcf93 _ZN3cta9threading6Thread14pthread_runnerEPv (libctacommon.so.0 + 0x17cf93)
                                                           #14 0x00007fc04fa89c02 start_thread (libc.so.6 + 0x89c02)
                                                           #15 0x00007fc04fb0ec40 __clone3 (libc.so.6 + 0x10ec40)
                                                           
                                                           Stack trace of thread 810697:
                                                           #0  0x00007fc04fb0e21e epoll_wait (libc.so.6 + 0x10e21e)
                                                           #1  0x00007fc04f3fe748 _ZN11EpollDriver10event_waitERSt6vectorI14FiredFileEventSaIS1_EEP7timeval (libceph-common.so.2 + 0x3fe748)
                                                           #2  0x00007fc04f3fcda6 _ZN11EventCenter14process_eventsEjPNSt6chrono8durationImSt5ratioILl1ELl1000000000EEEE (libceph-common.so.2 + 0x3fcda6)
                                                           #3  0x00007fc04f3fd916 _ZNSt17_Function_handlerIFvvEZN12NetworkStack10add_threadEP6WorkerEUlvE_E9_M_invokeERKSt9_Any_data (libceph-common.so.2 + 0x3fd916)
                                                           #4  0x00007fc04fedbad4 execute_native_thread_routine (libstdc++.so.6 + 0xdbad4)
                                                           #5  0x00007fc04fa89c02 start_thread (libc.so.6 + 0x89c02)
                                                           #6  0x00007fc04fb0ec40 __clone3 (libc.so.6 + 0x10ec40)
                                                           
                                                           Stack trace of thread 810695:
                                                           #0  0x00007fc04fa8679a __futex_abstimed_wait_common (libc.so.6 + 0x8679a)
                                                           #1  0x00007fc04fa8b6d3 __pthread_clockjoin_ex (libc.so.6 + 0x8b6d3)
                                                           #2  0x00007fc0551fcbcf _ZN3cta9threading6Thread4waitEv (libctacommon.so.0 + 0x17cbcf)
                                                           #3  0x0000000000566245 _ZN6castor4tape10tapeserver6daemon21MigrationTaskInjector11waitThreadsEv (cta-taped + 0x166245)
                                                           #4  0x00000000004e7344 _ZN6castor4tape10tapeserver6daemon19DataTransferSession12executeWriteERN3cta3log10LogContextEPNS4_12ArchiveMountERNS2_19TapeSessionReporterE (cta-taped + 0xe7344)
                                                           #5  0x00000000004e5756 _ZN6castor4tape10tapeserver6daemon19DataTransferSession7executeEv (cta-taped + 0xe5756)
                                                           #6  0x00000000004a0eee _ZNK3cta4tape6daemon12DriveHandler26executeDataTransferSessionEPNS_10ISchedulerEPNS1_10TapedProxyE (cta-taped + 0xa0eee)
                                                           #7  0x000000000049d2cb _ZN3cta4tape6daemon12DriveHandler8runChildEv (cta-taped + 0x9d2cb)
                                                           #8  0x00000000004c9ced _ZN3cta4tape6daemon14ProcessManager17runForkManagementEv (cta-taped + 0xc9ced)
                                                           #9  0x00000000004c8db0 _ZN3cta4tape6daemon14ProcessManager3runEv (cta-taped + 0xc8db0)
                                                           #10 0x000000000048f7c4 _ZN3cta4tape6daemon10TapeDaemon13mainEventLoopEv (cta-taped + 0x8f7c4)
                                                           #11 0x000000000048f465 _ZN3cta4tape6daemon10TapeDaemon21exceptionThrowingMainEv (cta-taped + 0x8f465)
                                                           #12 0x000000000048ee01 _ZN3cta4tape6daemon10TapeDaemon4mainEv (cta-taped + 0x8ee01)
                                                           #13 0x000000000047fdf7 _ZN3cta5tapedL21exceptionThrowingMainERKNS_6daemon17CommandLineParamsERNS_3log6LoggerE (cta-taped + 0x7fdf7)
                                                           #14 0x000000000048057a main (cta-taped + 0x8057a)
                                                           #15 0x00007fc04fa29590 __libc_start_call_main (libc.so.6 + 0x29590)
                                                           #16 0x00007fc04fa29640 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x29640)
                                                           #17 0x000000000047f905 _start (cta-taped + 0x7f905)
                                                           
                                                           Stack trace of thread 810698:
                                                           #0  0x00007fc04fb0e21e epoll_wait (libc.so.6 + 0x10e21e)
                                                           #1  0x00007fc04f3fe748 _ZN11EpollDriver10event_waitERSt6vectorI14FiredFileEventSaIS1_EEP7timeval (libceph-common.so.2 + 0x3fe748)
                                                           #2  0x00007fc04f3fcda6 _ZN11EventCenter14process_eventsEjPNSt6chrono8durationImSt5ratioILl1ELl1000000000EEEE (libceph-common.so.2 + 0x3fcda6)
                                                           #3  0x00007fc04f3fd916 _ZNSt17_Function_handlerIFvvEZN12NetworkStack10add_threadEP6WorkerEUlvE_E9_M_invokeERKSt9_Any_data (libceph-common.so.2 + 0x3fd916)
                                                           #4  0x00007fc04fedbad4 execute_native_thread_routine (libstdc++.so.6 + 0xdbad4)
                                                           #5  0x00007fc04fa89c02 start_thread (libc.so.6 + 0x89c02)
                                                           #6  0x00007fc04fb0ec40 __clone3 (libc.so.6 + 0x10ec40)
[...] (remaining threads were in wait/sleep)
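When reading a trace like the one above, the interesting frame is the one just below the signal trampoline. In thread 830455, frames #0 to #4 are the signal-handling chain (glibc plus the handlers in libclntsh.so.21.1) and #5 is glibc's `__restore_rt`, so the code that was running when SIGSEGV arrived is most likely frame #6, `cta::log::Param::setValue<unsigned long>`, called from `OStoreDB::ArchiveMount::setJobBatchTransferred`. The following is a minimal sketch, not part of CTA, that locates that frame automatically. It assumes the systemd-coredump output format shown above and x86-64 glibc; `parse_threads`, `faulting_frame` and `Frame` are hypothetical helper names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Header line printed by systemd-coredump before each thread's frames.
THREAD_RE = re.compile(r"Stack trace of thread (\d+):")
# Frame line, e.g. "#6  0x00000000004a8a5f _ZN3cta...RKT_ (cta-taped + 0xa8a5f)".
# The "(module + 0xoffset)" suffix is optional because long lines may be cut off.
FRAME_RE = re.compile(
    r"#(?P<index>\d+)\s+0x[0-9a-f]+\s+(?P<symbol>\S+)"
    r"(?:\s+\((?P<module>[^\s()]+) \+ 0x[0-9a-f]+\))?"
)


@dataclass
class Frame:
    index: int
    symbol: str            # mangled C++ name as printed (pipe through c++filt to read)
    module: Optional[str]  # e.g. "cta-taped"; None if the line was truncated


def parse_threads(text: str) -> dict[int, list[Frame]]:
    """Group the frames of a systemd-coredump journal entry by thread id."""
    threads: dict[int, list[Frame]] = {}
    current: Optional[list[Frame]] = None
    for line in text.splitlines():
        if (m := THREAD_RE.search(line)) is not None:
            current = threads.setdefault(int(m.group(1)), [])
        elif current is not None and (m := FRAME_RE.search(line)) is not None:
            current.append(Frame(int(m["index"]), m["symbol"], m["module"]))
    return threads


def faulting_frame(threads: dict[int, list[Frame]]) -> Optional[tuple[int, Frame]]:
    """Return (thread id, frame) for the frame interrupted by the signal.

    On x86-64 glibc, signal handlers return via the __restore_rt trampoline,
    so the frame listed right after it is the code that was running when the
    signal arrived.
    """
    for tid, frames in threads.items():
        for i, frame in enumerate(frames):
            if frame.symbol == "__restore_rt" and i + 1 < len(frames):
                return tid, frames[i + 1]
    return None
```

Run against the journal entry above, `faulting_frame` picks thread 830455 and frame #6; `c++filt` turns its symbol into `void cta::log::Param::setValue<unsigned long>(unsigned long const&)`.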

Possible causes

Edited by Pablo Oliver Cortes