Fix segmentation fault on archive job report URL
Summary
There is currently a bug in std::string cta::ArchiveJob::exceptionThrowingReportURL() that has caused a series of SIGSEGV faults, resulting in core dumps.
For example:
[root@tpsrv436 ~]# coredumpctl info 735065
PID: 735065 (F14C4R1-maint)
UID: 0 (root)
GID: 0 (root)
Signal: 11 (SEGV)
Timestamp: Mon 2025-11-03 01:56:22 CET (2 days ago)
Command Line: /usr/bin/cta-taped --log-format=json --log-to-file=/var/log/cta/cta-taped-IBMLIB4-TS1160-F14C4R1.log --config=/etc/cta/cta-taped-IBMLIB4-TS1160-F14C4R1.conf
Executable: /usr/bin/cta-taped
Control Group: /system.slice/system-cta\x2dtaped.slice/cta-taped@IBMLIB4-TS1160-F14C4R1.service
Unit: cta-taped@IBMLIB4-TS1160-F14C4R1.service
Slice: system-cta\x2dtaped.slice
Boot ID: 83717c5c5ecb43acb66620b3513ddfc3
Machine ID: 45f02d9ee92a44558a1923c0b960ceeb
Hostname: tpsrv436.cern.ch
Storage: /var/lib/systemd/coredump/core.F14C4R1-maint.0.83717c5c5ecb43acb66620b3513ddfc3.735065.1762131382000000.zst (present)
Size on Disk: 3.9M
Message: Process 735065 (F14C4R1-maint) of user 0 dumped core.
Stack trace of thread 735065:
#0 0x00007f354aa8bedc __pthread_kill_implementation (libc.so.6 + 0x8bedc)
#1 0x00007f354aa3eb46 raise (libc.so.6 + 0x3eb46)
#2 0x00007f355076ff94 skgesigOSCrash (libclntsh.so.23.1 + 0x376ff94)
#3 0x00007f3550f898c9 kpeDbgSignalHandler (libclntsh.so.23.1 + 0x3f898c9)
#4 0x00007f3550770327 skgesig_sigactionHandler (libclntsh.so.23.1 + 0x3770327)
#5 0x00007f354aa3ebf0 __restore_rt (libc.so.6 + 0x3ebf0)
#6 0x00007f3554c6faa9 _ZNKSt14default_deleteIN8CryptoPP13Base64EncoderEEclEPS1_ (libctascheduler.so.0 + 0x46faa9)
#7 0x00007f3554c6ca53 _ZNSt10unique_ptrIN8CryptoPP13Base64EncoderESt14default_deleteIS1_EED1Ev (libctascheduler.so.0 + 0x46ca53)
#8 0x00007f3554c63a29 _ZN3cta10ArchiveJob26exceptionThrowingReportURLB5cxx11Ev (libctascheduler.so.0 + 0x463a29)
#9 0x00007f3554cbd72f _ZN3cta9Scheduler22reportArchiveJobsBatchERNSt7__cxx114listISt10unique_ptrINS_10ArchiveJobESt14default_deleteIS4_EESaIS7_EEERNS_4disk19DiskReporterFactoryERNS_3log10TimingLis>
#10 0x00007f3554c8a9d1 _ZN3cta16DiskReportRunner10runOnePassERNS_3log10LogContextE (libctascheduler.so.0 + 0x48a9d1)
#11 0x0000000000501dca _ZN3cta4tape6daemon18MaintenanceHandler25exceptionThrowingRunChildEv (cta-taped + 0x101dca)
#12 0x0000000000501406 _ZN3cta4tape6daemon18MaintenanceHandler8runChildEv (cta-taped + 0x101406)
#13 0x0000000000506670 _ZN3cta4tape6daemon14ProcessManager17runForkManagementEv (cta-taped + 0x106670)
#14 0x000000000050560b _ZN3cta4tape6daemon14ProcessManager3runEv (cta-taped + 0x10560b)
#15 0x00000000004bb4d8 _ZN3cta4tape6daemon10TapeDaemon13mainEventLoopEv (cta-taped + 0xbb4d8)
#16 0x00000000004bb171 _ZN3cta4tape6daemon10TapeDaemon21exceptionThrowingMainEv (cta-taped + 0xbb171)
#17 0x00000000004bac4b _ZN3cta4tape6daemon10TapeDaemon4mainEv (cta-taped + 0xbac4b)
#18 0x00000000004a0d25 _ZN3cta5tapedL21exceptionThrowingMainERKNS_6daemon17CommandLineParamsERNS_3log6LoggerE (cta-taped + 0xa0d25)
#19 0x00000000004a1e75 main (cta-taped + 0xa1e75)
#20 0x00007f354aa295d0 __libc_start_call_main (libc.so.6 + 0x295d0)
#21 0x00007f354aa29680 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x29680)
#22 0x00000000004a0795 _start (cta-taped + 0xa0795)
Steps to reproduce
Log in to an affected server, for example tpsrv436, and run:
coredumpctl list
coredumpctl info <pid>
What is the expected correct behaviour?
SIGSEGV faults and core dumps should not occur in production, and when they do occur they should not go undetected.
Relevant logs and/or screenshots
Possible causes
This is caused by this line of code:
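CryptoPP::StringSource takes ownership of the CryptoPP::Base64Encoder object passed to it and becomes responsible for freeing the allocated resources:
However, because the CryptoPP::Base64Encoder is also wrapped in a std::unique_ptr, it is freed a second time when the std::unique_ptr goes out of scope. This double free is undefined behaviour and matches frames #6 to #8 of the stack trace above, where the crash occurs in the std::unique_ptr destructor inside exceptionThrowingReportURL().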