Maintenance job occasionally stuck
We occasionally observe that maintenance jobs get stuck, and we have to restart the affected cta-taped process to get it working again. I captured the thread dump below during one such occurrence. Of course, I am not 100% sure that it shows the cause. The object store is on a shared NFS volume.
Thread 1 (Thread 0x7f91f4758e80 (LWP 48675)):
#0 0x00000000004bb477 in std::string::append (this=0x7fffee8b9d10,
__s=0x7fffee8b9c10 "747-20230907-08:30:59-0-44338\232}KArchiveRequest-Frontend-cta-frontend.desy.de-9747-20230907-08:30:59-0-44339\232}KArchiveRequest-Frontend-cta-frontend.desy.deArchiveRequest-Frontend-cta-frontend.desy.de-9", __n=0) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/basic_string.tcc:751
#1 0x00007f91f2d9ff34 in cta::objectstore::BackendVFS::read (this=0x1526260, name="Frontend-cta-frontend.desy.de-9747-20230907-08:30:59-0")
at /usr/src/debug/cta-5-5616668git4d86f49d/objectstore/BackendVFS.cpp:188
#2 0x00007f91f2ca8749 in cta::objectstore::ObjectOps<cta::objectstore::serializers::Agent, (cta::objectstore::serializers::ObjectType)2>::getHeaderFromObjectStore (
this=0x1713d58) at /usr/src/debug/cta-5-5616668git4d86f49d/objectstore/ObjectOps.hpp:587
#3 0x00007f91f2ca752f in cta::objectstore::ObjectOps<cta::objectstore::serializers::Agent, (cta::objectstore::serializers::ObjectType)2>::fetchBottomHalf (
this=0x1713d58) at /usr/src/debug/cta-5-5616668git4d86f49d/objectstore/ObjectOps.hpp:456
#4 0x00007f91f2ded853 in cta::objectstore::ObjectOps<cta::objectstore::serializers::Agent, (cta::objectstore::serializers::ObjectType)2>::fetchNoLock (this=0x1713d58)
at /usr/src/debug/cta-5-5616668git4d86f49d/objectstore/ObjectOps.hpp:450
#5 0x00007f91f2decb89 in cta::objectstore::AgentWatchdog::readGCData (this=0x1713d40) at /usr/src/debug/cta-5-5616668git4d86f49d/objectstore/AgentWatchdog.hpp:85
#6 0x00007f91f2dec604 in cta::objectstore::AgentWatchdog::checkAlive (this=0x1713d40) at /usr/src/debug/cta-5-5616668git4d86f49d/objectstore/AgentWatchdog.hpp:36
#7 0x00007f91f2de40b9 in cta::objectstore::GarbageCollector::checkHeartbeats (this=0x7fffee8ba150, lc=...)
at /usr/src/debug/cta-5-5616668git4d86f49d/objectstore/GarbageCollector.cpp:133
#8 0x00007f91f2de35f8 in cta::objectstore::GarbageCollector::runOnePass (this=0x7fffee8ba150, lc=...)
at /usr/src/debug/cta-5-5616668git4d86f49d/objectstore/GarbageCollector.cpp:55
#9 0x00000000004a1bf8 in cta::tape::daemon::MaintenanceHandler::exceptionThrowingRunChild (this=0x14c1990)
at /usr/src/debug/cta-5-5616668git4d86f49d/tapeserver/daemon/MaintenanceHandler.cpp:326
#10 0x00000000004a15ae in cta::tape::daemon::MaintenanceHandler::runChild (this=0x14c1990)
at /usr/src/debug/cta-5-5616668git4d86f49d/tapeserver/daemon/MaintenanceHandler.cpp:249
#11 0x00000000004a9fd4 in cta::tape::daemon::ProcessManager::runForkManagement (this=0x7fffee8ba490)
at /usr/src/debug/cta-5-5616668git4d86f49d/tapeserver/daemon/ProcessManager.cpp:188
#12 0x00000000004a8f88 in cta::tape::daemon::ProcessManager::run (this=0x7fffee8ba490) at /usr/src/debug/cta-5-5616668git4d86f49d/tapeserver/daemon/ProcessManager.cpp:79
#13 0x00000000004707df in cta::tape::daemon::TapeDaemon::mainEventLoop (this=0x7fffee8ba610)
at /usr/src/debug/cta-5-5616668git4d86f49d/tapeserver/daemon/TapeDaemon.cpp:126
#14 0x00000000004703fa in cta::tape::daemon::TapeDaemon::exceptionThrowingMain (this=0x7fffee8ba610)
at /usr/src/debug/cta-5-5616668git4d86f49d/tapeserver/daemon/TapeDaemon.cpp:100
#15 0x000000000046fccc in cta::tape::daemon::TapeDaemon::main (this=0x7fffee8ba610) at /usr/src/debug/cta-5-5616668git4d86f49d/tapeserver/daemon/TapeDaemon.cpp:53
#16 0x000000000045c25f in cta::taped::exceptionThrowingMain (commandLine=..., log=...) at /usr/src/debug/cta-5-5616668git4d86f49d/tapeserver/cta-taped.cpp:130
#17 0x000000000045c945 in main (argc=2, argv=0x7fffee8bae48) at /usr/src/debug/cta-5-5616668git4d86f49d/tapeserver/cta-taped.cpp:232
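
One possible reading of frame #0, offered as a guess rather than a diagnosis: std::string::append is called with __n=0 from BackendVFS::read, which would be consistent with a read loop that keeps appending zero bytes. I have not checked whether BackendVFS::read is actually written this way. The sketch below only illustrates the general failure mode, under the assumption that a chunked read loop is conditioned on eof() alone: once the stream enters a failed state without reaching end of file (for example after an I/O error on NFS), every read extracts nothing and eof() never becomes true. The names readAll and eofOnlyLoopIterations are hypothetical and are not part of CTA.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper, not CTA code: reads a whole stream in fixed-size chunks.
// The loop is driven by the stream state after each read, not by eof() alone,
// so a stream that fails before reaching end of file terminates the loop.
std::string readAll(std::istream& in) {
  std::string out;
  char buf[200];
  while (true) {
    in.read(buf, sizeof(buf));
    // gcount() is the number of characters the last read actually extracted.
    out.append(buf, static_cast<std::size_t>(in.gcount()));
    if (in.eof()) break;  // normal end of data (a short final chunk also sets eofbit)
    if (in.fail()) {      // failbit or badbit without eofbit: report instead of spinning
      throw std::runtime_error("read error before end of file");
    }
  }
  return out;
}

// Hypothetical demonstration of the suspected failure mode: counts the
// iterations of a loop conditioned on eof() only. The cap exists solely so
// that the demonstration terminates; without it the loop would never exit
// on a stream that is in a failed state.
std::size_t eofOnlyLoopIterations(std::istream& in, std::size_t maxIterations) {
  char buf[200];
  std::size_t iterations = 0;
  while (!in.eof() && iterations < maxIterations) {
    in.read(buf, sizeof(buf));  // extracts nothing once the stream has failed
    ++iterations;
  }
  return iterations;
}
```

If the real loop turns out to have this shape, checking the stream state after each read (as readAll does) would turn the hang into an exception that the garbage collector pass could log and handle. That is only a suggestion under the stated assumption.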