CTA Release v4.10.6-1 / v5.10.6-1
ATTENTION
THIS SHOULD BE MOVED TO VERSION v4.10.7-1 and v5.10.7-1. We've already tagged v4.10.6-1.
ATTENTION
Initial code references for release
This release should be based on cf05b018 from branch release_4.10.6-1.
It corresponds to cherry-picking these two commits from main into release_4.10.6-1:
This release will be tagged as version v4.10.6-1/v5.10.6-1.
Additional details
This release cherry-picks the commits from #500 (closed).
These fix a series of issues related to missing shard objects, which have caused serious complications in production:
- https://gitlab.cern.ch/cta/operations/-/issues/1190
- https://gitlab.cern.ch/cta/operations/-/issues/1201
Stress test results
This specific release needs to be stress tested carefully. In addition, we should test it by deleting shards manually, both in the middle and at the end of the queues.
Standard stress test results
Additional specific tests
Deletion of Archive Shards

Reproducing production issue in operations#1190:
- drives down
- 30k files queued for archival in 2 shards (25k, 5k)
- first shard deleted (25k)
- 1 drive up
The drive that was brought up consumed the remaining shard and deleted the queue.
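The expected outcome of this scenario can be illustrated with a toy model. This is only a sketch and not CTA code: the `drain` function, the shard names and the dictionary standing in for the object store are all hypothetical, and the real queue logic lives in the CTA object store classes. It assumes that a consumer skips a shard reference whose object no longer exists, drains what is left, and then removes the queue.

```python
def drain(queue, store):
    """Consume every shard still present in the store, skipping dangling references.

    queue: ordered list of shard names (hypothetical stand-in for an archive queue)
    store: dict mapping shard name -> number of queued jobs (stand-in for the object store)
    Returns (jobs_consumed, missing_shards) and empties the queue.
    """
    consumed = 0
    missing = []
    for name in queue:
        jobs = store.pop(name, None)
        if jobs is None:
            # Shard object was deleted behind the queue's back: record it and move on.
            missing.append(name)
            continue
        consumed += jobs
    queue.clear()  # the emptied queue is removed
    return consumed, missing


# Scenario from the test: 30k files in two shards (25k, 5k), first shard deleted by hand.
store = {"shard-0": 25_000, "shard-1": 5_000}
queue = ["shard-0", "shard-1"]
del store["shard-0"]

consumed, missing = drain(queue, store)
print(consumed, missing)
```

In this model the 5k jobs of the surviving shard are consumed, the deleted shard is reported as missing, and the queue ends up empty, which matches the behaviour observed in the test.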

This one affects queueing.
First effect: the archive failed and the frontend is not responsive.
kubectl -n toto exec ctacli -ti -- cta-admin dr ls
Error from XRootD SSI Framework: [FATAL] Connection error
command terminated with exit code 1
All subsequent queueing fails, with two ERRORs in the logs:
Oct 10 11:58:00.819827 ctafrontend cta-frontend: LVL="ERROR" PID="300" TID="376" MSG="In MemQueue::sharedAddToNewQueue(): got an exception writing. Will propagate to other threads." message="In BackendVFS::atomicOverwrite, trying to update a non-existing object"
Oct 10 11:58:00.826458 ctafrontend cta-frontend: LVL="ERROR" PID="300" TID="446" MSG="In ArchiveQueue::addJobsAndCommit(): shard not present. Rebuilding queue." archiveQueueObject="ArchiveQueueToTransferForUser-ctasystest-Frontend-ctafrontend-300-20231010-11:33:47-0-30006" shardNumber="0" shardObject="ArchiveQueueShard-ctasystest-Frontend-ctafrontend-300-20231010-11:33:47-0-30007"
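When triaging log lines like the two above, it can help to pull the `key="value"` pairs into a dictionary. The snippet below is a minimal sketch, not part of CTA: the `parse_fields` helper is hypothetical, and it assumes values never contain an embedded double quote, which holds for the lines shown here but is not verified against the CTA logger.

```python
import re

# Matches key="value" pairs; assumes no double quotes inside a value.
FIELD_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_fields(line):
    """Return the key="value" pairs of one log line as a dict."""
    return dict(FIELD_RE.findall(line))


line = (
    'Oct 10 11:58:00.826458 ctafrontend cta-frontend: LVL="ERROR" PID="300" TID="446" '
    'MSG="In ArchiveQueue::addJobsAndCommit(): shard not present. Rebuilding queue." '
    'archiveQueueObject="ArchiveQueueToTransferForUser-ctasystest-Frontend-ctafrontend-300-20231010-11:33:47-0-30006" '
    'shardNumber="0" '
    'shardObject="ArchiveQueueShard-ctasystest-Frontend-ctafrontend-300-20231010-11:33:47-0-30007"'
)

fields = parse_fields(line)
print(fields["LVL"], fields["shardNumber"], fields["shardObject"])
```

This makes it straightforward to list which shard objects were reported missing across a whole log file.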
And the following fault:
[Oct10 11:58] traps: xrootd[33136] general protection fault ip:7fbe22142273 sp:7fbdf17f9598 error:0 in libstdc++.so.6.0.19[7fbe22085000+e9000]
Backtrace attached: ctafrontend-1696931880-xrootd-300-11.core.bt.
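The kernel trap line above also gives the faulting address relative to the library mapping: subtracting the mapping base from the instruction pointer yields the offset inside libstdc++. The sketch below does that arithmetic; the `fault_offset` helper is hypothetical, and using the result as an offset into the library file assumes that the reported mapping starts at file offset 0.

```python
import re

# Layout of the trap line: ip:<hex> sp:<hex> error:<hex> in <lib>[<base>+<size>]
TRAP_RE = re.compile(
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<err>[0-9a-f]+) "
    r"in (?P<lib>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(trap_line):
    """Return (library name, offset of the faulting instruction within its mapping)."""
    m = TRAP_RE.search(trap_line)
    if m is None:
        raise ValueError("not a recognised trap line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    offset = ip - base
    if not 0 <= offset < size:
        raise ValueError("instruction pointer lies outside the reported mapping")
    return m["lib"], offset


trap = (
    "[Oct10 11:58] traps: xrootd[33136] general protection fault "
    "ip:7fbe22142273 sp:7fbdf17f9598 error:0 in libstdc++.so.6.0.19[7fbe22085000+e9000]"
)
lib, offset = fault_offset(trap)
print(lib, hex(offset))  # libstdc++.so.6.0.19 0xbd273
```

The resulting offset can be cross-checked against the attached backtrace, or resolved with a symbolizer run on the same libstdc++ build.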