Investigate ERROR log messages occurring in CI
Since #1179 (closed), CI jobs fail if unexpected ERROR log messages are produced. Most tests pass, but the repack system test fails intermittently. Rather than blindly whitelisting these errors, we should investigate them.
The following was taken from this job: https://gitlab.cern.ch/cta/CTA/-/jobs/56816282
The errors are as follows:
In OStoreDB::RepackArchiveReportBatch::report(): async job update failed.
Could be related to #1142 (closed)?
Full message:
{"epoch_time":1748606541.393195027,"local_time":"2025-05-30T14:02:21+0200","hostname":"cta-tpsrv02-0","program":"cta-taped","log_level":"ERROR","pid":411,"tid":411,"message":"In OStoreDB::RepackArchiveReportBatch::report(): async job update failed.","drive_name":"VDSTK02","instance":"test-repack-12127552","sched_backend":"ceph","SubprocessName":"maintenanceHandler","reportingType":"ArchiveSuccesses","fileId":4294967652,"subrequestAddress":"RepackSubRequest-Maintenance-cta-tpsrv01-0-417-20250530-13:52:24-0-5195","copyNb":1,"exceptionMsg":"In BackendRados::lockBackoff(): timeout : timeout set = 1000000 usec, time to lock the object : 1037823 usec, number of tries to lock = 80 object: RepackSubRequest-Maintenance-cta-tpsrv01-0-417-20250530-13:52:24-0-5195"}
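To put numbers on the lock timeout, the sketch below extracts the values from the `exceptionMsg` text of the message above. The regular expression and the helper name `parse_lock_backoff` are illustrative assumptions written against this single log line, not taken from the CTA source, so other variants of the message may not match.

```python
import re

# Pattern written against the one exceptionMsg quoted in this issue
# (assumption: the wording is stable across occurrences).
_LOCK_RE = re.compile(
    r"timeout set = (?P<timeout>\d+) usec, "
    r"time to lock the object : (?P<elapsed>\d+) usec, "
    r"number of tries to lock = (?P<tries>\d+)"
)


def parse_lock_backoff(exception_msg):
    """Return timeout, elapsed time, tries and overshoot (usec), or None."""
    m = _LOCK_RE.search(exception_msg)
    if m is None:
        return None
    timeout = int(m.group("timeout"))
    elapsed = int(m.group("elapsed"))
    return {
        "timeout_usec": timeout,
        "elapsed_usec": elapsed,
        "tries": int(m.group("tries")),
        "overshoot_usec": elapsed - timeout,
    }


# The exceptionMsg from the log line above, copied verbatim.
msg = (
    "In BackendRados::lockBackoff(): timeout : timeout set = 1000000 usec, "
    "time to lock the object : 1037823 usec, number of tries to lock = 80 "
    "object: RepackSubRequest-Maintenance-cta-tpsrv01-0-417-20250530-13:52:24-0-5195"
)
info = parse_lock_backoff(msg)
```

For this message, `info["overshoot_usec"]` is 37823 and `info["tries"]` is 80: the lock was retried 80 times over roughly 1.04 s before giving up. That is consistent with the sub-request object being held by another process for the whole timeout window, although the log line alone does not show which process held it.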
In OStoreDB::RepackArchiveReportBatch::report(): failed to remove the root://ctaeos//eos/ctaeos/repack/V00101/ directory
Perhaps one of the previous tests somehow fails to clean up after itself?
Full message:
{"epoch_time":1748606570.713300511,"local_time":"2025-05-30T14:02:50+0200","hostname":"cta-tpsrv02-0","program":"cta-taped","log_level":"ERROR","pid":411,"tid":411,"message":"In OStoreDB::RepackArchiveReportBatch::report(): failed to remove the root://ctaeos//eos/ctaeos/repack/V00101/ directory","drive_name":"VDSTK02","instance":"test-repack-12127552","sched_backend":"ceph","SubprocessName":"maintenanceHandler","reportingType":"ArchiveSuccesses","repackRequestAddress":"RepackRequest-Frontend-cta-frontend-0-299-20250530-13:52:20-0-2202","exceptionMsg":"In XRootdDirectory::rmdir() : failed to remove directory at root://ctaeos//eos/ctaeos/repack/V00101/ [ERROR] Server responded with an error: [3018] Unable to rmdir - Directory not empty /eos/ctaeos/repack/V00101; Directory not empty\n code:400 errNo:3018 status:1"}
For a full list of currently whitelisted error messages see: https://gitlab.cern.ch/cta/CTA/-/tree/1179-ensure-ci-fails-if-there-are-error-messages-present-in-the-taped-frontend-logs/continuousintegration/orchestration/tests/error_whitelists?ref_type=heads
Note that we should eventually evaluate each and every one of these to see if they should be expected or not.
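When evaluating the whitelisted messages, a small script can help list which ERROR lines in a log are not covered by any whitelist entry. The following is a minimal sketch only: it assumes one JSON object per log line (as in the messages quoted above) and assumes the whitelist is available as a list of regular expressions matched against the `message` field. The actual whitelist file format and the check used by the CI are not shown in this issue, and `unexpected_errors` is a hypothetical helper name.

```python
import json
import re


def unexpected_errors(log_lines, whitelist_patterns):
    """Yield parsed ERROR records whose message matches no whitelist pattern.

    log_lines: iterable of strings, one JSON log record per line (assumed).
    whitelist_patterns: iterable of regex strings (assumed whitelist format).
    """
    compiled = [re.compile(p) for p in whitelist_patterns]
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON
        if not isinstance(record, dict):
            continue
        if record.get("log_level") != "ERROR":
            continue
        message = record.get("message", "")
        if any(p.search(message) for p in compiled):
            continue  # covered by the whitelist
        yield record


# Shortened sample records for illustration; real lines carry more fields.
sample_log = [
    '{"log_level":"INFO","message":"In some function: all good"}',
    '{"log_level":"ERROR","message":"In OStoreDB::RepackArchiveReportBatch::report(): async job update failed."}',
    '{"log_level":"ERROR","message":"In OStoreDB::RepackArchiveReportBatch::report(): failed to remove the root://ctaeos//eos/ctaeos/repack/V00101/ directory"}',
]
sample_whitelist = [r"async job update failed"]

remaining = [r["message"] for r in unexpected_errors(sample_log, sample_whitelist)]
```

With the sample whitelist above, only the "failed to remove ... directory" message is left in `remaining`. Running the same function with an empty whitelist shows every ERROR line, which is a convenient starting point for reviewing each whitelist entry one by one.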