CTA forgets files on disk and never moves them to tape

Related to the following user ticket: https://cern.service-now.com/nav_to.do?uri=incident.do?sysparm_query=number=INC3992971

Summary

Files queued for archival never reach tape and remain stuck in the disk buffer.

Steps to reproduce

Roughly 1 in 10k files from this particular archive operation appears to be affected. The files are transferred from EOSPUBLIC to EOSCTAPUBLIC; the transfer succeeds and the checksum and size checks pass, but the file is never moved to tape. As a result, the 'm' bit is still missing from the stat information even after 10 days of waiting.

What is the current bug behaviour?

This affects users trying to move their data from EOS instances to tape and blocks the progress of their archive transfers. They also perceive it as the system being slow, since the status they see is "wait on tape".

What is the expected correct behaviour?

The file should be moved to tape within a reasonable amount of time.

Relevant logs and/or screenshots

Logs from the eos-archive side (notice the timestamps):

2024-08-15 00:47:32,162 transfer[3731997] transfer.py:1094 LVL=DEBUG File root://eosctapublic.cern.ch//eos/ctapublic/archive/archive/public/070829b9540a9fbbf6fcb3f84c6fb6e2e54829cea2da514d969b79daa858a3a4/run_009269/data_0473.root?svcClass=default is not yet on tape
...
2024-08-20 17:09:13,957 transfer[3731997] transfer.py:1094 LVL=DEBUG File root://eosctapublic.cern.ch//eos/ctapublic/archive/archive/public/070829b9540a9fbbf6fcb3f84c6fb6e2e54829cea2da514d969b79daa858a3a4/run_009269/data_0473.root?svcClass=default is not yet on tape
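
To find other files in the same state, the eos-archive log can be scanned for how long each file has been reported as "not yet on tape". The following is a minimal sketch, not part of eos-archive: the line format is assumed from the two sample lines above, and `stuck_durations` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Line format assumed from the two eos-archive DEBUG samples quoted above.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*? "
    r"File (?P<url>\S+) is not yet on tape$"
)


def stuck_durations(lines):
    """Map each file URL to the time between its earliest and latest
    'is not yet on tape' message (hypothetical helper)."""
    first, last = {}, {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # ignore unrelated log lines
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
        url = m["url"]
        # Use min/max so the result does not depend on line order.
        first[url] = min(first.get(url, ts), ts)
        last[url] = max(last.get(url, ts), ts)
    return {url: last[url] - first[url] for url in first}
```

Applied to the two lines quoted above, this reports a span of a little over 5 days and 16 hours for data_0473.root. Sorting the result by duration would surface the files that have been waiting longest.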