
Retry handling on DISABLED tapes

Dev ticket for https://gitlab.cern.ch/cta/operations/-/issues/665, meant for Joao.

What happens is this:

  1. A file is initially queued on a tape (replica 1 of a dual-replica file).
  2. The tape is disabled after the file is queued.
  3. Tape servers try to mount the tape and fail the request because the tape is disabled.
  4. The request is always requeued on the same tape replica, even though that tape is disabled.
  5. The maintenance process detects that the request is queued on a disabled tape and fails it, but does not notify EOS.

Two possible solutions were found:

  1. Simplify the logic by always allowing queueing on disabled tapes, but have the scheduler ignore retrieve queues for disabled tapes (DISABLED is supposedly a temporary state). The tape then either moves back to the ACTIVE state and the requests go through, or it moves to BROKEN. In this scenario, we change the maintenance process to look for queues on BROKEN tapes, fail their requests and notify EOS.

  2. Change the requeueing logic to ignore replicas on disabled tapes. This means that a request is failed only if both replicas are on disabled tapes at the same time.
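
To make the two options easier to compare, here is a minimal Python sketch of the decisions each one implies. It is not taken from the CTA code base: `TapeState`, `Replica`, `mountable_queues`, `queues_to_fail` and `select_requeue_vid` are hypothetical names used for illustration only, and the tape state is assumed to be available as a simple VID-to-state lookup.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Sequence


class TapeState(Enum):
    # Only the three states mentioned in this ticket are modelled.
    ACTIVE = "ACTIVE"
    DISABLED = "DISABLED"
    BROKEN = "BROKEN"


@dataclass(frozen=True)
class Replica:
    copy_nb: int  # replica number of the file (1 or 2 for a dual-replica file)
    vid: str      # VID of the tape holding this replica


def mountable_queues(queue_vids: Sequence[str],
                     tape_states: Dict[str, TapeState]) -> List[str]:
    """Option 1, scheduler side: requests stay queued on DISABLED tapes,
    but only queues of ACTIVE tapes are considered for mounting."""
    return [vid for vid in queue_vids if tape_states[vid] is TapeState.ACTIVE]


def queues_to_fail(queue_vids: Sequence[str],
                   tape_states: Dict[str, TapeState]) -> List[str]:
    """Option 1, maintenance side: queues on BROKEN tapes are the ones whose
    requests should be failed and reported to EOS."""
    return [vid for vid in queue_vids if tape_states[vid] is TapeState.BROKEN]


def select_requeue_vid(replicas: Sequence[Replica],
                       tape_states: Dict[str, TapeState]) -> Optional[str]:
    """Option 2: pick the lowest-numbered replica whose tape is not DISABLED.

    Returns None when every replica is on a DISABLED tape, which is the only
    case where the request would be failed. Only DISABLED is filtered here,
    as in the ticket; handling of other states is out of scope for this sketch.
    """
    for replica in sorted(replicas, key=lambda r: r.copy_nb):
        if tape_states[replica.vid] is not TapeState.DISABLED:
            return replica.vid
    return None
```

With made-up VIDs, a file whose replica 1 is on a DISABLED tape `V00001` and whose replica 2 is on an ACTIVE tape `V00002` would be requeued on `V00002` under option 2, while under option 1 it would stay queued on `V00001` and simply not be scheduled until the tape state changes.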

At the same time, we should fix the maintenance process logic so that, if there is no VID available to requeue the file and the request is failed, it notifies EOS. This is already a big improvement over the current behaviour, as at least the user will see something.
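
The intended maintenance behaviour could look roughly like the sketch below. It only illustrates the control flow: `requeue_or_fail` and the `notify_eos` callback are hypothetical names, not existing CTA or EOS interfaces, and the set of disabled VIDs is assumed to be passed in by the caller.

```python
from typing import Callable, Optional, Sequence, Set


def requeue_or_fail(file_id: str,
                    replica_vids: Sequence[str],
                    disabled_vids: Set[str],
                    notify_eos: Callable[[str, str], None]) -> Optional[str]:
    """Return a VID to requeue the request on, or fail the request.

    If no replica sits on an enabled tape, the request is failed and
    notify_eos(file_id, reason) is called so that the failure is visible
    to the user instead of being dropped silently.
    """
    for vid in replica_vids:
        if vid not in disabled_vids:
            return vid  # a usable replica exists: requeue there

    # No VID available: fail the request and report it (hypothetical callback).
    notify_eos(file_id, "no replica available on an enabled tape")
    return None
```

Passing the notification in as a callback keeps the decision logic testable without an EOS instance; a unit test can supply a function that records its arguments and check that it was called exactly once when all replicas are on disabled tapes.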

Edited by Michael Davis