Prevent free drives from stalling in production
The issue is described here: https://gitlab.cern.ch/cta/operations/-/issues/719
We need to investigate the root cause.
Check the mechanism that takes the global lock when a tape that is in REPACK PENDING is moved from the DISABLED to the REPACKING state. User jobs are de-queued one by one (unless a 2nd replica is available, in which case they are moved to a new queue). Apparently this causes the global lock to kill production.
The problem appears to be the combination of free tape servers looking for the lock and user requests being re-queued one by one.
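To illustrate the suspected contention pattern, here is a minimal Python sketch. It is not CTA code: `CountingLock`, `requeue_one_by_one`, and `requeue_in_batches` are hypothetical names, and plain lists stand in for the real queues. It only shows that re-queueing one job per lock acquisition takes the lock once per job, while batching takes it once per batch, which would leave fewer contention points for free tape servers waiting on the same lock.

```python
import threading


class CountingLock:
    """Hypothetical stand-in for a global lock; counts acquisitions."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False


def requeue_one_by_one(jobs, source, target, lock):
    """Move each job under its own lock acquisition (suspected pattern)."""
    for job in jobs:
        with lock:
            source.remove(job)
            target.append(job)


def requeue_in_batches(jobs, source, target, lock, batch_size):
    """Move jobs in groups, taking the lock once per group."""
    for start in range(0, len(jobs), batch_size):
        batch = jobs[start:start + batch_size]
        with lock:
            for job in batch:
                source.remove(job)
                target.append(job)


if __name__ == "__main__":
    lock_a, lock_b = CountingLock(), CountingLock()

    src_a, dst_a = list(range(1000)), []
    requeue_one_by_one(list(src_a), src_a, dst_a, lock_a)

    src_b, dst_b = list(range(1000)), []
    requeue_in_batches(list(src_b), src_b, dst_b, lock_b, batch_size=100)

    print("one by one:", lock_a.acquisitions, "lock acquisitions")
    print("batched:   ", lock_b.acquisitions, "lock acquisitions")
```

Whether batching is applicable here depends on how the real queueing and locking code is structured, which still needs to be confirmed as part of the root-cause investigation.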
removed prioritymedium label
added prioritycritical label
@poliverc is tracking the timeline of events and debugging in this CodiMD https://codimd.web.cern.ch/fWvlG7_fSsKi8uAyXd7H4g#
assigned to @poliverc