Add back-off behaviour to cta-rmcd
Problem to solve
On IBM tape libraries, too many simultaneous dismount requests lead to contention, which causes cta-rmcd to give up and return an error to cta-taped. Sessions then fail even though there is no problem with the drive or the tape.
(Spectra Logic libraries are not affected by this issue, as the DriveIQ feature defers the dismount until the next tape is ready to be mounted.)
Stakeholders
Tape operators: less noise from dealing with the consequences of failed sessions, and less disruption to tape operations, which is especially important out of hours.
Proposal
Replace the current "try 10 times and give up" behaviour with an exponential back-off.
The maximum total time to keep retrying before reporting failure should be a configurable parameter, defaulting to 10 minutes (20 drives dismounting simultaneously per library × 30 seconds for the robotics to perform each dismount = 600 seconds).
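
As a rough illustration of the proposal above, the retry loop could look something like the following sketch. It is written in Python for brevity and is not taken from the cta-rmcd code base. The function names (`backoff_delays`, `dismount_with_backoff`), the initial delay of 1 second, the doubling factor and the 60-second cap on a single wait are all assumptions made for this example; only the 600-second total budget comes from the proposal.

```python
import time
from typing import Callable, Iterator


def backoff_delays(initial_s: float, factor: float,
                   max_delay_s: float, max_total_s: float) -> Iterator[float]:
    """Yield wait intervals that grow geometrically.

    Each interval is capped at max_delay_s, and the last one is trimmed
    so that the intervals never add up to more than max_total_s.
    """
    elapsed = 0.0
    delay = initial_s
    while elapsed < max_total_s:
        step = min(delay, max_delay_s, max_total_s - elapsed)
        yield step
        elapsed += step
        delay *= factor


def dismount_with_backoff(try_dismount: Callable[[], bool],
                          max_total_s: float = 600.0,   # 20 drives x 30 s, from the proposal
                          initial_s: float = 1.0,       # assumed starting delay
                          factor: float = 2.0,          # assumed growth factor
                          max_delay_s: float = 60.0,    # assumed cap on a single wait
                          sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call try_dismount until it succeeds or the waiting budget is used up.

    try_dismount is a hypothetical callable that returns True on success
    and False when the library reports contention.
    """
    if try_dismount():
        return True
    for delay in backoff_delays(initial_s, factor, max_delay_s, max_total_s):
        sleep(delay)
        if try_dismount():
            return True
    return False  # budget exhausted: report the error to the caller
```

With these example values the waits are 1, 2, 4, 8, 16 and 32 seconds, then 60 seconds at a time, with the final wait shortened so that the total is exactly 600 seconds. The budget counts waiting time only, not the time spent inside each dismount attempt. Capping the individual wait keeps a drive from sitting idle for several minutes once the contention has cleared. Adding random jitter to each wait may also be worth considering, so that drives which failed together do not all retry at the same moment.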