Move logic to minimize mounts for multi-copy tape pool recalls out of scheduler logic
Introduction
In May 2020, the following issue was created with the request to minimize mounts for multi-copy tape pool recalls:
This was achieved by selecting the tape based on the number of days since the epoch.
This guarantees that files with replicas on similar tapes will be forwarded to the same queue (if recalled on the same day), while also guaranteeing a healthy daily rotation in the selection of tapes:
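A minimal sketch of this day-based selection, not the actual CTA code. The helper names `daysSinceEpoch` and `selectVidForDay` are hypothetical, and the sketch assumes that `std::chrono::system_clock` counts from the Unix epoch (guaranteed only since C++20, but true in practice on common platforms):

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <ratio>
#include <string>
#include <vector>

// Hypothetical helper: whole days elapsed since the epoch of system_clock.
inline std::int64_t daysSinceEpoch(std::chrono::system_clock::time_point now) {
  using Days = std::chrono::duration<std::int64_t, std::ratio<86400>>;
  return std::chrono::duration_cast<Days>(now.time_since_epoch()).count();
}

// Hypothetical helper: pick one vid deterministically for a given day.
// Sorting first makes the result independent of the input order, so two
// requests with the same set of vids get the same answer on the same day.
inline std::string selectVidForDay(std::vector<std::string> vids, std::int64_t day) {
  assert(!vids.empty());
  assert(day >= 0);
  std::sort(vids.begin(), vids.end());
  return vids[static_cast<std::size_t>(day) % vids.size()];
}
```

With two copies, the selected tape alternates from one day to the next, which is the daily rotation described above.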
However, during my work on issue #77 (closed), I found that this solution has a few problems:
Problem 1:
It invalidates the logic that explicitly selects a vid when one is specified in the protobuf (link to code). That selection should be done before any pruning of the tape file list, not after. Otherwise, the requested copy may be missed if its tape was already excluded from the list.
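A minimal sketch of the ordering I have in mind, assuming the tape file list can be reduced to a list of vids. The function name `candidateVids` and the choice to return an empty list when the requested vid holds no copy are assumptions for illustration, not the behaviour of the existing code:

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: resolve an explicitly requested vid against the FULL
// list of vids holding a copy, before any pruning takes place.
inline std::vector<std::string> candidateVids(
    const std::vector<std::string>& allVids,
    const std::optional<std::string>& requestedVid) {
  if (requestedVid) {
    auto it = std::find(allVids.begin(), allVids.end(), *requestedVid);
    if (it == allVids.end()) {
      return {};  // assumption: the caller reports the error
    }
    return {*it};  // only the requested tape remains a candidate
  }
  // No explicit vid: keep every copy and leave the choice to later logic.
  return allVids;
}
```

Because the lookup runs against the unpruned list, a requested vid can never be hidden by an earlier selection step.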
Problem 2:
It interferes with the pre-existing logic that selects the best queue for retrieve (Helpers::selectBestRetrieveQueue(..)):
This function already implements an algorithm to select the best tape copy. It is based on tape/queue attributes (such as current state) and on queue statistics. The queue statistics are particularly useful, because they allow us to select the queue with the most requests, and thus optimize the retrieve speed.
If we do a premature pruning of the vids - as is done now - we will not be able to take advantage of these optimisations.
The function Helpers::selectBestRetrieveQueue(..) is called here and the queue stats are updated here.
NOTE: The logic to update the stats is asynchronous. I think this explains why the requests mentioned in CTA-old#777 (closed) ended up in different queues: by the time the second request was being processed, the stats had not been updated yet.
Problem 3:
For my work on #77 (closed), I need to modify Helpers::selectBestRetrieveQueue(..). These changes will have no effect if the vid list is already pruned.
Proposed solution
Fortunately, there seems to be a simple solution.
Helpers::selectBestRetrieveQueue(..) selects the best queue based on the statistics. If multiple queues have the same weight, then one is selected at random:
This is where we should move the logic of CTA-old#777 (closed).
Instead of selecting a vid at random, we should order them and select one based on the number of days since the epoch.
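A minimal sketch of the proposed tie-break, not the actual implementation of Helpers::selectBestRetrieveQueue(..). The `QueueCandidate` struct, the single `weight` field (assumed here to mean "higher is better") and the function name `selectBestVid` are hypothetical simplifications of the real queue statistics:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified view of a candidate retrieve queue.
struct QueueCandidate {
  std::string vid;
  std::uint64_t weight;  // assumption: a higher weight means a better queue
};

// Pick the vid with the highest weight. When several vids share that weight,
// sort them and index by the day number instead of choosing at random.
inline std::string selectBestVid(const std::vector<QueueCandidate>& candidates,
                                 std::int64_t day) {
  assert(!candidates.empty());
  assert(day >= 0);
  std::uint64_t best = 0;
  for (const auto& c : candidates) {
    best = std::max(best, c.weight);
  }
  std::vector<std::string> tied;
  for (const auto& c : candidates) {
    if (c.weight == best) {
      tied.push_back(c.vid);
    }
  }
  std::sort(tied.begin(), tied.end());
  return tied[static_cast<std::size_t>(day) % tied.size()];
}
```

The statistics still decide whenever one queue is clearly better. The day number only matters when the weights are equal, which is the case where requests for the same files could otherwise be spread across several tapes.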