Reduce the number of queue objects fetched in 'fetchMountInfo()'
As discussed in https://gitlab.cern.ch/cta/operations/-/issues/972, there was a significant problem with the global scheduler DB lock taking a long time to be taken by a tape server. On average it would take 21 minutes but, since lock acquisition order is random, some servers could wait up to 4 or 5 hours.
As mentioned in #285 (comment 6395672), on average each tape server would hold the lock for 8 seconds and, considering that there are 174 of them, this was what caused the long delay. It was particularly visible when there were no queues to pick, as tape servers kept looking repeatedly for new ones.
After some improvements, we managed to reduce this average lock time to 1.8 seconds, which greatly improved the waiting time for the lock.
However, there are still some improvements that can be made inside the `OStoreDB::fetchMountInfo(...)` function:
(For more details check #285 (closed))
Each tape drive is tied to a single logical library, so there is no need to fetch all queues in `OStoreDB::fetchMountInfo(...)`.
We should fetch only the queues that are associated with that logical library (filtering either by tape VID or by tape pool name).
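As a rough illustration of the filtering idea, the sketch below keeps only the queues whose key (tape VID or tape pool name) belongs to the drive's logical library. The `QueueRef` type, the function name, and the `allowedKeys` parameter are all hypothetical, not CTA's actual API:

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical queue descriptor: each archive/retrieve queue in the
// object store is keyed by a tape pool name or a tape VID.
struct QueueRef {
  std::string key;  // tape pool name (archive) or tape VID (retrieve)
};

// Keep only the queues whose key belongs to the given logical library.
// 'allowedKeys' would be built from the catalogue's list of tapes and
// tape pools for that library; names here are illustrative only.
std::vector<QueueRef> filterQueuesForLibrary(
    const std::vector<QueueRef> &allQueues,
    const std::set<std::string> &allowedKeys) {
  std::vector<QueueRef> result;
  std::copy_if(allQueues.begin(), allQueues.end(),
               std::back_inserter(result),
               [&](const QueueRef &q) { return allowedKeys.count(q.key) > 0; });
  return result;
}
```

With this in place, `fetchMountInfo(...)` would only lock and dereference the queue objects that can actually produce a mount for the calling drive.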
Tasks:

- Pass `logicalLibraryName` down to `OStoreDB::fetchMountInfo(...)`.
- Inside `OStoreDB::fetchMountInfo(...)`, call two new functions:
  - Names:
    - `m_catalogue.getCachedTapes(const TapeSearchCriteria &searchCriteria);`
    - `m_catalogue.getCachedTapePool(const TapeSearchCriteria &searchCriteria);`
  - Use the `logicalLibraryName` as a search criterion.
  - Cache the results:
    - The values are not expected to change frequently and we can tolerate inconsistencies for a while.
- Use the list of tapes/tape pools to avoid fetching unnecessary queues from the object store DB.
- Remove the parameter `logicalLibraryName` from `Scheduler::sortAndGetTapesForMountInfo(...)`:
  - With the new logic, this parameter becomes redundant. In addition, we avoid an unnecessary query to the catalogue DB.
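The caching step above could follow a simple time-based pattern: serve a stored value while it is fresh, and re-run the real catalogue query only once the TTL expires. This is a minimal sketch of such a cache, assuming a TTL-based design; the class name and layout are illustrative, not the actual `getCachedTapes`/`getCachedTapePool` implementation:

```cpp
#include <chrono>
#include <functional>
#include <mutex>

// Minimal time-based cache sketch for catalogue lookups. A stale value
// is tolerated until the TTL expires, matching the issue's note that
// the cached values change rarely and brief inconsistency is acceptable.
template <typename Value>
class TimedCache {
public:
  explicit TimedCache(std::chrono::seconds ttl) : m_ttl(ttl) {}

  // Return the cached value if still fresh; otherwise refresh it via
  // 'fetch' (e.g. a real catalogue query) and reset the timestamp.
  Value get(const std::function<Value()> &fetch) {
    std::lock_guard<std::mutex> lock(m_mutex);
    const auto now = std::chrono::steady_clock::now();
    if (!m_valid || now - m_lastFetch > m_ttl) {
      m_value = fetch();
      m_lastFetch = now;
      m_valid = true;
    }
    return m_value;
  }

private:
  std::chrono::seconds m_ttl;
  std::chrono::steady_clock::time_point m_lastFetch{};
  bool m_valid = false;
  Value m_value{};
  std::mutex m_mutex;
};
```

With such a cache, repeated `fetchMountInfo(...)` calls within the TTL window hit memory instead of the catalogue, which removes the per-call catalogue query mentioned in the last task.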