Refactor drive status code to use DB idioms
The code for cta-admin drive ls
was ported from the objectstore to the DB, see catalogue/RdbmsCatalogue.[ch]pp
methods:
- createTapeDrive
- getTapeDriveNames
- getTapeDrives
- getTapeDrive
- modifyTapeDrive
- deleteTapeDrive
- createDriveConfig
- getDriveConfigNamesAndKeys
- modifyDriveConfig
- getDrivesConfigs
- getDriveConfig
- deleteDriveConfig
- getExistingDrivesReservations
- settingSqlTapeDriveValues
- gettingSqlTapeDriveValues
These methods are called from the scheduler code, scheduler/TapeDrivesCatalogueState.[ch]pp
and scheduler/DriveConfig.cpp
.
This code was ported directly from the objectstore implementation so is done the "objectstore way": read the entire row (object) from the DB into a local data structure, update the data structure, write back the entire row (object) to the DB.
It would be more efficient to refactor this code to do operations the "DB way": only update the fields which need to be updated. In many cases this can avoid an unnecessary read from the DB. In addition, it removes the necessity to do locking in order to ensure consistency. Each DB update is an atomic transaction and the DB itself takes care of maintaining consistency.
Example 1: Set Drive State
In TapeDrivesCatalogueState::setDesiredDriveState
:
- Read the drive state row from the DB:
auto driveState = m_catalogue.getTapeDrive(drive);
- Check the row exists
- Change the values of some fields in
driveState
to new ones - Write the entire drive state row back to the DB:
m_catalogue.modifyTapeDrive(driveState.value());
There is the potential for a race condition here, as there is no lock taken on the DB. Potentially another process could change the values of fields which are not included in the update, after the read and before the write. These changes would be overwritten by the final modifyTapeDrive
.
This should be refactored to a single DB atomic action:
- No need to read the DB
- In
RdbmsCatalogue
, add a method which doesUPDATE TAPE_DRIVE SET ...
and updates ONLY the required fields. - Handle the exception in case the row to be updated does not exist
Example 2: Add Disk Space Reservation
In RdbmsCatalogue::addDiskSpaceReservation
, we have:
- Read the drive state row from the DB:
auto tdStatus = getTapeDrive(driveName);
- Check the row exists
- Update the disk space reservation values
- Write the entire drive state row back to the DB:
modifyTapeDrive(tdStatus.value());
This can be refactored to a single DB atomic action:
- No need to read the DB
- Update the DB using
UPDATE TAPE_DRIVE SET DISK_SYSTEM_NAME = :DISK_SYSTEM_NAME, RESERVED_BYTES = RESERVED_BYTES + :BYTES_TO_ADD
- Handle the exception in case the row to be updated does not exist
REASON field is getting lost
See this note on the CTA Release 4.4.0 ops issue. If drive state is changed while the drive is busy, the reason is set, but then it gets lost when the drive finally transitions to the DOWN state.
The refactoring above should hopefully fix this issue, as we will only update necessary fields and so not accidentally overwrite a field with a blank value. But we need to test this specific case.
TO DO
- Review the scheduler code to find all the places where the above methods are called. Refactor the code where possible to do a single atomic update instead of the three-step "read/set values/update". Add new methods to the catalogue where necessary.
- Same for the Disk Reservation code (and any other methods in the catalogue) which update the drive statuses.
Naming conventions
Minor point but if we are refactoring this code we may as well address it: some method names could be revised to be more consistent.
- Some method names include
Drive
while others areTapeDrive
. Make them allTapeDrive
. For plural form, only the final word needs to be plural form (e.g.getTapeDriveConfigs
notgetTapeDrivesConfigs
). - The lambda functions
setOptionalString
,setOptionalUint64
,setOptionalTime
should be renamed tobindOptionalString
,bindOptionalUint64
,bindOptionalTime
to be more consistent with similarly-named functions.