Set next mount type to NoMount when a tape drive is set down.
Last night we hit this error again:
Scheduler::sortAndGetTapesForMountInfo(): for Existing or Next Mounts, tapePool is an empty string.
It was triggered by tape server tpsrv473:
[1694397523.709937000] Sep 11 03:58:43.709937 tpsrv473.cern.ch cta-taped: LVL="ERROR" PID="34883" TID="34883" MSG="Error while scheduling new mount. Putting the drive down. Stack trace follows." thread="MainThread" tapeDrive="I4600231"
[1694681650.405885000] Sep 14 10:54:10.405885 tpsrv473.cern.ch cta-taped: LVL="INFO" PID="21808" TID="21808" MSG="In DriveHandler::shutdown(): starting cleaner." tapeVid="I76707" tapeDrive="I4600231" sessionState="Fatal" sessionType="Archive"
[1694681650.406292000] Sep 14 10:54:10.406292 tpsrv473.cern.ch cta-taped: LVL="INFO" PID="21808" TID="21808" MSG="Cleaner set process capabilities for using tape" capabilities="= cap_sys_rawio+ep"
[1694681650.408947000] Sep 14 10:54:10.408947 tpsrv473.cern.ch cta-taped: LVL="INFO" PID="21808" TID="21808" MSG="Cleaner waiting for drive to be ready" tapeVid="I76707" tapeDrive="I4600231" waitMediaInDriveTimeout="300"
[1694681650.417466000] Sep 14 10:54:10.417466 tpsrv473.cern.ch cta-taped: LVL="INFO" PID="21808" TID="21808" MSG="Cleaner detected drive is ready" tapeVid="I76707" tapeDrive="I4600231" waitMediaInDriveTimeout="300"
[1694681650.423002000] Sep 14 10:54:10.423002 tpsrv473.cern.ch cta-taped: LVL="INFO" PID="21808" TID="21808" MSG="Cleaner rewinding tape" tapeVid="I76707" tapeDrive="I4600231"
[1694681650.437494000] Sep 14 10:54:10.437494 tpsrv473.cern.ch cta-taped: LVL="ERROR" PID="21808" TID="21808" MSG="In CleanerSession::exceptionThrowingExecute(), failed to clean the Drive with a tape mounted. Disabling the tape." tapeVid="I76707" tapeDrive="I4600231" logicalLibrary="IBM460" host="tpsrv473" exceptionMsg="Failed ST ioctl (MTREW) in DriveGeneric::rewind Errno=5: Input/output error"
[1694681650.441211000] Sep 14 10:54:10.441211 tpsrv473.cern.ch cta-taped: LVL="ERROR" PID="21808" TID="21808" MSG="Cleaner failed, the drive is going down." tapeVid="I76707" tapeDrive="I4600231" message="Failed ST ioctl (MTREW) in DriveGeneric::rewind Errno=5: Input/output error"
[1694681650.444210000] Sep 14 10:54:10.444210 tpsrv473.cern.ch cta-taped: LVL="INFO" PID="21808" TID="21808" MSG="In Scheduler::setDesiredDriveState(): success." drive="I4600231" up="down" force="no" reason="[cta-taped] ERROR Cleaner failed. Failed ST ioctl (MTREW) in DriveGeneric::rewind Errno=5: Input/output error" comment="" schedulerDbTime="0.001090"
[1694681650.446967000] Sep 14 10:54:10.446967 tpsrv473.cern.ch cta-taped: LVL="INFO" PID="21808" TID="21808" MSG="In Agent::removeAndUnregisterSelf(): Removed agent object." agentObject="DriveHandlerShutdown-I4600231-tpsrv473.cern.ch-21808-20230914-10:54:10-0"
[1694681650.457444000] Sep 14 10:54:10.457444 tpsrv473.cern.ch cta-taped: LVL="INFO" PID="21808" TID="21808" MSG="Signaled shutdown to subprocess handler" SubprocessName="drive:I4600231" ShutdownComplete="1"
The drive hit an error while scheduling its next mount (nextMountType) with a tape still mounted. Later, during shutdown, the cleaner tried to rewind that tape and failed.
In OStore.cpp we have:
std::set<int> activeDriveStatuses = {
  (int)cta::common::dataStructures::DriveStatus::Starting,
  (int)cta::common::dataStructures::DriveStatus::Mounting,
  (int)cta::common::dataStructures::DriveStatus::Transferring,
  (int)cta::common::dataStructures::DriveStatus::Unloading,
  (int)cta::common::dataStructures::DriveStatus::Unmounting,
  (int)cta::common::dataStructures::DriveStatus::DrainingToDisk,
  (int)cta::common::dataStructures::DriveStatus::CleaningUp };
std::set<int> activeMountTypes = {
  (int)cta::common::dataStructures::MountType::ArchiveForUser,
  (int)cta::common::dataStructures::MountType::ArchiveForRepack,
  (int)cta::common::dataStructures::MountType::Retrieve,
  (int)cta::common::dataStructures::MountType::Label };
for (const auto& driveState : driveStates) {
  if (activeDriveStatuses.count(static_cast<int>(driveState.driveStatus))) {
    tmdi.existingOrNextMounts.push_back(ExistingMount());
    tmdi.existingOrNextMounts.back().type = driveState.mountType;
    tmdi.existingOrNextMounts.back().tapePool = driveState.currentTapePool ? driveState.currentTapePool.value() : "";
    ...
  }
  if(driveState.nextMountType == common::dataStructures::MountType::NoMount) continue;
  if (activeMountTypes.count(static_cast<int>(driveState.nextMountType))) {
    tmdi.existingOrNextMounts.push_back(ExistingMount());
    tmdi.existingOrNextMounts.back().type = driveState.nextMountType;
    tmdi.existingOrNextMounts.back().tapePool = driveState.nextTapePool ? driveState.nextTapePool.value() : "";
    ...
  }
}
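To make the failure mode concrete, here is a minimal standalone sketch of the loop above. All types and function names (DriveState, collectExistingOrNextMounts, staleDownDrive) are simplified, hypothetical stand-ins and not the real CTA definitions, and it assumes the stale drive has no nextTapePool value set.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the CTA enums and structs.
enum class MountType { NoMount, ArchiveForUser, ArchiveForRepack, Retrieve, Label };
enum class DriveStatus { Down, Up, Starting, Mounting, Transferring,
                         Unloading, Unmounting, DrainingToDisk, CleaningUp };

struct DriveState {
  DriveStatus driveStatus = DriveStatus::Down;
  MountType mountType = MountType::NoMount;
  MountType nextMountType = MountType::NoMount;
  std::optional<std::string> currentTapePool;
  std::optional<std::string> nextTapePool;
};

struct ExistingMount {
  MountType type;
  std::string tapePool;
};

// Mirrors the activeDriveStatuses set from the excerpt.
bool isActiveStatus(DriveStatus s) {
  switch (s) {
    case DriveStatus::Starting:
    case DriveStatus::Mounting:
    case DriveStatus::Transferring:
    case DriveStatus::Unloading:
    case DriveStatus::Unmounting:
    case DriveStatus::DrainingToDisk:
    case DriveStatus::CleaningUp:
      return true;
    default:
      return false;
  }
}

// Mirrors the activeMountTypes set from the excerpt.
bool isActiveMountType(MountType t) {
  return t == MountType::ArchiveForUser || t == MountType::ArchiveForRepack ||
         t == MountType::Retrieve || t == MountType::Label;
}

// Same two-step logic as the loop above: current mount first, then next mount.
std::vector<ExistingMount> collectExistingOrNextMounts(const std::vector<DriveState>& driveStates) {
  std::vector<ExistingMount> mounts;
  for (const auto& d : driveStates) {
    if (isActiveStatus(d.driveStatus)) {
      mounts.push_back({d.mountType, d.currentTapePool.value_or("")});
    }
    if (d.nextMountType == MountType::NoMount) continue;
    if (isActiveMountType(d.nextMountType)) {
      // A stale nextMountType without a nextTapePool yields an empty pool name here.
      mounts.push_back({d.nextMountType, d.nextTapePool.value_or("")});
    }
  }
  return mounts;
}

// A drive that is Down but still carries a leftover next mount type (assumed scenario).
DriveState staleDownDrive() {
  DriveState d;
  d.driveStatus = DriveStatus::Down;
  d.nextMountType = MountType::Retrieve;
  return d;
}
```

In this sketch, passing staleDownDrive() to collectExistingOrNextMounts() produces one entry whose tapePool is empty, which is the condition the scheduler error reports.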
So, if the tape drive still has a next mount type assigned in the database, the scheduler fetches an empty string for the tape pool. The stale value survives because setDriveDown() in TapeDrivesCatalogueState.cpp resets the current mount fields but leaves nextMountType untouched:
void TapeDrivesCatalogueState::setDriveDown(common::dataStructures::TapeDrive & driveState,
  const ReportDriveStatusInputs & inputs) const {
  // If we are changing state, then all should be reset.
  driveState.sessionId = std::nullopt;
  driveState.bytesTransferedInSession = std::nullopt;
  driveState.filesTransferedInSession = std::nullopt;
  driveState.sessionStartTime = std::nullopt;
  driveState.sessionElapsedTime = std::nullopt;
  driveState.mountStartTime = std::nullopt;
  driveState.transferStartTime = std::nullopt;
  driveState.unloadStartTime = std::nullopt;
  driveState.unmountStartTime = std::nullopt;
  driveState.drainingStartTime = std::nullopt;
  driveState.downOrUpStartTime = inputs.reportTime;
  driveState.probeStartTime = std::nullopt;
  driveState.cleanupStartTime = std::nullopt;
  driveState.shutdownTime = std::nullopt;
  driveState.lastModificationLog = common::dataStructures::EntryLog("NO_USER", driveState.host, inputs.reportTime);
  driveState.mountType = common::dataStructures::MountType::NoMount;
  driveState.driveStatus = common::dataStructures::DriveStatus::Down;
  driveState.desiredUp = false;
  driveState.desiredForceDown = false;
  driveState.currentVid = "";
  driveState.currentTapePool = "";
  driveState.currentVo = "";
  driveState.currentActivity = std::nullopt;
  if (inputs.reason) driveState.reasonUpDown = inputs.reason;
}
The fix is to set nextMountType to MountType::NoMount when the tape drive is set down.
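A rough sketch of what that change could look like, using a reduced, hypothetical TapeDrive struct and a free function instead of the real TapeDrivesCatalogueState::setDriveDown() member (the real function takes a ReportDriveStatusInputs argument and resets many more fields):

```cpp
#include <optional>
#include <string>

// Hypothetical, reduced stand-ins for the CTA types; only the fields relevant here.
enum class MountType { NoMount, ArchiveForUser, Retrieve };
enum class DriveStatus { Down, Transferring };

struct TapeDrive {
  DriveStatus driveStatus = DriveStatus::Down;
  MountType mountType = MountType::NoMount;
  MountType nextMountType = MountType::NoMount;
  std::optional<std::string> currentTapePool;
  bool desiredUp = false;
};

// Simplified version of setDriveDown() with the proposed extra reset.
void setDriveDown(TapeDrive& driveState) {
  driveState.mountType = MountType::NoMount;
  driveState.driveStatus = DriveStatus::Down;
  driveState.desiredUp = false;
  driveState.currentTapePool = "";
  // Proposed addition: clear the pending mount so the scheduler
  // no longer counts it as an existing or next mount.
  driveState.nextMountType = MountType::NoMount;
}
```

With nextMountType cleared, the `nextMountType == NoMount` check in the scheduler loop skips the drive, so no entry with an empty tape pool is created for it.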
Edited by Jorge Camarero Vera