
Set next mount type to NoMount when a tape drive is set down.

Last night we got the error again:

Scheduler::sortAndGetTapesForMountInfo(): for Existing or Next Mounts, tapePool is an empty string.

It was triggered by tape server tpsrv473:

[1694397523.709937000] Sep 11 03:58:43.709937 tpsrv473.cern.ch cta-taped: LVL="ERROR" PID="34883" TID="34883" MSG="Error while scheduling new mount. Putting the drive down. Stack trace follows." thread="MainThread" tapeDrive="I4600231"
[1694681650.405885000] Sep 14 10:54:10.405885 tpsrv473.cern.ch cta-taped: LVL="INFO" PID="21808" TID="21808" MSG="In DriveHandler::shutdown(): starting cleaner." tapeVid="I76707" tapeDrive="I4600231" sessionState="Fatal" sessionType="Archive" 
[1694681650.406292000] Sep 14 10:54:10.406292 tpsrv473.cern.ch cta-taped: LVL="INFO" PID="21808" TID="21808" MSG="Cleaner set process capabilities for using tape" capabilities="= cap_sys_rawio+ep" 
[1694681650.408947000] Sep 14 10:54:10.408947 tpsrv473.cern.ch cta-taped: LVL="INFO" PID="21808" TID="21808" MSG="Cleaner waiting for drive to be ready" tapeVid="I76707" tapeDrive="I4600231" waitMediaInDriveTimeout="300" 
[1694681650.417466000] Sep 14 10:54:10.417466 tpsrv473.cern.ch cta-taped: LVL="INFO" PID="21808" TID="21808" MSG="Cleaner detected drive is ready" tapeVid="I76707" tapeDrive="I4600231" waitMediaInDriveTimeout="300" 
[1694681650.423002000] Sep 14 10:54:10.423002 tpsrv473.cern.ch cta-taped: LVL="INFO" PID="21808" TID="21808" MSG="Cleaner rewinding tape" tapeVid="I76707" tapeDrive="I4600231" 
[1694681650.437494000] Sep 14 10:54:10.437494 tpsrv473.cern.ch cta-taped: LVL="ERROR" PID="21808" TID="21808" MSG="In CleanerSession::exceptionThrowingExecute(), failed to clean the Drive with a tape mounted. Disabling the tape." tapeVid="I76707" tapeDrive="I4600231" logicalLibrary="IBM460" host="tpsrv473" exceptionMsg="Failed ST ioctl (MTREW) in DriveGeneric::rewind Errno=5: Input/output error" 
[1694681650.441211000] Sep 14 10:54:10.441211 tpsrv473.cern.ch cta-taped: LVL="ERROR" PID="21808" TID="21808" MSG="Cleaner failed, the drive is going down." tapeVid="I76707" tapeDrive="I4600231" message="Failed ST ioctl (MTREW) in DriveGeneric::rewind Errno=5: Input/output error" 
[1694681650.444210000] Sep 14 10:54:10.444210 tpsrv473.cern.ch cta-taped: LVL="INFO" PID="21808" TID="21808" MSG="In Scheduler::setDesiredDriveState(): success." drive="I4600231" up="down" force="no" reason="[cta-taped] ERROR Cleaner failed. Failed ST ioctl (MTREW) in DriveGeneric::rewind Errno=5: Input/output error" comment="" schedulerDbTime="0.001090" 
[1694681650.446967000] Sep 14 10:54:10.446967 tpsrv473.cern.ch cta-taped: LVL="INFO" PID="21808" TID="21808" MSG="In Agent::removeAndUnregisterSelf(): Removed agent object." agentObject="DriveHandlerShutdown-I4600231-tpsrv473.cern.ch-21808-20230914-10:54:10-0" 
[1694681650.457444000] Sep 14 10:54:10.457444 tpsrv473.cern.ch cta-taped: LVL="INFO" PID="21808" TID="21808" MSG="Signaled shutdown to subprocess handler" SubprocessName="drive:I4600231" ShutdownComplete="1"

The drive hit an error while scheduling the next mount with a tape still mounted, and the cleaner later failed when it tried to rewind that tape, so the drive was put down.

In OStore.cpp we have:

  std::set<int> activeDriveStatuses = {
    (int)cta::common::dataStructures::DriveStatus::Starting,
    (int)cta::common::dataStructures::DriveStatus::Mounting,
    (int)cta::common::dataStructures::DriveStatus::Transferring,
    (int)cta::common::dataStructures::DriveStatus::Unloading,
    (int)cta::common::dataStructures::DriveStatus::Unmounting,
    (int)cta::common::dataStructures::DriveStatus::DrainingToDisk,
    (int)cta::common::dataStructures::DriveStatus::CleaningUp };
  std::set<int> activeMountTypes = {
    (int)cta::common::dataStructures::MountType::ArchiveForUser,
    (int)cta::common::dataStructures::MountType::ArchiveForRepack,
    (int)cta::common::dataStructures::MountType::Retrieve,
    (int)cta::common::dataStructures::MountType::Label };
  for (const auto& driveState : driveStates) {
    if (activeDriveStatuses.count(static_cast<int>(driveState.driveStatus))) {
      tmdi.existingOrNextMounts.push_back(ExistingMount());
      tmdi.existingOrNextMounts.back().type = driveState.mountType;
      tmdi.existingOrNextMounts.back().tapePool = driveState.currentTapePool ? driveState.currentTapePool.value() : "";
      ...
    }
    if(driveState.nextMountType == common::dataStructures::MountType::NoMount) continue;
    if (activeMountTypes.count(static_cast<int>(driveState.nextMountType))) {
      tmdi.existingOrNextMounts.push_back(ExistingMount());
      tmdi.existingOrNextMounts.back().type = driveState.nextMountType;
      tmdi.existingOrNextMounts.back().tapePool = driveState.nextTapePool ? driveState.nextTapePool.value() : "";
      ...
    }
  }

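So, if a tape drive still has a next mount type assigned in the database, the scheduler uses an empty string as its tape pool whenever nextTapePool is empty or unset. This state survives a failure, as the code in TapeDrivesCatalogueState.cpp shows: setDriveDown() resets the current mount state (mountType, currentVid, currentTapePool and the session fields) but leaves nextMountType untouched: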

void TapeDrivesCatalogueState::setDriveDown(common::dataStructures::TapeDrive & driveState,
  const ReportDriveStatusInputs & inputs) const {
  // If we are changing state, then all should be reset.
  driveState.sessionId = std::nullopt;
  driveState.bytesTransferedInSession = std::nullopt;
  driveState.filesTransferedInSession = std::nullopt;
  driveState.sessionStartTime = std::nullopt;
  driveState.sessionElapsedTime = std::nullopt;
  driveState.mountStartTime = std::nullopt;
  driveState.transferStartTime = std::nullopt;
  driveState.unloadStartTime = std::nullopt;
  driveState.unmountStartTime = std::nullopt;
  driveState.drainingStartTime = std::nullopt;
  driveState.downOrUpStartTime = inputs.reportTime;
  driveState.probeStartTime = std::nullopt;
  driveState.cleanupStartTime = std::nullopt;
  driveState.shutdownTime = std::nullopt;
  driveState.lastModificationLog = common::dataStructures::EntryLog("NO_USER", driveState.host, inputs.reportTime);
  driveState.mountType = common::dataStructures::MountType::NoMount;
  driveState.driveStatus = common::dataStructures::DriveStatus::Down;
  driveState.desiredUp = false;
  driveState.desiredForceDown = false;
  driveState.currentVid = "";
  driveState.currentTapePool = "";
  driveState.currentVo = "";
  driveState.currentActivity = std::nullopt;
  if (inputs.reason) driveState.reasonUpDown = inputs.reason;
}

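The fix is to also set nextMountType to MountType::NoMount in setDriveDown(), so that a drive that has been set down is skipped by the NoMount check in the scheduler loop and no longer produces a next mount with an empty tape pool.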

Edited by Jorge Camarero Vera