Drive goes up and free after an EndOfSessionWithError
In the operations ticket "Possible SCSI errors on tpsrv311" (I3601444), we saw that a tape mount failed with a timeout error, even though the tape was actually mounted in the drive.
The problem that occurred is that another tapeserver (tpsrv304) tried to mount the same tape right after the mount failed on tpsrv311.
Why did this happen? The tape should not have been scheduled, as it was still in the drive of tpsrv311.
In the reporting of the failed session:
//------------------------------------------------------------------------------
//ReportEndofSessionWithErrors::execute
//------------------------------------------------------------------------------
void MigrationReportPacker::ReportEndofSessionWithErrors::execute(MigrationReportPacker& reportPacker){
  reportPacker.m_continue=false;
  reportPacker.m_lc.log(cta::log::DEBUG, "In MigrationReportPacker::ReportEndofSessionWithErrors::execute(): reporting session complete.");
  reportPacker.m_archiveMount->complete();
Here is the body of the m_archiveMount->complete() method:
//------------------------------------------------------------------------------
// complete
//------------------------------------------------------------------------------
void cta::ArchiveMount::complete() {
  // Just set the session as complete in the DB.
  m_dbMount->complete(time(NULL));
  // and record we are done with the mount
  m_sessionRunning = false;
}
Here is the body of the m_dbMount->complete(time(NULL)) method:
//------------------------------------------------------------------------------
// OStoreDB::ArchiveMount::complete()
//------------------------------------------------------------------------------
void OStoreDB::ArchiveMount::complete(time_t completionTime) {
  // When the session is complete, we can reset the status of the drive.
  // Tape will be implicitly released
  // Reset the drive state.
  common::dataStructures::DriveInfo driveInfo;
  driveInfo.driveName=mountInfo.drive;
  driveInfo.logicalLibrary=mountInfo.logicalLibrary;
  driveInfo.host=mountInfo.host;
  ReportDriveStatusInputs inputs;
  inputs.mountType = common::dataStructures::MountType::NoMount;
  inputs.mountSessionId = mountInfo.mountId;
  inputs.reportTime = completionTime;
  inputs.status = common::dataStructures::DriveStatus::Up;
  inputs.vid = mountInfo.vid;
  inputs.tapepool = mountInfo.tapePool;
  log::LogContext lc(m_oStoreDB.m_logger);
  m_oStoreDB.updateDriveStatus(driveInfo, inputs, lc);
}
The problem is that the drive is reported with status Up and mount type NoMount, i.e. up and free, even though the session ended with an error. This tells the scheduler that the tape is no longer being used by this drive. This is why the other tapeserver, tpsrv304, tried to mount the tape (and failed because the tape was already in use).
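To illustrate the effect on scheduling, here is a toy model of how a tape stops being considered busy once its drive is reported as Up. The types DriveStatus, DriveState and the function tapeIsBusy are hypothetical stand-ins, not the real CTA drive register or scheduler code; the rule "a tape is busy while a non-Up drive reports its VID" is an assumption made for the sketch.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified stand-ins; these are NOT the real CTA types.
enum class DriveStatus { Up, Mounting, Transferring, Down };

struct DriveState {
  DriveStatus status;
  std::string vid;  // tape last reported by the drive; ignored when the drive is Up
};

// Assumed rule: a tape counts as busy only while some drive that is not Up
// reports its VID. A drive reported as Up is treated as free.
bool tapeIsBusy(const std::map<std::string, DriveState>& drives,
                const std::string& vid) {
  for (const auto& entry : drives) {
    const DriveState& state = entry.second;
    if (state.status != DriveStatus::Up && state.vid == vid) {
      return true;
    }
  }
  return false;
}
```

In this model, as soon as the drive of tpsrv311 is reported as Up, tapeIsBusy() returns false for its tape, and nothing prevents the tape from being scheduled on another tapeserver.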
This bug has to be fixed so that a tapeserver does not try to mount a tape that is still held by another tapeserver.
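One possible direction for a fix, given only as a sketch: choose the drive status reported at the end of the session from the session outcome, instead of always reporting Up. The names SessionOutcome and statusAfterSession are hypothetical, and whether Down is the right state to report (or whether the tape should be kept busy by some other means) is an assumption, not a statement of how CTA handles it.

```cpp
// Hypothetical, simplified types; these are NOT the real CTA data structures.
enum class DriveStatus { Up, Down };

struct SessionOutcome {
  bool endedWithError;         // session finished through the error path
  bool tapeConfirmedUnloaded;  // the tape is known to have left the drive
};

// Report the drive as Up only when it is known to be free. After an error
// with no confirmation that the tape was unloaded, report Down so that
// neither the drive nor the tape it may still hold is scheduled again.
DriveStatus statusAfterSession(const SessionOutcome& outcome) {
  if (outcome.endedWithError && !outcome.tapeConfirmedUnloaded) {
    return DriveStatus::Down;
  }
  return DriveStatus::Up;
}
```

With such a rule, the timeout seen on tpsrv311 would leave the drive Down until an operator or a cleanup step confirms that the tape has been unloaded.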