Improve logging of 'Tape session finished' message
Problem
As discussed in https://gitlab.cern.ch/cta/operations/-/issues/1289, the Tape Alerting System (TAS) has no way to know whether a reported tape session failure was caused by an unexpected error or by a normal workflow event (which should not trigger an alert).
Some examples of workflow events that do not trigger a DCS alarm:
- Empty mount.
- Retrieve queue is sleeping.
- Tape transfer process killed by an operator.
- ...
Objective
As proposed in https://gitlab.cern.ch/cta/operations/-/issues/1289, add a failureReason field to the "Tape session finished" message, so that TAS can easily identify which failures warrant an alarm.
Example:
- Empty mount: failureReason set to tapeNotMounted / emptyQueue
- Retrieve queue is sleeping: failureReason set to sleepingRetrieveQueue
- Tape transfer process killed by an operator: failureReason set to operatorTerminated
- Actual errors that warrant an alarm: failureReason set to unexpected ??
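On the consumer side, the proposed field could be used roughly as in the sketch below. This is only an illustration under several assumptions: the function name warrants_alarm and the set BENIGN_FAILURE_REASONS are hypothetical, the value names are taken from the examples above and are not final (the empty mount case is assumed to use either tapeNotMounted or emptyQueue), and the caller is assumed to have already extracted failureReason from a "Tape session finished" message that reported a failure.

```python
from typing import Optional

# Hypothetical set of failureReason values that describe normal workflow
# events. The names follow the examples in this issue and are not final.
BENIGN_FAILURE_REASONS = frozenset({
    "tapeNotMounted",         # empty mount (one candidate name)
    "emptyQueue",             # empty mount (other candidate name)
    "sleepingRetrieveQueue",  # retrieve queue is sleeping
    "operatorTerminated",     # transfer process killed by an operator
})


def warrants_alarm(failure_reason: Optional[str]) -> bool:
    """Decide whether a reported tape session failure should raise an alarm.

    A missing or unrecognised reason is treated like an unexpected error,
    so that a failure is never silenced by accident.
    """
    return failure_reason not in BENIGN_FAILURE_REASONS
```

Treating unknown or missing values as alarm-worthy is a design choice of this sketch: it keeps the current behaviour for any session that does not yet carry the new field.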
Edited by Joao Afonso