cta-frontend fix log consistency for workflow logs
When dealing with recent 0-length file queuing on CTA side, it was quite challenging to find all occurrences and follow what happens to one specific eos file (see https://gitlab.cern.ch/cta/operations/-/issues/1538).
What was in the logs
In the frontend looking for "diskFileId":"4395118946"
:
cta-frontend.log-20241106-1730908801.gz:{"epoch_time":1730889077.433728150,"local_time":"2024-11-06T11:31:17+0100","hostname":"ctaproductionfrontend12","program":"cta-frontend","log_level":"INFO","pid":"2655208","tid":"2655363","message":"In WorkflowEvent::WorkflowEvent(): received event.","user":"eosctapublic@ctaproductionfrontend12","eventType":"CREATE","eosInstance":"eosctapublic","diskFilePath":"/eos/ctapublic/archivetest/digitalmemory/sip-485-1730889073","diskFileId":"4395118946"}
cta-frontend.log-20241106-1730908801.gz:{"epoch_time":1730889077.435346898,"local_time":"2024-11-06T11:31:17+0100","hostname":"ctaproductionfrontend12","program":"cta-frontend","log_level":"INFO","pid":"2655208","tid":"2655363","message":"In WorkflowEvent::processCREATE(): assigning new archive file ID.","user":"eosctapublic@ctaproductionfrontend12","diskFileId":"4395118946","diskFilePath":"/eos/ctapublic/archivetest/digitalmemory/sip-485-1730889073","fileId":4842720741,"schedulerTime":0.001478}
cta-frontend.log-20241106-1730908801.gz:{"epoch_time":1730889077.456527161,"local_time":"2024-11-06T11:31:17+0100","hostname":"ctaproductionfrontend12","program":"cta-frontend","log_level":"INFO","pid":"2655208","tid":"3053082","message":"In WorkflowEvent::WorkflowEvent(): received event.","user":"eosctapublic@ctaproductionfrontend12","eventType":"CLOSEW","eosInstance":"eosctapublic","diskFilePath":"/eos/ctapublic/archivetest/digitalmemory/sip-485-1730889073","diskFileId":"4395118946"}
But for the CLOSEW
"diskFileId"
is missing this is a bug that must be fixed we only can use fileId
which is our archiveId
very confusing that we are changing names:
{"epoch_time":1730889077.435223870,"local_time":"2024-11-06T11:31:17+0100","hostname":"ctaproductionfrontend12","program":"cta-frontend","log_level":"INFO","pid":"2655208","tid":"2655363","message":"Checked request and got next archive file ID","user":"eosctapublic@ctaproductionfrontend12","instanceName":"eosctapublic","username":"oais","usergroup":"def-cg","storageClass":"test_digitalmemory","fileId":4842720741,"catalogueTime":0.001297,"schedulerDbTime":0.001297}
{"epoch_time":1730889077.435346898,"local_time":"2024-11-06T11:31:17+0100","hostname":"ctaproductionfrontend12","program":"cta-frontend","log_level":"INFO","pid":"2655208","tid":"2655363","message":"In WorkflowEvent::processCREATE(): assigning new archive file ID.","user":"eosctapublic@ctaproductionfrontend12","diskFileId":"4395118946","diskFilePath":"/eos/ctapublic/archivetest/digitalmemory/sip-485-1730889073","fileId":4842720741,"schedulerTime":0.001478}
{"epoch_time":1730889077.457902444,"local_time":"2024-11-06T11:31:17+0100","hostname":"ctaproductionfrontend12","program":"cta-frontend","log_level":"INFO","pid":"2655208","tid":"3053082","message":"In WorkflowEvent::processCLOSEW(): ignoring zero-length file.","user":"eosctapublic@ctaproductionfrontend12","requesterInstance":"eosctafst0204.cern.ch:fst.1113","fileId":4842720741}
What we should improve
We need to be consistent here with the key names in logs that we have on our side.
We have:
- in logs
eosInstance
andinstanceName
VS in CTAdiskInstance
- in logs
fileId
VS in CTAarchiveId
We need to consistently log eos communications with eos side attributes: eosInstance
and fileId
must be in all messages related to workflow events that refer to one file in EOS side.
Looking in various messages to resolve fileId
from eosInstance
and diskFileId
is not sustainable for operators, especially if the key names are changing.