Skip to content

cta-frontend fix log consistency for workflow logs

When dealing with recent 0-length file queuing on CTA side, it was quite challenging to find all occurrences and follow what happens to one specific eos file (see https://gitlab.cern.ch/cta/operations/-/issues/1538).

What was in the logs

In the frontend looking for "diskFileId":"4395118946":

cta-frontend.log-20241106-1730908801.gz:{"epoch_time":1730889077.433728150,"local_time":"2024-11-06T11:31:17+0100","hostname":"ctaproductionfrontend12","program":"cta-frontend","log_level":"INFO","pid":"2655208","tid":"2655363","message":"In WorkflowEvent::WorkflowEvent(): received event.","user":"eosctapublic@ctaproductionfrontend12","eventType":"CREATE","eosInstance":"eosctapublic","diskFilePath":"/eos/ctapublic/archivetest/digitalmemory/sip-485-1730889073","diskFileId":"4395118946"}
cta-frontend.log-20241106-1730908801.gz:{"epoch_time":1730889077.435346898,"local_time":"2024-11-06T11:31:17+0100","hostname":"ctaproductionfrontend12","program":"cta-frontend","log_level":"INFO","pid":"2655208","tid":"2655363","message":"In WorkflowEvent::processCREATE(): assigning new archive file ID.","user":"eosctapublic@ctaproductionfrontend12","diskFileId":"4395118946","diskFilePath":"/eos/ctapublic/archivetest/digitalmemory/sip-485-1730889073","fileId":4842720741,"schedulerTime":0.001478}
cta-frontend.log-20241106-1730908801.gz:{"epoch_time":1730889077.456527161,"local_time":"2024-11-06T11:31:17+0100","hostname":"ctaproductionfrontend12","program":"cta-frontend","log_level":"INFO","pid":"2655208","tid":"3053082","message":"In WorkflowEvent::WorkflowEvent(): received event.","user":"eosctapublic@ctaproductionfrontend12","eventType":"CLOSEW","eosInstance":"eosctapublic","diskFilePath":"/eos/ctapublic/archivetest/digitalmemory/sip-485-1730889073","diskFileId":"4395118946"}

But for the CLOSEW "diskFileId" is missing this is a bug that must be fixed we only can use fileId which is our archiveId very confusing that we are changing names:

{"epoch_time":1730889077.435223870,"local_time":"2024-11-06T11:31:17+0100","hostname":"ctaproductionfrontend12","program":"cta-frontend","log_level":"INFO","pid":"2655208","tid":"2655363","message":"Checked request and got next archive file ID","user":"eosctapublic@ctaproductionfrontend12","instanceName":"eosctapublic","username":"oais","usergroup":"def-cg","storageClass":"test_digitalmemory","fileId":4842720741,"catalogueTime":0.001297,"schedulerDbTime":0.001297}
{"epoch_time":1730889077.435346898,"local_time":"2024-11-06T11:31:17+0100","hostname":"ctaproductionfrontend12","program":"cta-frontend","log_level":"INFO","pid":"2655208","tid":"2655363","message":"In WorkflowEvent::processCREATE(): assigning new archive file ID.","user":"eosctapublic@ctaproductionfrontend12","diskFileId":"4395118946","diskFilePath":"/eos/ctapublic/archivetest/digitalmemory/sip-485-1730889073","fileId":4842720741,"schedulerTime":0.001478}
{"epoch_time":1730889077.457902444,"local_time":"2024-11-06T11:31:17+0100","hostname":"ctaproductionfrontend12","program":"cta-frontend","log_level":"INFO","pid":"2655208","tid":"3053082","message":"In WorkflowEvent::processCLOSEW(): ignoring zero-length file.","user":"eosctapublic@ctaproductionfrontend12","requesterInstance":"eosctafst0204.cern.ch:fst.1113","fileId":4842720741}

What we should improve

We need to be consistent here with the key names in logs that we have on our side.

We have:

  • in logs eosInstance and instanceName VS in CTA diskInstance
  • in logs fileId VS in CTA archiveId

We need to consistently log eos communications with eos side attributes: eosInstance and fileId must be in all messages related to workflow events that refer to one file in EOS side.

Looking in various messages to resolve fileId from eosInstance and diskFileId is not sustainable for operators, especially if the key names are changing.