Improve archival workflow with CTA+EOS

Problem

The FTS/EOS team would like to have more information about the archival workflow of files to CTA.

In particular:


1. Know if a CLOSEW event has been correctly processed by CTA.

This already seems to be possible, in two ways:

  • If CTA succeeded in processing the CLOSEW event: by checking the sys.cta.objectstore.id xattr on the file.
  • If CTA failed to process the CLOSEW event: a sync::archive_failed event is triggered on EOS, which sets the sys.archive.error xattr to the error message.
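The two checks above can be combined into a single classification step. The following is only a minimal sketch: it assumes the file's extended attributes have already been fetched into a plain dictionary (how they are fetched is left out), and the `ClosewState` enum and `classify_closew` function are hypothetical names introduced for illustration. Giving the error attribute precedence when both attributes are present is also an assumption, not confirmed EOS or CTA behaviour.

```python
from enum import Enum

# Attribute names as given in the description above.
OBJECTSTORE_ID_ATTR = "sys.cta.objectstore.id"
ARCHIVE_ERROR_ATTR = "sys.archive.error"


class ClosewState(Enum):
    """Hypothetical states, introduced for this sketch only."""
    ACCEPTED = "accepted"  # CTA processed the CLOSEW event
    FAILED = "failed"      # sync::archive_failed recorded an error
    UNKNOWN = "unknown"    # neither attribute is set (yet)


def classify_closew(xattrs):
    """Return (state, detail) for a file, given its xattrs as a dict.

    Assumption: a non-empty error attribute wins over the object store id.
    """
    error = xattrs.get(ARCHIVE_ERROR_ATTR)
    if error:
        return ClosewState.FAILED, error
    objectstore_id = xattrs.get(OBJECTSTORE_ID_ATTR)
    if objectstore_id:
        return ClosewState.ACCEPTED, objectstore_id
    return ClosewState.UNKNOWN, None
```

A caller such as FTS could treat `UNKNOWN` as "not processed yet" and check again later, since neither attribute is set until CTA has responded.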

2. Know when a file failed to get archived.

This is also already implemented through the sync::archive_failed event, which uses a callback URL sent from EOS to CTA.


3. Know when a file timed out during archival.

There is currently no way to address this.
The archive request stays in the Scheduler DB queue until it is popped and served.


4. A way to re-trigger the CLOSEW event without having to transfer the file another time.

This is similar to the issue faced by dCache, which tries to trigger more than one CLOSEW event for the same file.

This is not fixed, as it's not the expected EOS behaviour.
We can discuss making multiple CLOSEW events idempotent, but it remains to be decided how a client can trigger a CLOSEW on a pre-existing file on EOS.
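To make the discussion concrete, idempotent handling would mean that a repeated CLOSEW for the same file returns the existing archive request instead of queueing a second one. The toy in-memory model below only illustrates that property; it is not how CTA or EOS handle the event, and `ArchiveQueue`, `on_closew` and the `req-N` identifiers are hypothetical names made up for this sketch.

```python
import itertools


class ArchiveQueue:
    """Toy stand-in for an archive queue, keyed by file identifier."""

    def __init__(self):
        self._requests = {}             # file_id -> request_id
        self._ids = itertools.count(1)  # source of made-up request ids

    def on_closew(self, file_id):
        """Handle a CLOSEW event; return (request_id, newly_queued)."""
        existing = self._requests.get(file_id)
        if existing is not None:
            # Repeated event: report the existing request, queue nothing.
            return existing, False
        request_id = f"req-{next(self._ids)}"
        self._requests[file_id] = request_id
        return request_id, True


queue = ArchiveQueue()
first = queue.on_closew("file-a")   # queues a new request
second = queue.on_closew("file-a")  # same request, nothing new queued
```

With this behaviour, a client that is unsure whether its first CLOSEW was processed could safely send it again. The open question of how a client triggers CLOSEW on a pre-existing file is not addressed by the sketch.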

Edited by Joao Afonso