Always delete 0-sized files from EOS namespace
Summary
This is a dev ticket to fix a bug in EOS.
For directories that have WFE enabled (most directories in the CTA instances), deleting a file with size 0 and no sys.archive.file_id attribute should ignore any error in the workflow part and continue with removing the file from the EOS namespace.
- Related ops ticket: https://gitlab.cern.ch/cta/operations/-/issues/1057
- Related EOS Jira ticket: https://its.cern.ch/jira/browse/EOS-5696
Steps to reproduce
- Create a file in EOSCTA in a directory with workflow configured
- Change file size to 0 in EOS metadata
- Remove the sys.archive.file_id attribute
- Delete the file (xrdfs rm or eos rm)
What is the current bug behavior?
Deletion fails with the following messages, and the file remains in the namespace:
230427 10:12:44 INFO  [11075/02348]          ntofdaq ::open             op=write trunc=512 path=/eos/ctapublicdisk/archive/ntof/2023/EAR1/newC6D6/115360/stream1/run115360_124_s1.raw.finished info=cks.type=adler32&eos.space=spinners&svcClass=ntof
230427 10:12:44 INFO  [11075/02348]          ntofdaq ::open             msg="rewrote symlinks" sym-path=/eos/ctapublicdisk/archive/ntof/2023/EAR1/newC6D6/115360/stream1/run115360_124_s1.raw.finished realpath=/eos/ctapublicdisk/archive/ntof/2023/EAR1/newC6D6/115360/stream1/run115360_124_s1.raw.finished
230427 10:12:44 INFO  [11075/02348]          ntofdaq ::open             acl=1 r=1 w=1 wo=0 egroup=0 shared=0 mutable=1 facl=0
230427 10:12:44 INFO  [11075/02348]          ntofdaq ::_rem             path=/eos/ctapublicdisk/archive/ntof/2023/EAR1/newC6D6/115360/stream1/run115360_124_s1.raw.finished vid.uid=11075 vid.gid=2348
230427 10:12:44 INFO  [11075/02348]          ntofdaq ::_rem             acl=u:11075:rwxp+d,u:95759:rwxp+d,g:2348:rxp,u:98119:rwxp,z:!u!d mutable=1
230427 10:12:44 INFO  [11075/02348]          ntofdaq ::_rem             acl=1 r=1 w=1 wo=0 egroup=0 delete=1 not-delete=0 mutable=1
230427 10:12:44 INFO  [11075/02348]          ntofdaq ::_rem             got quota node=0
230427 10:12:44 INFO  [11075/02348]          ntofdaq ::_rem             unlinking from view /eos/ctapublicdisk/archive/ntof/2023/EAR1/newC6D6/115360/stream1/run115360_124_s1.raw.finished
230427 10:12:44 ERROR [00099/00099]                - ::Create           msg ="Caught an unexpected exception: Attribute: sys.archive.file_id not found"
230427 10:12:44 INFO  [11075/02348]          ntofdaq ::_rem             msg="workflow trigger returned" retc=125 errno=0
230427 10:12:44 INFO  [00000/00000]                  ::PurgeVersion     version-dir=/eos/ctapublicdisk/archive/ntof/2023/EAR1/newC6D6/115360/stream1/.sys.v#.run115360_124_s1.raw.finished/ max-versions=0
230427 10:12:44 INFO  [00000/00000]                  ::PurgeVersion     listrc=-1 max-version=0
230427 10:12:44 INFO  [11075/02348]          ntofdaq ::_rem             msg="deleted" can-recycle=0 path=/eos/ctapublicdisk/archive/ntof/2023/EAR1/newC6D6/115360/stream1/run115360_124_s1.raw.finished owner.uid=11075 owner.gid=2348 vid.uid=11075 vid.gid=2348
What is the expected correct behavior?
The deletion of the file from the namespace should be executed regardless of the outcome of workflow.Trigger().
A zero-sized file with no sys.archive.file_id attribute should be removed unconditionally.
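The expected behavior can be illustrated with a minimal, self-contained C++ sketch. It is not the EOS code: `FileEntry`, `Namespace`, `triggerDeleteWorkflow`, `isZeroSizedWithoutArchiveId` and `removeFile` are hypothetical names invented for this illustration, and the namespace is modeled as a plain `std::map`. The stand-in trigger throws when `sys.archive.file_id` is missing, loosely mirroring the error seen in the log. Keeping other files in place when the workflow fails is an assumption of the sketch, not something this ticket specifies.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a namespace entry (not the real EOS metadata type).
struct FileEntry {
  std::uint64_t size;
  std::map<std::string, std::string> xattrs;
};

// Hypothetical in-memory namespace: path -> entry.
using Namespace = std::map<std::string, FileEntry>;

// Hypothetical stand-in for the workflow trigger. It throws when the archive
// ID attribute is missing, similar to the error message in the log excerpt.
int triggerDeleteWorkflow(const FileEntry& f) {
  if (f.xattrs.count("sys.archive.file_id") == 0) {
    throw std::runtime_error("Attribute: sys.archive.file_id not found");
  }
  return 0;  // success
}

// The condition named in this ticket: size 0 and no sys.archive.file_id.
bool isZeroSizedWithoutArchiveId(const FileEntry& f) {
  return f.size == 0 && f.xattrs.count("sys.archive.file_id") == 0;
}

// Returns true if the entry was removed from the namespace.
bool removeFile(Namespace& ns, const std::string& path) {
  auto it = ns.find(path);
  if (it == ns.end()) {
    return false;  // nothing to delete
  }

  const bool unconditional = isZeroSizedWithoutArchiveId(it->second);

  // Contain any workflow failure so that it cannot skip the deletion below.
  bool workflowOk = false;
  try {
    workflowOk = (triggerDeleteWorkflow(it->second) == 0);
  } catch (const std::exception&) {
    workflowOk = false;
  }

  // Assumption of this sketch: files that are not covered by the ticket's
  // condition stay in the namespace when the workflow fails.
  if (!workflowOk && !unconditional) {
    return false;
  }

  ns.erase(it);
  assert(ns.count(path) == 0);  // post-condition: entry is gone
  return true;
}
```

With this shape, a zero-sized entry without the attribute is erased even though the trigger throws, while a non-empty entry without the attribute is left untouched. The key design point is that the workflow failure is handled locally, so the namespace removal is still reached.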
Possible causes
See explanation by @ccaffy (https://gitlab.cern.ch/cta/operations/-/issues/1057#note_6703351):
Here is the code where this is located: https://gitlab.cern.ch/dss/eos/-/blob/eos4/mgm/XrdMgmOfs/Rm.cc#L302
This line throws an exception and the rest of the function is not run. The exception is caught here: https://gitlab.cern.ch/dss/eos/-/blob/eos4/mgm/XrdMgmOfs/Rm.cc#L357
Edited by Joao Afonso