Always delete 0-sized files from EOS namespace
Summary
This is a dev ticket to fix a bug in EOS.
For directories that have WFE enabled (most directories in the CTA instances), deletion of a file with size 0 and no sys.archive.file_id attribute should ignore any error in the workflow part and continue with removing the file from the EOS namespace.
- Related ops ticket: https://gitlab.cern.ch/cta/operations/-/issues/1057
- Related EOS Jira ticket: https://its.cern.ch/jira/browse/EOS-5696
Steps to reproduce
- Create a file in EOSCTA in a directory with workflow configured
- Change file size to 0 in EOS metadata
- Remove the sys.archive.file_id attribute
- Delete the file (xrdfs rm or eos rm)
What is the current bug behavior?
Deletion fails with the following messages, and the file remains in the namespace:
230427 10:12:44 INFO [11075/02348] ntofdaq ::open op=write trunc=512 path=/eos/ctapublicdisk/archive/ntof/2023/EAR1/newC6D6/115360/stream1/run115360_124_s1.raw.finished info=cks.type=adler32&eos.space=spinners&svcClass=ntof
230427 10:12:44 INFO [11075/02348] ntofdaq ::open msg="rewrote symlinks" sym-path=/eos/ctapublicdisk/archive/ntof/2023/EAR1/newC6D6/115360/stream1/run115360_124_s1.raw.finished realpath=/eos/ctapublicdisk/archive/ntof/2023/EAR1/newC6D6/115360/stream1/run115360_124_s1.raw.finished
230427 10:12:44 INFO [11075/02348] ntofdaq ::open acl=1 r=1 w=1 wo=0 egroup=0 shared=0 mutable=1 facl=0
230427 10:12:44 INFO [11075/02348] ntofdaq ::_rem path=/eos/ctapublicdisk/archive/ntof/2023/EAR1/newC6D6/115360/stream1/run115360_124_s1.raw.finished vid.uid=11075 vid.gid=2348
230427 10:12:44 INFO [11075/02348] ntofdaq ::_rem acl=u:11075:rwxp+d,u:95759:rwxp+d,g:2348:rxp,u:98119:rwxp,z:!u!d mutable=1
230427 10:12:44 INFO [11075/02348] ntofdaq ::_rem acl=1 r=1 w=1 wo=0 egroup=0 delete=1 not-delete=0 mutable=1
230427 10:12:44 INFO [11075/02348] ntofdaq ::_rem got quota node=0
230427 10:12:44 INFO [11075/02348] ntofdaq ::_rem unlinking from view /eos/ctapublicdisk/archive/ntof/2023/EAR1/newC6D6/115360/stream1/run115360_124_s1.raw.finished
230427 10:12:44 ERROR [00099/00099] - ::Create msg ="Caught an unexpected exception: Attribute: sys.archive.file_id not found"
230427 10:12:44 INFO [11075/02348] ntofdaq ::_rem msg="workflow trigger returned" retc=125 errno=0
230427 10:12:44 INFO [00000/00000] ::PurgeVersion version-dir=/eos/ctapublicdisk/archive/ntof/2023/EAR1/newC6D6/115360/stream1/.sys.v#.run115360_124_s1.raw.finished/ max-versions=0
230427 10:12:44 INFO [00000/00000] ::PurgeVersion listrc=-1 max-version=0
230427 10:12:44 INFO [11075/02348] ntofdaq ::_rem msg="deleted" can-recycle=0 path=/eos/ctapublicdisk/archive/ntof/2023/EAR1/newC6D6/115360/stream1/run115360_124_s1.raw.finished owner.uid=11075 owner.gid=2348 vid.uid=11075 vid.gid=2348
What is the expected correct behavior?
The deletion of the file from the namespace should be executed regardless of the outcome of workflow.Trigger().
A zero-sized file with no sys.archive.file_id attribute should be unconditionally removed.
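The condition above could be expressed as a small predicate. This is only a sketch: FileMeta and canRemoveUnconditionally are hypothetical names used for illustration and are not part of the EOS code base; only the attribute name sys.archive.file_id comes from this ticket.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, simplified stand-in for a file's namespace metadata.
struct FileMeta {
  std::uint64_t size = 0;
  std::map<std::string, std::string> xattrs;
};

// True when the file is empty and carries no sys.archive.file_id attribute,
// i.e. the case in which this ticket asks for removal to always proceed.
bool canRemoveUnconditionally(const FileMeta& f) {
  return f.size == 0 &&
         f.xattrs.find("sys.archive.file_id") == f.xattrs.end();
}
```

A file with a non-zero size, or one that still has the attribute, would keep going through the normal workflow-controlled deletion path.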
Possible causes
See explanation by @ccaffy (https://gitlab.cern.ch/cta/operations/-/issues/1057#note_6703351):
Here is the code where this is located: https://gitlab.cern.ch/dss/eos/-/blob/eos4/mgm/XrdMgmOfs/Rm.cc#L302
This line throws an exception and the rest of the function is not run. The exception is caught here: https://gitlab.cern.ch/dss/eos/-/blob/eos4/mgm/XrdMgmOfs/Rm.cc#L357
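The control-flow problem described in that explanation, and one possible shape of a fix, can be sketched as follows. This is not the actual Rm.cc code: RemoveResult, removeSharedTry, removeIsolatedTrigger and the ignoreWorkflowErrors flag are hypothetical names, and the two callbacks stand in for the workflow trigger and the namespace removal.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical outcome of a removal attempt.
struct RemoveResult {
  bool removed = false;
  std::string workflowError;  // empty if the trigger did not throw
};

// Shape of the reported bug: the trigger and the removal share one try
// block, so an exception from the trigger skips the removal entirely.
RemoveResult removeSharedTry(const std::function<void()>& trigger,
                             const std::function<void()>& removeFromNamespace) {
  RemoveResult r;
  try {
    trigger();
    removeFromNamespace();
    r.removed = true;
  } catch (const std::exception& e) {
    r.workflowError = e.what();
  }
  return r;
}

// Possible fix: catch the trigger's exception on its own and, when the
// caller asks for it, continue with the removal anyway.
RemoveResult removeIsolatedTrigger(const std::function<void()>& trigger,
                                   const std::function<void()>& removeFromNamespace,
                                   bool ignoreWorkflowErrors) {
  RemoveResult r;
  try {
    trigger();
  } catch (const std::exception& e) {
    r.workflowError = e.what();
    if (!ignoreWorkflowErrors) {
      return r;  // keep the file, as in the shared-try version
    }
  }
  removeFromNamespace();
  r.removed = true;
  return r;
}
```

In this sketch ignoreWorkflowErrors would be set only for zero-sized files without sys.archive.file_id, so that other files still depend on the workflow result.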