Allow `eos stagerrm` to succeed when removing replicas on `drain+failed` FSes

Description

See related ops issue: https://gitlab.cern.ch/cta/operations/-/issues/865

Files on file systems with the status `drain+failed` or `draindead` cannot be checked by the MGM. This causes `stagerrm` to fail its check on the FST with the error `unable to check for existence of file`.

220912 14:14:29 time=1662984869.619412 func=DoIt                     level=ERROR logid=73b524a2-3294-11ed-8e13-40a6b71da624 unit=mgm@eosctafst0205.cern.ch:1094 tid=00007fcff0aae700 source=DrainTransferJob:153           tident=<service> sec=      uid=0 gid=0 name= geo="" src=root://eosctafst0004.cern.ch:1095//replicate:520794ad dst=root://p06636710v34328.cern.ch:1095//replicate:520794ad logid=73b55206-3294-11ed-b54f-40a6b71da624 tpc_err=[ERROR] Server responded with an error: [3007] Unable to open - unable to check for existence of file  &mgm.access=read&mgm.lid=1048578&mgm.cid=1376206159&mgm.ruid=1&mgm.rgid=1&mgm.uid=1&mgm.gid=1&mgm.path=/eos/ctapublicdisk/archive/compass/generalprod/testcoral/hadron2009t86/megaDST/megaDST-81044-0-7.root.012&mgm.manager=eosctafst0205.cern.ch:1094&mgm.fid=520794ad&mgm.sec=sss|eos|eos|-|-|-|-|eos/drain&mgm.localprefix=/data13&mgm.fsid=99&mgm.sourcehostport=eosctafst0004.cern.ch:1095&eos.app=drain&eos.ruid=0&eos.rgid=0&cap.valid=1662988469; input/output error
220912 14:15:03 time=1662984903.646171 func=DoIt                     level=ERROR logid=87fd43e0-3294-11ed-8e13-40a6b71da624 unit=mgm@eosctafst0205.cern.ch:1094 tid=00007fcff1ab0700 source=DrainTransferJob:153           tident=<service> sec=      uid=0 gid=0 name= geo="" src=root://eosctafst0004.cern.ch:1095//replicate:520794ad dst=root://eosctafst0004.cern.ch:1095//replicate:520794ad logid=87fd6ad2-3294-11ed-aa9f-40a6b71da624 tpc_err=[ERROR] Server responded with an error: [3007] Unable to open - unable to check for existence of file  &mgm.access=read&mgm.lid=1048578&mgm.cid=1376206159&mgm.ruid=1&mgm.rgid=1&mgm.uid=1&mgm.gid=1&mgm.path=/eos/ctapublicdisk/archive/compass/generalprod/testcoral/hadron2009t86/megaDST/megaDST-81044-0-7.root.012&mgm.manager=eosctafst0205.cern.ch:1094&mgm.fid=520794ad&mgm.sec=sss|eos|eos|-|-|-|-|eos/drain&mgm.localprefix=/data13&mgm.fsid=99&mgm.sourcehostport=eosctafst0004.cern.ch:1095&eos.app=drain&eos.ruid=0&eos.rgid=0&cap.valid=1662988503; input/output error

Proposed solution

`stagerrm` should succeed if the FS has the status `drain+failed` or `draindead`. A file on such an FS will disappear eventually anyway, so the end result is the same for the `stagerrm` command.
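
As a rough illustration of the proposed behaviour (not the actual EOS code), the decision could look like the sketch below. All names here are hypothetical: `stagerrm_may_proceed`, the `check_exists` callback, and the `"online"` status used in the usage example are assumptions made for illustration. Only the two status strings `drain+failed` and `draindead` come from this issue.

```python
from typing import Callable, Mapping

# Status labels taken from the issue text; how EOS represents them
# internally is not assumed here.
UNCHECKABLE_STATUSES = frozenset({"drain+failed", "draindead"})


def stagerrm_may_proceed(
    replica_fs_status: Mapping[int, str],
    check_exists: Callable[[int], None],
) -> bool:
    """Decide whether a stagerrm-like removal may report success.

    replica_fs_status maps a (hypothetical) file system id to its status.
    check_exists stands in for the existence check on the FST and is
    assumed to raise OSError when the check cannot be performed.
    """
    for fsid, status in replica_fs_status.items():
        if status in UNCHECKABLE_STATUSES:
            # The replica cannot be checked and will disappear with the
            # file system, so skip the check instead of failing.
            continue
        try:
            check_exists(fsid)
        except OSError:
            # A failed check on a healthy file system is still an error.
            return False
    return True


if __name__ == "__main__":
    def always_fails(fsid: int) -> None:
        raise OSError("unable to check for existence of file")

    print(stagerrm_may_proceed({99: "drain+failed"}, always_fails))
    print(stagerrm_may_proceed({99: "online"}, always_fails))
```

The point of the sketch is that the existence check is skipped only for the two statuses named above; a failing check on any other file system still makes the command fail, as it does today.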