Transient XRootD/storage access issue
When running bamboo workflows over many central files (particularly on HTCondor), transient XRootD access failures affect roughly 1% of files. See also the related issue #113.
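To put the ~1% figure in perspective, here is a minimal back-of-the-envelope sketch. It assumes each file fails independently with probability p = 0.01. That independence is an assumption: a correlated outage (e.g. a redirector being down) would behave differently. The function names are illustrative only.

```python
# Back-of-the-envelope estimate, ASSUMING each file fails independently
# with probability p (p ~ 0.01 from the report). Names are hypothetical.


def prob_any_failure(n_files: int, p: float = 0.01) -> float:
    """Probability that at least one of n_files fails to open."""
    return 1.0 - (1.0 - p) ** n_files


def prob_file_lost(p: float = 0.01, attempts: int = 1) -> float:
    """Probability that a single file fails on every one of `attempts` tries,
    assuming the attempts are also independent of each other."""
    return p ** attempts


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"{n:5d} files: P(at least one failure) = {prob_any_failure(n):.3f}")
```

Under this assumption, a 100-file workflow has about a 63% chance of hitting at least one failure. Three independent attempts per file would lower the per-file loss rate to about 1e-6.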
Failures can occur at two stages:
- During TChain initialization, causing the job to crash
- During event processing, causing the affected file to be silently skipped (a bug)
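Because the failures are transient, one possible mitigation for the crash during TChain initialization is to retry opening each file with a short backoff before handing it to the chain. The sketch below is hypothetical: `open_with_retry` and its parameters are not part of bamboo or ROOT. `open_fn` is any callable that returns a file handle, and a falsy return value or an `OSError` counts as a failed attempt.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def open_with_retry(
    open_fn: Callable[[str], Optional[T]],
    url: str,
    attempts: int = 3,
    delay_s: float = 5.0,
    backoff: float = 2.0,
) -> T:
    """Call open_fn(url) until it returns a truthy handle or attempts run out.

    HYPOTHETICAL helper: a falsy return value or an OSError is treated as a
    (possibly transient) failure; the wait doubles between attempts by default.
    """
    last_error: Optional[BaseException] = None
    wait = delay_s
    for attempt in range(1, attempts + 1):
        try:
            handle = open_fn(url)
            if handle:
                return handle
            last_error = OSError(f"open returned no handle for {url}")
        except OSError as exc:
            last_error = exc
        if attempt < attempts:
            time.sleep(wait)  # give the storage/redirector time to recover
            wait *= backoff
    raise OSError(f"giving up on {url} after {attempts} attempts") from last_error


# Possible use with PyROOT (assumption, not existing bamboo code):
#   import ROOT
#   def root_open(url):
#       f = ROOT.TFile.Open(url)
#       return f if f and not f.IsZombie() else None
#   f = open_with_retry(root_open, url)
```

Retrying only makes sense if the failure really is transient. Given the observation that it is not specific to particular files, that seems plausible, but a persistent failure will still raise after the last attempt.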
Example error:
INFO:bamboo.workflow:Starting to fill plots (and skims)
Error in TNetXNGFile::Open: [ERROR] Server responded with an error: [3000] Unable to open - cannot determine the prefix path to use for the given filesystem id /store/mc/Run3Summer22NanoAODv12/GluGlutoHHto2B2WtoLNu2Q_kl-2p45_kt-1p00_c2-0p00_LHEweights_TuneCP5_13p6TeV_powheg-pythia8/NANOAODSIM/130X_mcRun3_2022_realistic_v5-v2/2820000/603a1208-5558-42ab-92a1-0712b242473d.root; invalid argument
INFO:bamboo.workflow:Plots finished in 32.53s, max RSS: 1061.02MB. 231 histograms, 0 skims
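In the log, the plots are finished after a file failed to open, so the file was dropped without the job failing. The sketch below shows a minimal post-run consistency check. It assumes the expected entry count per file is known in advance (for example from dataset metadata) and that the number of entries actually read is recorded per file. Both mappings and both function names are hypothetical, not existing bamboo features.

```python
from typing import List, Mapping


def find_incomplete_files(
    expected_entries: Mapping[str, int],
    entries_read: Mapping[str, int],
) -> List[str]:
    """Return input files whose entries were not all read.

    A file missing from entries_read counts as 0 entries read.
    """
    return sorted(
        path
        for path, expected in expected_entries.items()
        if entries_read.get(path, 0) < expected
    )


def require_complete(
    expected_entries: Mapping[str, int],
    entries_read: Mapping[str, int],
) -> None:
    """Fail loudly instead of silently producing histograms from partial input."""
    bad = find_incomplete_files(expected_entries, entries_read)
    if bad:
        raise RuntimeError(
            f"{len(bad)} input file(s) not fully processed: " + ", ".join(bad)
        )
```

Raising here turns the silent skip into a visible job failure, which can then be resubmitted.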
Observations:
- Location: lxplus + HTCondor
- Affects both local and distributed runs
- Not specific to particular files/datasets
- Not commonly reported by Slurm users
Issue reported by @scrossle.
Edited by Khawla Jaffel