batch.sh jobs sometimes fail due to I/O errors on EOS
I use the batch.sh script to submit jobs on the lxplus farm. Each job reads data files stored on EOS under
/eos/experiment/na64/data/cdr
Typically, I configure the submission so that there is one job per data acquisition run.
Usually, some of the jobs fail, reporting the following error for ALL the files they process:
du: cannot access /eos/experiment/na64/data/cdr/cdr01001-008339.dat: Input/output error
I suspect this is related to the EOS file system and the way we access it in our jobs. Is that the case? If I re-launch the jobs locally on lxplus, the files are accessed properly.
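
As a possible workaround while the cause is investigated, the file access inside the job could be wrapped in a retry loop. The sketch below assumes the I/O error is transient on the batch node, which I have not confirmed. The `retry` function is a hypothetical helper and is not part of batch.sh; the number of attempts and the delay are arbitrary.

```shell
#!/bin/bash
# Hypothetical helper (not part of batch.sh): run a command, and if it fails,
# try again up to max_attempts times, sleeping delay seconds between attempts.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
    local max_attempts=$1
    local delay=$2
    shift 2
    local attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s: $*" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}
```

In a job it might be called as `retry 5 30 du /eos/experiment/na64/data/cdr/cdr01001-008339.dat` before the file is processed. If the error persists across all attempts, the job still fails, so this would only hide short-lived problems and does not address the underlying EOS access issue.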