Improve error reporting when using batch jobs
If possible, it should be made clear to the user when the --distributed=driver task failed (in the monitoring loop, or in postProcess) even though the worker tasks may still have finished successfully (i.e. catch all errors after submission).
- squeue/sacct failures should be caught and ignored (or turned into warnings)
- postProcess does not need to resolve the file list (grid proxy / SAMADhi errors)
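
A minimal sketch of the two ideas above, not the actual implementation: status queries such as squeue/sacct are wrapped so that a failure becomes a warning, and everything that runs after submission is guarded so that a driver-side error is reported without implying that the worker tasks failed. The names `query_scheduler`, `run_driver`, `submit`, `monitor` and `post_process` are hypothetical and only illustrate the structure.

```python
import logging
import subprocess

logger = logging.getLogger(__name__)


def query_scheduler(cmd, timeout=60):
    """Run a status command (e.g. ["squeue", ...]) and return its stdout.

    Any failure (non-zero exit code, timeout, missing executable) is
    turned into a warning and None is returned, so the caller can
    simply try again on the next iteration of the monitoring loop.
    """
    try:
        result = subprocess.run(
            cmd, check=True, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.SubprocessError) as ex:
        logger.warning("Status query '%s' failed, ignoring: %s", cmd[0], ex)
        return None
    return result.stdout


def run_driver(submit, monitor, post_process):
    """Submit the jobs, then catch all errors raised after submission.

    submit, monitor and post_process are hypothetical callables that
    stand in for the real driver steps.
    """
    handle = submit()  # an error here is a real failure: nothing was submitted
    try:
        monitor(handle)
        post_process(handle)
    except Exception as ex:
        logger.error(
            "The driver task failed after submission (%s); "
            "the worker tasks may still have finished successfully",
            ex,
        )
        return False
    return True
```

With this structure a transient scheduler hiccup only produces a warning in the log, and a failure in the monitoring loop or in postProcess gives a message that points the user to the worker outputs instead of suggesting that the whole run was lost.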
Edited by Pieter David