Resume job monitoring when bambooRun exits early
When launching distributed tasks, it can happen that bambooRun exits before all jobs have finished (because of an unexpected reboot of the user interface machine, a user mistake, a bug in bamboo, etc.). It would be nice to be able to resume the job monitoring, along with the subsequent steps (printing the recovery commands for failed jobs, postprocessing, etc.), based on an existing batch directory.
Of course this would only work if the sample list and plots have not changed.
I'm not sure whether the job/task list can simply be "re-created" without writing any batch files or submitting any jobs, or whether the way files are distributed across jobs is non-reproducible (mainly because of dict usage). If it's the latter, this would probably require either reading and parsing the batch files created in the previous run (more work), or writing additional information (e.g. a YAML file in the batch directory) that can easily be read back when relaunching the monitoring.
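
To make the second option more concrete, here is a minimal sketch of what writing and reading back such a file could look like. None of this is existing bamboo code: the file name `tasks_manifest.json`, the functions `write_manifest` and `read_manifest`, and the assumed task layout (output name mapped to a list of input files) are all made up for illustration. It uses JSON rather than YAML only so that the example depends on nothing beyond the standard library.

```python
import json
from pathlib import Path

# Hypothetical file name, not something bamboo currently writes.
MANIFEST_NAME = "tasks_manifest.json"


def write_manifest(batch_dir, tasks):
    """Store the input-to-job assignment next to the batch files.

    `tasks` is assumed to map an output name to its list of input files.
    """
    manifest = {
        "version": 1,
        # Sort by output name so the file content does not depend on dict order.
        "tasks": [
            {"output": output, "inputs": list(inputs)}
            for output, inputs in sorted(tasks.items())
        ],
    }
    path = Path(batch_dir) / MANIFEST_NAME
    path.write_text(json.dumps(manifest, indent=2))
    return path


def read_manifest(batch_dir):
    """Rebuild the output -> inputs mapping from an existing batch directory."""
    path = Path(batch_dir) / MANIFEST_NAME
    manifest = json.loads(path.read_text())
    return {task["output"]: task["inputs"] for task in manifest["tasks"]}
```

When resuming, the monitoring code could call something like `read_manifest(batch_dir)` instead of re-splitting the samples, so it would not matter whether the original splitting was reproducible.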