AthenaMPI Fixes

This fixes a few things in AthenaMPI:

  1. For some reason, referring to the service by its name alone, without its type, no longer seems to work.
  2. The event numbers in the log now match the event index in the job. Previously they reflected the index of the previous event processed by that worker, which was very confusing.
  3. AthenaMPI now tolerates an occasional StatusCode::FAILURE from an event and continues the job. I've checked the output files (and checked with @gemmeren), and as far as I can tell we get a valid output file that simply omits the failed event. The mpilog mechanism lets us figure out which events failed and which input files they came from, so with the appropriate support from the grid we can re-run just those events. Since we envision running very large jobs (hundreds of thousands to millions of events) with AthenaMPI, simply failing the job or stopping it early isn't a great solution.
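A minimal Python sketch of the per-worker bookkeeping described in items 2 and 3, assuming a simplified event loop rather than the real AthenaMPI code. `StatusCode`, `Event`, `WorkerLog`, `run_worker` and the `.root` file names are all hypothetical stand-ins, not the Gaudi/AthenaMPI API; `WorkerLog.failed` plays the role of the mpilog record of failed events. Each log line uses the index of the event just processed, and a FAILURE is recorded and skipped instead of aborting the loop.

```python
from dataclasses import dataclass, field
from enum import Enum


class StatusCode(Enum):
    # Hypothetical stand-in for Gaudi's StatusCode.
    SUCCESS = 0
    FAILURE = 1


@dataclass
class Event:
    index: int       # event index within the whole job
    input_file: str  # input file the event was read from


@dataclass
class WorkerLog:
    lines: list = field(default_factory=list)    # human-readable log lines
    written: list = field(default_factory=list)  # indices of events kept in the output
    failed: list = field(default_factory=list)   # (index, input_file) pairs for re-running


def run_worker(events, process_event, log):
    """Process events one by one, tolerating per-event FAILUREs (hypothetical sketch)."""
    for event in events:
        status = process_event(event)
        # Log the index of the event just processed, not a stale value from the previous iteration.
        log.lines.append(f"event {event.index} from {event.input_file}: {status.name}")
        if status is StatusCode.FAILURE:
            # Record which event failed and where it came from, then keep going.
            log.failed.append((event.index, event.input_file))
            continue
        # Only successful events make it into the output.
        log.written.append(event.index)
    return log
```

For example, with five events where event 3 fails, the worker keeps events 0, 1, 2 and 4 and records `(3, "b.root")` as the event to re-run:

```python
events = [Event(i, "a.root" if i < 3 else "b.root") for i in range(5)]
log = run_worker(events, lambda e: StatusCode.FAILURE if e.index == 3 else StatusCode.SUCCESS, WorkerLog())
```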
