Introduce a new way to check timeout on events

Merged: Marco Clemencic requested to merge add-alg-timeout-monitor-to-scheduler into master

The mechanisms provided by StalledEventMonitor and WatchdogThread are based on the BeginEvent incident, which cannot be used with the same guarantees in multi-threaded jobs.

This MR replaces WatchdogThread with an RAII object that can be stored in the TES and that periodically executes a callback: Gaudi::Utils::PeriodicAction.

Thanks to PeriodicAction, an algorithm can add a timeout checker to the TES for each event, so that multiple events in flight are checked independently and reliably. Gaudi::EventWatchdogAlg does exactly this, with the same features available in StalledEventMonitor (log messages, stack trace, abort) and some improvements (like printing the EventContext of the hanging event).

I tried to preserve backward compatibility as much as possible:

  • StalledEventMonitor and WatchdogThread are still available, but should not be used
  • the option ApplicationMgr.StalledEventMonitoring now adds Gaudi::EventWatchdogAlg to the beginning of TopAlg, but it is better to use the algorithm directly
  • Gaudi::EventWatchdogAlg uses the same property names and types as StalledEventMonitor (it also uses the configuration of StalledEventMonitor from the JobOptionsSvc)

Content:

  • modernization of WatchdogThread (not strictly needed, but I did it during development and did not want to throw it away)
  • make GaudiTesting::SleepyAlg re-entrant (to properly test the watchdog in multi-threading)
  • add Gaudi::Utils::PeriodicAction as a helper to periodically invoke a callback
  • add Gaudi::EventWatchdogAlg as a replacement for StalledEventMonitor
  • add a test to validate that Gaudi::EventWatchdogAlg works in a multi-threaded job
  • add an example to explain how to properly use Gaudi::EventWatchdogAlg

Closes #287 (closed)
