StalledEventMonitor not available for multithreaded jobs

StalledEventMonitor is a small, practical service that detects when an event is taking too long and, if requested, kills the process when that happens.

Unfortunately, it relies on the BeginEvent incident, which is not used in multithreaded jobs. Even if it were used, it would not help StalledEventMonitor, because the incident does not say which event slot is being started.

What we need is a mechanism that keeps track of all events in flight, so that we can see whether one of them is taking too long. The mechanism could be integrated into the scheduler, or we could define a protocol between the scheduler and StalledEventMonitor, so that the same protocol can be used by different schedulers.
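To make the protocol idea more concrete, here is a minimal sketch of what the bookkeeping could look like. It is only an illustration: `InFlightEventTracker` and its methods (`eventStarted`, `eventFinished`, `stalledSlots`) are hypothetical names, not existing Gaudi interfaces, and it assumes the scheduler can notify the tracker with a slot number when an event starts and when it completes.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical helper, not part of Gaudi: records when each event slot started
// processing, so that a monitor can ask which slots have exceeded a timeout.
class InFlightEventTracker {
public:
  using Clock     = std::chrono::steady_clock;
  using TimePoint = Clock::time_point;
  using Duration  = Clock::duration;

  // Assumed to be called by the scheduler when an event is assigned to a slot.
  void eventStarted( std::size_t slot, TimePoint now = Clock::now() ) {
    std::lock_guard<std::mutex> lock( m_mutex );
    m_startTimes[slot] = now;
  }

  // Assumed to be called by the scheduler when the event in the slot completes.
  void eventFinished( std::size_t slot ) {
    std::lock_guard<std::mutex> lock( m_mutex );
    m_startTimes.erase( slot );
  }

  // Polled by the monitor: returns the slots (sorted) whose event has been
  // running for longer than `timeout`.
  std::vector<std::size_t> stalledSlots( Duration timeout, TimePoint now = Clock::now() ) const {
    std::lock_guard<std::mutex> lock( m_mutex );
    std::vector<std::size_t>    stalled;
    for ( const auto& entry : m_startTimes ) {
      if ( now - entry.second > timeout ) stalled.push_back( entry.first );
    }
    std::sort( stalled.begin(), stalled.end() );
    return stalled;
  }

private:
  mutable std::mutex                         m_mutex;      // calls may come from several threads
  std::unordered_map<std::size_t, TimePoint> m_startTimes; // slot -> start time of its current event
};
```

With something like this, any scheduler would only need to call `eventStarted` and `eventFinished`, and the monitor could poll `stalledSlots` from its own thread and decide whether to print a warning or kill the process. Keying on the slot number is what the BeginEvent incident cannot provide today.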

/cc @gcorti @kreps
