Deadlock in BufferedModule Conditional Waiting
After investigating a stalled simulation for a while I have come to the conclusion that we still have an issue in the threading logic of BufferedModule
, more specifically whenever the buffer is full we have the chance of a deadlock. Reason:
-
BufferedModule
is currently processing event X-1 -
BufferedModule::run_in_order
is called for event X, the buffer is full at this time - Function runs into the
condition_variable.wait()
and waits becauseX != X-1
buffersize == max_size
- Event X-10 is found in buffer fetched from there for processing and
condition_variable.notify_one()
is called. This might or might not be the thread of event X. If not, we have a deadlock because another event has been admitted to the buffer and now all threads are waiting for event X to be processed. - Then Event X-1 is processed, and only now the
next_to_write
is advanced to X which would allow admission to the buffer.
There are two issues:
- We use
notify_one()
, so in many cases we're just waking up a thread with a future event, but not the one we urgently need. - Even with
notify_all()
there might be another thread which is quicker and fills the vacant buffer slot with a future event. The desired event never makes it into the buffer because the first conditionevent_number == next_event_to_write_
becomes true only after thenotify_all()
call.
My solution would be:
diff --git a/src/core/module/Module.cpp b/src/core/module/Module.cpp
index ce476b91a..c615cd560 100644
--- a/src/core/module/Module.cpp
+++ b/src/core/module/Module.cpp
@@ -269,7 +269,12 @@ void BufferedModule::flush_buffered_events() {
// Remove it from buffer
buffered_events_.erase(iter);
- cond_var_.notify_one();
+
+ // Move to the next event
+ next_event_to_write_ = get_next_event(next_event_to_write_ + 1);
+
+ // Notify waiting threads
+ cond_var_.notify_all();
}
LOG(TRACE) << "Writing buffered event " << iter->first << ", " << buffered_events_.size() << " left in buffer";
@@ -282,8 +287,6 @@ void BufferedModule::flush_buffered_events() {
// Process the buffered event
run(event.get());
- // Move to the next event
- next_event_to_write_ = get_next_event(next_event_to_write_ + 1);
}
}
This has the negative side effect that we always wake up all threads which are waiting but I don't see another solution to always and reliably trigger the correct one.
@kwolters opinion?