Skip to content

[ATR-20148] More MT safety for TrigCostMonitor MT

Tim Martin requested to merge tamartin/athena:costMonRace into master

I was fortunate to spot a TrigCostMonitorMT crash with --threads=4 in a unit test.

I have been unable to reproduce it, but on further inspection I identified a small possible hole in the existing logic.

m_eventMonitored[ context.slot() ] = false at the start of endEvent is supposed to prevent further writes to the concurrent map, allowing it to be iterated over. However this does not protect against threads calling processAlg which have already gotten past the isMonitoredEvent check and are in the process of executing the code which inserts into the map.

This will invalidate the iteration over the whole map in endEvent

Solution: Add a std::shared_mutex per slot. The processAlg call obtains a std::shared_lock as we want to be able to have many threads writing in to this. The endEvent call obtains a std::unique_lock after having set eventMonitored = flase and hence is assured that all currently-executing calls to processAlg will have been given time to complete, and that no new ones may start.

Only then is the full table iterated over.

A similar consideration is given to startEvent where eventMonitored = true is only set after the table is cleared.

Merge request reports