[ATR-20148] More MT safety for TrigCostMonitor MT
I was fortunate to spot a TrigCostMonitorMT crash with --threads=4
in a unit test.
I have been unable to reproduce it, but on further inspection I identified a small possible hole in the existing logic.
m_eventMonitored[ context.slot() ] = false
at the start of endEvent
is supposed to prevent further writes to the concurrent map, allowing it to be iterated over. However this does not protect against threads calling processAlg
which have already gotten past the isMonitoredEvent
check and are in the process of executing the code which inserts into the map.
This will invalidate the iteration over the whole map in endEvent
Solution: Add a std::shared_mutex
per slot. The processAlg
call obtains a std::shared_lock
as we want to be able to have many threads writing in to this. The endEvent
call obtains a std::unique_lock
after having set eventMonitored = flase
and hence is assured that all currently-executing calls to processAlg
will have been given time to complete, and that no new ones may start.
Only then is the full table iterated over.
A similar consideration is given to startEvent
where eventMonitored = true
is only set after the table is cleared.