PerfMonComps: Switch from per-slot to per-thread metric collection in PerfMonMT
Originally we collected algorithm execution statistics on a per-slot basis. Each slot had its own map, keyed by a pair of the algorithm name and the execution step, and protected by a mutex so that concurrent threads could not alter a slot's map simultaneously. This works fine in almost all cases except for algorithms that execute within a view (e.g. in the trigger). In that scenario, the same algorithm can execute multiple times per event, i.e. from concurrent threads for the same slot. The auditor calls for those executions then interleave on the same map entry, which corrupts the metrics (although, thanks to the locks, this is not a thread-safety problem per se).
Now we collect metrics on a per-thread basis. This way, we not only work around the aforementioned limitation, but can also get rid of the locks, as each thread now has its own map. To accomplish this, we use tbb::this_task_arena::current_thread_index(), which provides a unique index within a task arena, as the index into the vector (constructed during initialization) that holds the maps. Here we rely on the fact that we use a single task arena to run all algorithms, hence tbb::this_task_arena::current_thread_index() is unique. The new approach was tested in RDOtoRDOTrigger with 16 threads and seems to work as expected.
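For illustration only, here is a minimal sketch of the per-thread layout, not the actual PerfMonMT code. The names PerThreadMetrics, Metric, record, totalCalls and runDemo are hypothetical. To keep the sketch free of a TBB dependency, the thread index is passed in explicitly, standing in for the value that tbb::this_task_arena::current_thread_index() would provide.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Key: (algorithm name, execution step), as in the description above.
using Key = std::pair<std::string, std::string>;

// Hypothetical stand-in for the real per-component measurement.
struct Metric {
  unsigned long calls = 0;
};

using MetricMap = std::map<Key, Metric>;

// Hypothetical holder: one map per worker thread, sized once at initialization.
class PerThreadMetrics {
public:
  explicit PerThreadMetrics(std::size_t nThreads) : m_maps(nThreads) {}

  // threadIndex stands in for tbb::this_task_arena::current_thread_index().
  // No lock is taken: each thread only ever touches its own vector element,
  // and the vector is never resized after construction.
  void record(std::size_t threadIndex, const std::string& alg,
              const std::string& step) {
    assert(threadIndex < m_maps.size());
    ++m_maps[threadIndex][Key{alg, step}].calls;
  }

  // Sum one entry over all threads; call only after the workers have finished.
  unsigned long totalCalls(const std::string& alg,
                           const std::string& step) const {
    unsigned long total = 0;
    for (const auto& m : m_maps) {
      const auto it = m.find(Key{alg, step});
      if (it != m.end()) total += it->second.calls;
    }
    return total;
  }

private:
  std::vector<MetricMap> m_maps;
};

// The same algorithm runs concurrently on every thread, as it would in a view.
unsigned long runDemo(std::size_t nThreads, unsigned long callsPerThread) {
  PerThreadMetrics metrics(nThreads);
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < nThreads; ++i) {
    workers.emplace_back([&metrics, i, callsPerThread] {
      for (unsigned long n = 0; n < callsPerThread; ++n) {
        metrics.record(i, "AlgInView", "Execute");
      }
    });
  }
  for (auto& w : workers) w.join();
  return metrics.totalCalls("AlgInView", "Execute");
}
```

Because the per-thread maps are only merged after the workers are joined, concurrent executions of the same algorithm never share a map entry in this sketch.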
There are other ways to go about this. We could keep the original per-slot approach but give a unique identifier (e.g. the view id) to algorithms running in separate views; however, we would still have to lock. If at any point tbb::this_task_arena::current_thread_index() is no longer unique (e.g. with multiple parallel arenas), we could use the thread ids instead, but that might not be as seamless.
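As a rough sketch of what the thread-id fallback could look like (purely an assumption, nothing like this is implemented here), the maps could be keyed by std::thread::id. The names ThreadIdMetrics, local and runThreadIdDemo are hypothetical. The sketch shows one reason it may be less seamless: finding a thread's map now needs a lock around the lookup, since the container can grow at run time.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

using Key = std::pair<std::string, std::string>;  // (algorithm name, step)

// Hypothetical stand-in for the real per-component measurement.
struct Metric {
  unsigned long calls = 0;
};

using MetricMap = std::map<Key, Metric>;

// Hypothetical holder keyed by thread id rather than by arena thread index.
class ThreadIdMetrics {
public:
  // Return the calling thread's map, creating it on first use.
  // The lookup is locked because insertion may rehash the container;
  // references to unordered_map elements stay valid across a rehash,
  // so the returned reference can be used without the lock afterwards.
  MetricMap& local() {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_maps[std::this_thread::get_id()];
  }

  // Number of threads that have registered a map so far.
  std::size_t nThreadsSeen() {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_maps.size();
  }

  // Sum one entry over all threads; call only after the workers have finished.
  unsigned long totalCalls(const Key& key) {
    std::lock_guard<std::mutex> lock(m_mutex);
    unsigned long total = 0;
    for (const auto& entry : m_maps) {
      const auto it = entry.second.find(key);
      if (it != entry.second.end()) total += it->second.calls;
    }
    return total;
  }

private:
  std::mutex m_mutex;
  std::unordered_map<std::thread::id, MetricMap> m_maps;
};

// Returns (number of per-thread maps created, total recorded calls).
std::pair<std::size_t, unsigned long> runThreadIdDemo(
    std::size_t nThreads, unsigned long callsPerThread) {
  ThreadIdMetrics metrics;
  const Key key{"AlgInView", "Execute"};
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < nThreads; ++i) {
    workers.emplace_back([&metrics, &key, callsPerThread] {
      MetricMap& mine = metrics.local();  // one locked lookup per thread
      for (unsigned long n = 0; n < callsPerThread; ++n) ++mine[key].calls;
    });
  }
  for (auto& w : workers) w.join();
  return {metrics.nThreadsSeen(), metrics.totalCalls(key)};
}
```

Caching the reference once per thread keeps the lock off the hot path, but the storage is no longer a fixed-size vector that can be set up fully during initialization.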
In any case, let's see how this goes. Note that none of the official workflows should be affected by this change as we run detailed monitoring only in SPOT daily tests for the time being.