Algorithm state profiling
Added a method to the scheduler to return the number of algorithms in each state.
Useful for profiling multithreaded jobs where we see inefficient use of threads: are we limited by our CF/DF graph, by algorithm cardinality, or by some other locking mechanism?
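As a rough illustration of the kind of information such a method returns, here is a minimal sketch that tallies algorithms per state. The `AlgState` enum, its state names and `countStates` are hypothetical stand-ins chosen for this example; they are not the scheduler's actual types or interface.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified state set; the real scheduler's states may differ.
enum class AlgState : std::size_t { Initial, ControlReady, DataReady, Scheduled, Executed, NumStates };

constexpr std::size_t kNumStates = static_cast<std::size_t>( AlgState::NumStates );

// Return how many algorithms are currently in each state,
// indexed by the numeric value of AlgState.
std::array<std::size_t, kNumStates> countStates( const std::vector<AlgState>& algStates ) {
  std::array<std::size_t, kNumStates> counts{}; // value-initialised to zero
  for ( AlgState s : algStates ) {
    const auto idx = static_cast<std::size_t>( s );
    assert( idx < kNumStates ); // NumStates itself is not a valid state
    ++counts[idx];
  }
  return counts;
}
```

In this toy model, a low `Scheduled` count alongside a high `ControlReady` count would suggest that algorithms are waiting on data dependencies rather than on free threads.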
Activity
changed milestone to %v33r2
added task scheduling label
added 1 commit
- f004ef64 - Move the vector of all states to a private member, and make it const
Tagging @ishapova and @leggett - this is the level of information that we think would be useful for the HLT, but would some other structure be better for a wider context?
We could potentially just dump the entire alg/state map, but that seemed excessive.
Expected usage is for our monitoring system to poll this method at a fixed sampling interval, rather than keyed to a particular event (although obviously that could be done too).
Regards, Ben
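The polling pattern described above could look roughly like the following sketch, in which a monitoring client calls an occupancy function a fixed number of times at a fixed interval. `Snapshot`, `sampleOccupancy` and the callable passed in are assumptions made for illustration; the actual monitoring system and the method's real signature are not shown in this merge request.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

using Snapshot = std::vector<std::size_t>; // one count per state

// Poll `occupancy` exactly `nSamples` times, sleeping `interval` between polls.
// Sampling is driven by wall-clock time only, not by event boundaries.
std::vector<Snapshot> sampleOccupancy( const std::function<Snapshot()>& occupancy,
                                       std::chrono::milliseconds interval, std::size_t nSamples ) {
  assert( static_cast<bool>( occupancy ) ); // an empty std::function cannot be polled
  std::vector<Snapshot> samples;
  samples.reserve( nSamples );
  for ( std::size_t i = 0; i < nSamples; ++i ) {
    samples.push_back( occupancy() );
    if ( i + 1 < nSamples ) std::this_thread::sleep_for( interval ); // no sleep after the last poll
  }
  return samples;
}
```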
- Resolved by Benjamin Michael Wynne
In general, I support having this sort of functionality. It may indeed be useful in several situations. But I have the following concerns about this addition:
- I don't think it is thread-safe. Underlying state sets can be changed by actions running in the control thread while you're counting. This will result in both misses (false underoccupancy) and duplicates (false oversubscription) in state totals.
- Perhaps it would be semantically more appropriate to make this part of AlgExecStateSvc? It might already have some relevant methods, though as it stands they cover narrower state sets. In this case, however, a client of the interface would need access to all active event contexts. IIRC, we don't track that on the Gaudi side (apart from in the scheduler, where we can access those indirectly). Were you intending to access the interface from the HLT event loop?
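To make the thread-safety concern concrete, here is a minimal sketch in which state updates and counting share one mutex, so a count can never observe a half-applied update. `State`, `StateTable`, `setState` and `occupancy` are hypothetical names for this example, and the single per-algorithm vector is a simplification; the scheduler's real containers and locking are not shown here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical three-state model, for illustration only.
enum class State : std::size_t { Waiting, Running, Done, NumStates };
constexpr std::size_t kStates = static_cast<std::size_t>( State::NumStates );

class StateTable {
public:
  explicit StateTable( std::size_t nAlgs ) : m_states( nAlgs, State::Waiting ) {}

  // Would be called from the thread that drives state transitions.
  void setState( std::size_t alg, State s ) {
    assert( s != State::NumStates ); // the sentinel is not a real state
    std::lock_guard<std::mutex> lock( m_mutex );
    m_states.at( alg ) = s;
  }

  // Would be called from a monitoring thread. Holding the same mutex means
  // no update can land in the middle of the count.
  std::array<std::size_t, kStates> occupancy() const {
    std::lock_guard<std::mutex> lock( m_mutex );
    std::array<std::size_t, kStates> counts{};
    for ( State s : m_states ) ++counts[static_cast<std::size_t>( s )];
    return counts;
  }

private:
  mutable std::mutex m_mutex; // mutable so that the const occupancy() can lock it
  std::vector<State> m_states;
};
```

The trade-off is that every state update now contends with the monitor for the lock, which is one reason to count inside the thread that already owns the state instead.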
added 1 commit
- a4b85a4d - Count states in the updateStates method, for thread safety
- Resolved by Benjamin Michael Wynne
added 1 commit
- 6486d856 - Changed method name to occupancy. Added mutex
added 42 commits
- 6486d856...c7d6fdfd - 41 commits from branch gaudi:master
- 64e4f669 - Merge remote-tracking branch 'upstream/master' into AlgStateProfiling
assigned to @clemenci
added 2 commits
- Resolved by Benjamin Michael Wynne
- Resolved by Illya Shapoval
added 1 commit
- a7eb5af5 - Changed to an internal timer and queue of snapshots
- Resolved by Benjamin Michael Wynne
- Resolved by Benjamin Michael Wynne
added 2 commits
- d0000032 - Changed to a loop over state values, rather than over a vector containing all states.
- 84428f82 - Merge branch 'AlgStateProfiling' of https://gitlab.cern.ch/bwynne/Gaudi_gaudi...
added 76 commits
- 7769461d...d1dc2b0e - 73 commits from branch gaudi:master
- de1a27df - Merge remote-tracking branch 'upstream/master' into AlgStateProfiling
- 09e98e73 - Added bounded queue, retrieve data with callback. But about to remove queue completely
- 87a58896 - Remove the queue, process snapshots directly
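The final design named in these commits (an internal timer, with snapshots processed directly rather than queued) might be sketched as below. This is only an assumed shape: `SnapshotTimer`, `Counts` and both callables are hypothetical, and the merged code may be organised quite differently.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

using Counts = std::vector<std::size_t>; // one count per state

// Takes a snapshot every `interval` and hands it straight to a callback;
// nothing is buffered between the timer and the client.
class SnapshotTimer {
public:
  SnapshotTimer( std::function<Counts()> takeSnapshot, std::function<void( const Counts& )> callback,
                 std::chrono::milliseconds interval )
      : m_take( std::move( takeSnapshot ) ), m_callback( std::move( callback ) ), m_interval( interval ) {
    assert( m_take && m_callback ); // both callables must be set
    m_thread = std::thread( [this] { run(); } );
  }

  ~SnapshotTimer() {
    {
      std::lock_guard<std::mutex> lock( m_mutex );
      m_stop = true;
    }
    m_cv.notify_one(); // wake the timer thread so it can exit promptly
    m_thread.join();
  }

private:
  void run() {
    std::unique_lock<std::mutex> lock( m_mutex );
    // wait_for returns true once m_stop is set; false means the interval elapsed.
    while ( !m_cv.wait_for( lock, m_interval, [this] { return m_stop; } ) ) {
      lock.unlock();          // do not hold the lock while running client code
      m_callback( m_take() ); // process the snapshot directly
      lock.lock();
    }
  }

  std::function<Counts()> m_take;
  std::function<void( const Counts& )> m_callback;
  std::chrono::milliseconds m_interval;
  std::mutex m_mutex;
  std::condition_variable m_cv;
  bool m_stop = false;
  std::thread m_thread;
};
```

Dropping the queue means a slow callback delays the next snapshot instead of letting a backlog build up, which seems a reasonable trade for a sampling profiler.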
- Resolved by Benjamin Michael Wynne
- Resolved by Gerhard Raven
/ci-test --merge
- [2020-07-29 15:26] Validation started with lhcb-master-mr#1133
mentioned in commit 19d53385