Create a queue for all schedule-able algorithms
The AvalancheScheduler preserves algorithm ordering by only scheduling algorithms when a thread is available. However, the current implementation has the side-effect of creating scheduler overhead when the number of algorithms available to schedule is very large relative to the number of threads. Most of these scheduling attempts are discarded, causing a repeated cycle of retrieving precedence graph states and then calling promoteToScheduled.
By creating a queue for all algorithms that can be scheduled, the order of execution is preserved in the queue, and every result of evaluating the precedence graph is guaranteed to be used the first time. This gives a dramatic improvement to scheduler performance in scenarios where many algorithms can be executed concurrently.
This is illustrated in the attached plot:
The test uses the job options (JO) in !865 (merged), and the comparison is made to the Gaudi master branch and to !863 (merged) (scouting).
NB: this MR should be applied as well as !863 (merged)
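The queueing idea described above can be sketched roughly as follows. This is a minimal illustration only, not the code in this MR: the class `ReadyQueueScheduler` and its methods `enqueue`, `dispatch`, `release` and `pending` are hypothetical names, algorithms are represented by plain integer indices, and thread availability is reduced to a counter.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical stand-in for the scheduler's bookkeeping; not Gaudi code.
class ReadyQueueScheduler {
public:
  explicit ReadyQueueScheduler(std::size_t nThreads) : m_freeThreads(nThreads) {
    assert(nThreads > 0);
  }

  // Called once when an algorithm becomes schedulable. The decision is
  // stored in the queue instead of being discarded when no thread is free.
  void enqueue(int algIndex) { m_ready.push(algIndex); }

  // Start queued algorithms in FIFO order while threads are available.
  // Returns the indices of the algorithms started by this call.
  std::vector<int> dispatch() {
    std::vector<int> started;
    while (m_freeThreads > 0 && !m_ready.empty()) {
      started.push_back(m_ready.front());
      m_ready.pop();
      --m_freeThreads;
    }
    return started;
  }

  // Called when a running algorithm finishes and gives its thread back.
  void release() { ++m_freeThreads; }

  // Number of algorithms still waiting for a thread.
  std::size_t pending() const { return m_ready.size(); }

private:
  std::queue<int> m_ready;     // schedulable algorithms, oldest first
  std::size_t m_freeThreads;   // threads not currently running an algorithm
};
```

In this sketch, enqueueing five algorithms on a two-thread scheduler starts the first two on `dispatch()` and leaves three pending; each later `release()` followed by `dispatch()` starts exactly one more, in the order they were enqueued.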
@ishapova could you please take a look?
I wondered if there's a better way to organise the queue, something more like https://gitlab.cern.ch/gaudi/Gaudi/blob/master/GaudiHive/src/AvalancheSchedulerSvc.cpp#L621
- [2019-03-26 17:21] Validation started with lhcb-gaudi-merge#708
- [2019-03-27 00:06] Automatic merge failed in lhcb-lcg-dev4#851
- [2019-03-27 00:06] Automatic merge failed in lhcb-soa-track#29
- [2019-03-27 00:06] Automatic merge failed in lhcb-sanitizers#197
- [2019-03-27 00:06] Automatic merge failed in lhcb-lcg-dev3#844
- [2019-03-27 00:07] Automatic merge failed in lhcb-dd4hep-95#47
- [2019-03-27 00:08] Automatic merge failed in lhcb-gaudi-head#2201
- [2019-03-27 00:09] Automatic merge failed in lhcb-tdr-test#495
- [2019-03-28 00:03] Automatic merge failed in lhcb-lcg-dev4#852
- [2019-03-28 00:03] Automatic merge failed in lhcb-sanitizers#198
- [2019-03-28 00:05] Automatic merge failed in lhcb-lcg-dev3#845
- [2019-03-28 00:06] Automatic merge failed in lhcb-dd4hep-95#48
- [2019-03-28 00:06] Automatic merge failed in lhcb-soa-track#30
- [2019-03-28 00:09] Automatic merge failed in lhcb-tdr-test#496
- [2019-03-28 00:10] Automatic merge failed in lhcb-gaudi-head#2202
- [2019-03-28 00:53] Automatic merge failed in lhcb-lcg-dev4#852
- [2019-03-28 00:56] Automatic merge failed in lhcb-lcg-dev3#845
- [2019-03-28 01:08] Automatic merge failed in lhcb-tdr-test#496
- [2019-03-28 01:14] Automatic merge failed in lhcb-gaudi-head#2202
- [2019-03-28 01:48] Automatic merge failed in lhcb-lcg-dev4#852
- [2019-03-28 01:52] Automatic merge failed in lhcb-lcg-dev3#845
- [2019-03-28 02:16] Automatic merge failed in lhcb-gaudi-head#2202
- [2019-03-28 02:18] Automatic merge failed in lhcb-tdr-test#496
- [2019-03-28 07:16] Automatic merge failed in lhcb-tdr-test#496
- [2019-03-28 07:17] Automatic merge failed in lhcb-gaudi-head#2202
- [2019-03-28 07:43] Automatic merge failed in lhcb-tdr-test#496
- [2019-03-28 07:47] Automatic merge failed in lhcb-gaudi-head#2202
- [2019-03-29 00:04] Validation started with lhcb-dd4hep-95#49
- [2019-03-29 00:07] Validation started with lhcb-lcg-dev4#853
- [2019-03-29 00:08] Validation started with lhcb-soa-track#31
- [2019-03-29 00:09] Validation started with lhcb-tdr-test#497
- [2019-03-29 00:11] Validation started with lhcb-sanitizers#199
- [2019-03-29 00:13] Validation started with lhcb-lcg-dev3#846
- [2019-03-29 00:13] Validation started with lhcb-gaudi-head#2203
- [2019-03-29 14:40] Validation started with lhcb-dd4hep-95#50
- [2019-03-30 00:04] Validation started with lhcb-soa-track#32
- [2019-03-30 00:04] Validation started with lhcb-dd4hep-95#51
- [2019-03-30 00:04] Validation started with lhcb-sanitizers#200
- [2019-03-30 00:04] Validation started with lhcb-lcg-dev4#854
- [2019-03-30 00:05] Validation started with lhcb-lcg-dev3#847
- [2019-03-30 00:06] Validation started with lhcb-gaudi-head#2204
- [2019-03-30 00:08] Validation started with lhcb-tdr-test#498
- [2019-03-31 00:03] Validation started with lhcb-dd4hep-95#52
- [2019-03-31 00:03] Validation started with lhcb-sanitizers#201
- [2019-03-31 00:03] Validation started with lhcb-lcg-dev3#848
- [2019-03-31 00:04] Validation started with lhcb-gaudi-head#2205
- [2019-03-31 00:04] Validation started with lhcb-soa-track#33
- [2019-03-31 00:06] Validation started with lhcb-lcg-dev4#855
- [2019-03-31 00:07] Validation started with lhcb-tdr-test#499
- [2019-04-01 00:03] Validation started with lhcb-dd4hep-95#53
- [2019-04-01 00:05] Validation started with lhcb-lcg-dev4#856
- [2019-04-01 00:05] Validation started with lhcb-lcg-dev3#849
- [2019-04-01 00:07] Validation started with lhcb-sanitizers#202
- [2019-04-01 00:07] Validation started with lhcb-soa-track#34
- [2019-04-01 00:08] Validation started with lhcb-tdr-test#500
- [2019-04-01 00:10] Validation started with lhcb-gaudi-head#2206
- [2019-04-02 00:03] Validation started with lhcb-dd4hep-95#54
- [2019-04-02 00:03] Validation started with lhcb-lcg-dev4#857
- [2019-04-02 00:03] Validation started with lhcb-lcg-dev3#850
- [2019-04-02 00:05] Validation started with lhcb-tdr-test#501
- [2019-04-02 00:06] Validation started with lhcb-gaudi-head#2207
- [2019-04-02 00:06] Validation started with lhcb-sanitizers#203
- [2019-04-02 00:07] Validation started with lhcb-soa-track#35
Edited by Software for LHCb
Note that this would have the effect of resolving #45 (closed) as well
This goes along the lines of GAUDI-1142.
I'm on leave this week, will have a look at the changes as soon as I'm back. Meanwhile, I like the attached profiling!
@ishapova - yes indeed. I think you mentioned the queueing idea a while ago, but seeing the profiling results for !863 (merged) a couple of weeks ago reminded me about it.
mentioned in merge request !870 (merged)
added C++ framework task scheduling labels
mentioned in issue #55 (closed)
added 30 commits
- 3dd53b4c...433b6164 - 29 commits from branch gaudi:master
- 6a6fe8bd - Merge remote-tracking branch 'upstream/master' into QueueAlgorithms
losing the already-calculated result of evaluating the precedence graph
The evaluation of the precedence rules is actually not lost: an algorithm is promoted to DATAREADY only once for a given event. It is the repeated attempts to promote an algorithm to SCHEDULED that cause the overhead to inflate in the special case you described. The latter, though, happens outside PRG-based decision making.
Another general comment: if there is a persistent excess of schedulable algorithms, then something is wrong with the concurrency settings, e.g. a user set too many events in flight. However, minimizing the overhead in this corner case may still be useful in occasional task emission bursts, e.g. in task avalanche generation modes. These bursts, though, are very narrow in all ATLAS scenarios I have seen so far. In this light, it would be interesting to see similar profiling on realistic scenarios (ATLAS MC reco, or the one you have for HLT).
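To make the point about repeated promotion attempts concrete, here is a toy count of attempts under two models. It is a sketch under strong assumptions and not a measurement of Gaudi: the functions `attemptsWithRetry` and `attemptsWithQueue` are hypothetical, all `nReady` algorithms are assumed to be schedulable at the same moment, and all `nThreads` threads are assumed to become free at the start of every scheduling cycle.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Toy model, not Gaudi code.

// Retry model: in every cycle each still-waiting algorithm makes one
// promotion attempt, but at most nThreads of those attempts succeed.
std::size_t attemptsWithRetry(std::size_t nReady, std::size_t nThreads) {
  assert(nThreads > 0);
  std::size_t attempts = 0;
  std::size_t waiting = nReady;
  while (waiting > 0) {
    attempts += waiting;                     // everyone waiting tries again
    waiting -= std::min(waiting, nThreads);  // only free threads accept work
  }
  return attempts;
}

// Queue model: each algorithm is taken from the queue and promoted exactly
// once, at the moment a thread is available for it.
std::size_t attemptsWithQueue(std::size_t nReady) {
  return nReady;
}
```

Under these assumptions, 100 schedulable algorithms on 4 threads give `attemptsWithRetry(100, 4) == 1300` against `attemptsWithQueue(100) == 100`. When the number of schedulable algorithms does not exceed the number of threads, the two models give the same count, which matches the remark that the overhead only matters when there is an excess of schedulable algorithms.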
OK, looking at the changes now.
Now that I have looked at the measurements a second time, I'm a bit confused. You included the 'scouting' case, but does the !865 (merged) test include the multi-parent algorithms? If so, does the 'queue' case have the scouting mode enabled?
- Resolved by Illya Shapoval
- Resolved by Benjamin Michael Wynne
- Resolved by Illya Shapoval