How to treat empty CF nodes?
As described in #135 (closed), bugs can arise from the presence of empty CF nodes in a job configuration. While the empty node may make sense as a placeholder in a flexible system, it is still evaluated.
The specific concern in #135 (closed) was the default False decision from a ModeOR=True node. However, regardless of the outcome, an empty CF node produces a decision without any algorithm needing to be scheduled, and this can lead to odd behaviour. Another (very similar) example is shown in the attached JO: the empty CF node leads to the RootDecisionNode of an event being fully evaluated without a single algorithm being scheduled. The scheduler then hangs completely, since the event is resolved but the AlgExecStateSvc has never been updated.
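To make the mechanism concrete, here is a toy model of control-flow evaluation. It is a sketch under stated assumptions, not Gaudi's DecisionNode/AlgorithmNode code: the `Node` class, `evaluate`, and the lazy, in-order evaluation of children are hypothetical simplifications. The point it illustrates is that a CF node with no children returns its start value (False for OR, True for AND) without scheduling anything.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


# Hypothetical toy model of a control-flow tree; NOT Gaudi's DecisionNode/AlgorithmNode classes.
@dataclass
class Node:
    name: str
    is_algorithm: bool = False  # leaf that must be scheduled before it has a decision
    mode_or: bool = False       # CF nodes only: combine children with OR (True) or AND (False)
    children: List["Node"] = field(default_factory=list)


def evaluate(node: Node) -> Tuple[Optional[bool], int]:
    """Return (decision, or None if still pending; number of algorithms scheduled)."""
    if node.is_algorithm:
        # An algorithm has no decision until it has run, so it gets scheduled.
        return None, 1
    acc = not node.mode_or  # start value: False for OR, True for AND
    scheduled = 0
    for child in node.children:
        decision, n = evaluate(child)
        scheduled += n
        if decision is None:
            return None, scheduled  # wait for this child before going further
        acc = (acc or decision) if node.mode_or else (acc and decision)
        if acc == node.mode_or:
            return acc, scheduled  # short-circuit: OR saw True, AND saw False
    # With no children the loop never runs and the start value is returned directly.
    return acc, scheduled


# An AND sequence whose first child is an empty OR node.
root = Node("Root", children=[Node("EmptyOR", mode_or=True),
                              Node("SomeAlg", is_algorithm=True)])
print(evaluate(root))
```

Running it prints `(False, 0)`: the AND root is resolved and `SomeAlg` is never scheduled, so in this model no algorithm state is ever produced for the event.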
Some possible resolutions, none of which I like that much:
- Change the default return from an OR sequence to True. This would fix many of the quirks, but it only addresses the symptom.
- Detach empty CF nodes at configuration time. This would fix the problem, but the job would then not look exactly like its configuration. It also only addresses the symptom.
- Add better handling of events that are resolved in the scheduler but not in AlgExecStateSvc. This problem also arose in !979 (merged), and comes from this part of the scheduler: https://gitlab.cern.ch/gaudi/Gaudi/-/blob/master/GaudiHive/src/AvalancheSchedulerSvc.cpp#L663
  Note the comment there that any failures should already have been handled. If that is not the case, the scheduler is stuck forever waiting for new actions in the queue. A default behaviour (marking the event as failed?) would help.
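The second and third options can be sketched with the same kind of toy model. Everything here is hypothetical: `prune_empty` stands in for detaching empty CF nodes at configuration time, and `on_event_resolved`, with its plain `exec_state` dictionary, stands in for the scheduler's per-event bookkeeping. Neither is an AvalancheSchedulerSvc or AlgExecStateSvc API, and "Failed" is only the tentative default floated above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


# Hypothetical toy model; none of these names are Gaudi classes or APIs.
@dataclass
class Node:
    name: str
    is_algorithm: bool = False  # leaf that has to be scheduled
    mode_or: bool = False       # CF nodes only: OR (True) or AND (False)
    children: List["Node"] = field(default_factory=list)


def prune_empty(node: Node) -> Optional[Node]:
    """Option 2: copy the tree, dropping CF nodes with no algorithm beneath them."""
    if node.is_algorithm:
        return node
    kept = [p for p in (prune_empty(c) for c in node.children) if p is not None]
    if not kept:
        return None  # this CF node (and everything below it) is empty
    return Node(node.name, mode_or=node.mode_or, children=kept)


def on_event_resolved(event_id: int, exec_state: Dict[int, str]) -> str:
    """Option 3: called once the CF root of an event has a decision.

    exec_state stands in for the per-event status bookkeeping; a missing entry
    means no algorithm ever ran, so nothing will ever update it.
    """
    status = exec_state.get(event_id)
    if status is None:
        # Default behaviour: fail the event rather than wait for actions
        # that will never be pushed to the queue.
        status = exec_state[event_id] = "Failed"
    return status


root = Node("Root", children=[Node("EmptyOR", mode_or=True),
                              Node("SomeAlg", is_algorithm=True)])
pruned = prune_empty(root)
print([c.name for c in pruned.children])     # ['SomeAlg']
print(on_event_resolved(0, {}))              # Failed
print(on_event_resolved(1, {1: "Success"}))  # Success
```

Pruning is not behaviour-neutral: the empty OR node no longer contributes its default decision to the AND root, so `SomeAlg` now determines the outcome. That is the sense in which the pruned job no longer looks exactly like its configuration. The fallback, by contrast, leaves the configuration untouched and only changes what happens once an event ends up in this state.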
Suggestions welcome!