HLTControlFlowMgr retries producers that fail

While investigating an unrelated problem, we stumbled on a weird behaviour of HLTControlFlowMgr: when the producer of some data fails, it is executed again for every consumer of the product.

The attached configuration (reproducer.py), to be invoked with gaudirun.py ./reproducer.py:config, consists of one producer and multiple consumers of its product. The producer's operator() just throws an exception, so execution should stop, but the algorithm is retried multiple times:

$ gaudirun.py ./reproducer.py:config
[...]
producer                              ERROR producer : failing by construction
producer                              ERROR Maximum number of errors ( 'ErrorMax':1) reached.
HLTControlFlowMgr                     FATAL Event failed in Node Gaudi__Examples__IntDataConsumer/consumer0 : Error in algorithm execute
producer                              ERROR producer : failing by construction
producer                              ERROR Maximum number of errors ( 'ErrorMax':1) reached.
HLTControlFlowMgr                     FATAL Event failed in Node Gaudi__Examples__IntDataConsumer/consumer1 : Error in algorithm execute
producer                              ERROR producer : failing by construction
producer                              ERROR Maximum number of errors ( 'ErrorMax':1) reached.
HLTControlFlowMgr                     FATAL Event failed in Node Gaudi__Examples__IntDataConsumer/consumer2 : Error in algorithm execute
producer                              ERROR producer : failing by construction
producer                              ERROR Maximum number of errors ( 'ErrorMax':1) reached.
HLTControlFlowMgr                     FATAL Event failed in Node Gaudi__Examples__IntDataConsumer/consumer3 : Error in algorithm execute
producer                              ERROR producer : failing by construction
producer                              ERROR Maximum number of errors ( 'ErrorMax':1) reached.
HLTControlFlowMgr                     FATAL Event failed in Node Gaudi__Examples__IntDataConsumer/consumer4 : Error in algorithm execute
producer                              ERROR producer : failing by construction
producer                              ERROR Maximum number of errors ( 'ErrorMax':1) reached.
HLTControlFlowMgr                     FATAL Event failed in Node FailingIntProducer/producer : Error in algorithm execute
HLTControlFlowMgr                     FATAL *** Event 0 on slot 0 failed! ***
[...]
ApplicationMgr                        ERROR Application Manager Terminated with error code 3

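What we understood is that in BasicNode::execute, when the algorithm fails, it is not recorded in AlgoStates as executed, nor as failed, since AlgoStates can only record two bools per algorithm: one for "executed" and one for "filter passed". As a result, every consumer that depends on the producer finds it not yet executed and runs it again.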
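To illustrate the suspected mechanism, here is a toy model in plain Python. It is not the actual HLTControlFlowMgr or BasicNode code: the `Node`, `AlgState` and `run_event` names are hypothetical, and the scheduler is assumed to keep visiting the remaining nodes after a failure, as the log above suggests. It only shows how a state with two flags and no "failed" flag leads to one producer execution per consumer.

```python
from dataclasses import dataclass


@dataclass
class AlgState:
    """Toy stand-in for one AlgoStates entry: two flags, no 'failed' flag."""
    executed: bool = False
    filter_passed: bool = False


class Node:
    """Hypothetical simplified node: runs its dependencies, then its action."""

    def __init__(self, name, action, deps=()):
        self.name = name
        self.action = action
        self.deps = list(deps)
        self.state = AlgState()
        self.calls = 0  # how many times the action was actually invoked

    def execute(self):
        if self.state.executed:
            return  # already done for this event, nothing to repeat
        for dep in self.deps:
            dep.execute()
        self.calls += 1
        self.action()  # an exception here skips the state update below
        self.state.executed = True
        self.state.filter_passed = True


def fail():
    raise RuntimeError("failing by construction")


def run_event(nodes):
    """Visit every node, collecting failures instead of stopping (assumption)."""
    failed = []
    for node in nodes:
        try:
            node.execute()
        except RuntimeError:
            failed.append(node.name)
    return failed


producer = Node("producer", fail)
consumers = [Node(f"consumer{i}", lambda: None, deps=[producer]) for i in range(5)]
failed = run_event(consumers + [producer])

print(producer.calls)  # one call per consumer plus one for the producer node
print(failed)
```

In this sketch the producer is invoked six times for five consumers, which matches the six "failing by construction" messages in the log. Whether the real fix belongs in AlgoStates or in BasicNode::execute is left open here.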

/cc @sponce @rmatev
