I think there is a bug in AlgResourcePool when it comes to deciding if an algorithm instance is "reentrant" or not.
AlgResourcePool::acquireAlgorithm only checks if the cardinality is zero to decide if an algorithm is re-entrant but ignores any potential isReEntrant override. Note that decodeTopAlgs does seem to handle it correctly.
This could explain some unexpected concurrent scheduling we are seeing in the ATLAS HLT (ATR-25365). But someone should try to reproduce/verify my hypothesis.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items
0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Linked items
0
Link issues together to show that they're related or that one is blocking others.
Learn more.
At the moment, we have two ways of signalling re-entrancy: isReentrant() = true, or cardinality = 0
Cardinality is the only one that AlgResourcePool will respect, although there will be a warning message at initialisation time if an alg is marked isReentrant but does not have cardinality 0.
An algorithm with cardinality 0 will be treated as reentrant by AlgResourcePool, regardless of the isReentrant flag. I suspect this is/was intended as a debugging option.
The most obvious change as I see it would just be to use isReentrant as the marker for re-entrancy (ignoring cardinality in this case) and then to use cardinality only for its original purpose - setting the number of non-reentrant algorithm clones. This would change the behaviour when isReentrant=true and cardinality>=1.
A more backward-compatible, but more confusing, alternative is to allow both to be used, but the different possible combinations aren't easy to intuit for a user - they have to be written explicitly:
isReentrant False, cardinality >= 1: standard clonable algorithmisReentrant True, cardinality == 0: standard re-entrant algorithmisReentrant False, cardinality == 0: the problem described in ATR-25365, currently treated as re-entrant despite the intention of the alg author that it should not be. Changing this behaviour would imply isReentrant can override cardinalityisReentrant True, cardinality >= 1: currently treated as standard clonable algorithm, with warning message. Keeping this behaviour would imply that cardinality can override isReentrant.
Clearly, there are more parameters than actual degrees of freedom. So why not just defineisReentrant() as cardinality == 0, instead of having it as an independent settable entity?
Unfortunately in Atlas we have some algorithms that override the value of isReentrant directly. Asking them to override cardinality instead is less intuitive.
It's conflated with the move from the "old" GaudiKernel/Algorithm.h to the "new" Gaudi/Algorithm.h. These are non-reentrant and reentrant respectively, and we have an Athena algorithm class that inherits from each. However, using the older version to signal non-reentrancy means using code that is effectively deprecated (it's already marked as the legacy adapter). Ideally instead we use the new base class in all cases, which would make all algorithms re-entrant by default, and then have a way to signal that some are not.
I think the model that all algorithms are re-entrant by default, with specific exceptions made when necessary, is the right one. Already the non-reentrant behaviour in Gaudi is deprecated, relying on a const_cast. Re-entrancy is not the future migration target any more, it is the expected behaviour.
To mark an algorithm as non-reentrant, the isReentrant override seems the intuitive way to do it - what is this flag for otherwise? It's a simple change to the AlgResourcePool to use the flag.
The default behaviour in AlgResourcePool is to create at least one copy of every algorithm, so there is no potential issue when isReentrant=false and cardinality=0. When isReentrant=true we can set the number of algorithm clones to be 1 regardless of cardinality.