Fix thread-safety problem in GaudiHandle
Fix thread-safety problem in GaudiHandle that was observed to cause a rare crash in Athena.
The component retrieval code in GaudiHandle can lead to operator->() returning a null pointer for a valid handle, if it is called simultaneously from multiple threads.
The relevant code in GaudiHandle is:
T* operator->() {
assertObject();
return m_pObject;
}
void assertObject() const {
if ( UNLIKELY( !isValid() ) ) {
throw GaudiException ...
}
}
bool isValid() const {
// not really const, because it may update m_pObject
return m_pObject || retrieve().isSuccess();
}
StatusCode retrieve() const {
// not really const, because it updates m_pObject
StatusCode sc = StatusCode::SUCCESS;
if ( m_pObject && release().isFailure() ) { sc = StatusCode::FAILURE; }
if ( sc && retrieve( m_pObject ).isFailure() ) {
m_pObject = nullptr;
sc = StatusCode::FAILURE;
}
return sc;
}
Note that if the pointer is already set when retrieve() is called, it clears it and redoes the lookup.
So we could have:
thread 1 calls operator->. m_pObject is not set, so it executes
through to retrieve() after the test on m_pObject.
thread 2 calls operator->. m_pObject is not set, so it executes
through to the start of retrieve().
thread 1 continues, does the lookup, sets m_pObject and returns.
thread 2 continues, tests m_pObject, finds it set, and calls reset()
to clear it.
thread 1 continues, returning through operator->. m_pObject is now
null, so it returns null.
thread 2 continues with the retrieve, and returns a valid pointer
from operator->.
I've tentatively tried to fix this by making m_pObject atomic, and changing retrieve() so that it will not change m_pObject if it is already set. I think that's consistent with what's implied by retrieve() being const. ATLAS software seems fine with this change, but this is, however, a potential change of behavior, so i point it out here.