Commit 4f5d771d authored by Scott Snyder's avatar Scott Snyder Committed by Marco Clemencic

Fix thread-safety problem in GaudiHandle that was observed to cause a rare crash in Athena.

The component retrieval code in GaudiHandle can lead to operator->()
returning a null pointer for a valid handle, if it is called
simultaneously from multiple threads.

The relevant code in GaudiHandle is:

```
  T* operator->() {
    assertObject();
    return m_pObject;
  }

  void assertObject() const {
    if ( UNLIKELY( !isValid() ) ) {
       throw GaudiException ...
    }
  }

  bool isValid() const {
    // not really const, because it may update m_pObject
    return m_pObject || retrieve().isSuccess();
  }

  StatusCode retrieve() const {
    // not really const, because it updates m_pObject
    StatusCode sc = StatusCode::SUCCESS;
    if ( m_pObject && release().isFailure() ) { sc = StatusCode::FAILURE; }
    if ( sc && retrieve( m_pObject ).isFailure() ) {
      m_pObject = nullptr;
      sc        = StatusCode::FAILURE;
    }
    return sc;
  }
```

Note that if the pointer is already set when retrieve() is called, it
clears it and redoes the lookup.

So we could have:

```
  thread 1 calls operator->.  m_pObject is not set, so it executes
  through to retrieve() after the test on m_pObject.

  thread 2 calls operator->.  m_pObject is not set, so it executes
  through to the start of retrieve().

  thread 1 continues, does the lookup, sets m_pObject and returns.

  thread 2 continues, tests m_pObject, finds it set, and calls reset()
  to clear it.

  thread 1 continues, returning through operator->.  m_pObject is now
  null, so it returns null.

  thread 2 continues with the retrieve, and returns a valid pointer
  from operator->.
  ```

I've tentatively tried to fix this by making m_pObject atomic, and
changing retrieve() so that it will not change m_pObject if it is
already set.  I think that's consistent with what's implied
by retrieve() being const.  ATLAS software seems fine with this change,
but this is, however, a potential change of behavior, so i point
it out here.
parent fdd53511
/***********************************************************************************\
* (c) Copyright 1998-2019 CERN for the benefit of the LHCb and ATLAS collaborations *
* (c) Copyright 1998-2020 CERN for the benefit of the LHCb and ATLAS collaborations *
* *
* This software is distributed under the terms of the Apache version 2 licence, *
* copied verbatim in the file "LICENSE". *
......@@ -185,13 +185,13 @@ public:
std::enable_if_t<std::is_const_v<CT> && !std::is_same_v<CT, NCT>>* = nullptr )
: GaudiHandleBase( other ) {
m_pObject = other.get();
if ( m_pObject ) ::details::nonConst( m_pObject )->addRef();
if ( m_pObject ) ::details::nonConst( m_pObject.load() )->addRef();
}
/** Copy constructor needed for correct ref-counting */
GaudiHandle( const GaudiHandle& other ) : GaudiHandleBase( other ) {
m_pObject = other.m_pObject;
if ( m_pObject ) ::details::nonConst( m_pObject )->addRef();
m_pObject = other.m_pObject.load();
if ( m_pObject ) ::details::nonConst( m_pObject.load() )->addRef();
}
/** Assignment operator for correct ref-counting */
......@@ -203,7 +203,7 @@ public:
release().ignore();
m_pObject = other.get();
// update ref-counting
if ( m_pObject ) ::details::nonConst( m_pObject )->addRef();
if ( m_pObject ) ::details::nonConst( m_pObject.load() )->addRef();
return *this;
}
......@@ -212,22 +212,24 @@ public:
GaudiHandleBase::operator=( other );
// release any current tool
release().ignore();
m_pObject = other.m_pObject;
m_pObject = other.m_pObject.load();
// update ref-counting
if ( m_pObject ) ::details::nonConst( m_pObject )->addRef();
if ( m_pObject ) ::details::nonConst( m_pObject.load() )->addRef();
return *this;
}
/** Retrieve the component. Release existing component if needed. */
StatusCode retrieve() const {
// not really const, because it updates m_pObject
StatusCode sc = StatusCode::SUCCESS;
if ( m_pObject && release().isFailure() ) { sc = StatusCode::FAILURE; }
if ( sc && retrieve( m_pObject ).isFailure() ) {
m_pObject = nullptr;
sc = StatusCode::FAILURE;
}
return sc;
// Do the lookup into a temporary pointer.
T* p = nullptr;
if ( retrieve( p ).isFailure() ) { return StatusCode::FAILURE; }
// If m_pObject is null, then copy p to m_pObject.
// Otherwise, release p.
T* old = nullptr;
if ( m_pObject.compare_exchange_strong( old, p ) ) { return StatusCode::SUCCESS; }
return release( p );
}
/** Release the component. */
......@@ -328,7 +330,7 @@ private:
//
// Data members
//
mutable T* m_pObject = nullptr;
mutable std::atomic<T*> m_pObject = nullptr;
};
/**
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment