Skip to content
Snippets Groups Projects

Fix thread-safety problem in GaudiHandle

Merged Scott Snyder requested to merge ssnyder/Gaudi:mtHandle-20200611 into master
  1. Jun 11, 2020
    • scott snyder's avatar
      formatting · d3684bce
      scott snyder authored
      d3684bce
    • scott snyder's avatar
      forgot to commit one change · 209dd7d6
      scott snyder authored
      209dd7d6
    • scott snyder's avatar
      Fix thread-safety problem in GaudiHandle that was observed to casue · d792aaaa
      scott snyder authored
      a rare crash in Athena.
      
      The component retrieval code in GaudiHandle can lead to operator->()
      returning a null pointer for a valid handle, if it is called
      simultaneously from multiple threads.
      
      The relevant code in GaudiHandle is:
      
      ```
        T* operator->() {
          assertObject();
          return m_pObject;
        }
      
      
        void assertObject() const {
          if ( UNLIKELY( !isValid() ) ) {
             throw GaudiException ...
          }
        }
      
      
        bool isValid() const {
          // not really const, because it may update m_pObject
          return m_pObject || retrieve().isSuccess();
        }
      
      
        StatusCode retrieve() const {
          // not really const, because it updates m_pObject
          StatusCode sc = StatusCode::SUCCESS;
          if ( m_pObject && release().isFailure() ) { sc = StatusCode::FAILURE; }
          if ( sc && retrieve( m_pObject ).isFailure() ) {
            m_pObject = nullptr;
            sc        = StatusCode::FAILURE;
          }
          return sc;
        }
      ```
      
      
      Note that if the pointer is already set when retrieve() is called, it
      clears it and redoes the lookup.
      
      So we could have:
      
      ```
        thread 1 calls operator->.  m_pObject is not set, so it executes
        through to retrieve() after the test on m_pObject.
      
        thread 2 calls operator->.  m_pObject is not set, so it executes
        through to the start of retrieve().
      
        thread 1 continues, does the lookup, sets m_pObject and returns.
      
        thread 2 continues, tests m_pObject, finds it set, and calls reset()
        to clear it.
      
        thread 1 continues, returning through operator->.  m_pObject is now
        null, so it returns null.
      
        thread 2 continues with the retrieve, and returns a valid pointer
        from operator->.
        ```
      
      I've tentatively tried to fix this by making m_pObject atomic, and
      changing retrieve() so that it will not change m_pObject if it is
      already set.  I think that's consistent with what's implied
      by retrieve() being const.  ATLAS software seems fine with this change,
      but this is, however, a potential change of behavior, so i point
      it out here.
      d792aaaa
Loading