Skip to content
Snippets Groups Projects

Fix thread-safety problem in GaudiHandle

Merged Scott Snyder requested to merge ssnyder/Gaudi:mtHandle-20200611 into master

Fix thread-safety problem in GaudiHandle that was observed to cause a rare crash in Athena.

The component retrieval code in GaudiHandle can lead to operator->() returning a null pointer for a valid handle, if it is called simultaneously from multiple threads.

The relevant code in GaudiHandle is:

  T* operator->() {
    assertObject();
    return m_pObject;
  }


  void assertObject() const {
    if ( UNLIKELY( !isValid() ) ) {
       throw GaudiException ...
    }
  }


  bool isValid() const {
    // not really const, because it may update m_pObject
    return m_pObject || retrieve().isSuccess();
  }


  StatusCode retrieve() const {
    // not really const, because it updates m_pObject
    StatusCode sc = StatusCode::SUCCESS;
    if ( m_pObject && release().isFailure() ) { sc = StatusCode::FAILURE; }
    if ( sc && retrieve( m_pObject ).isFailure() ) {
      m_pObject = nullptr;
      sc        = StatusCode::FAILURE;
    }
    return sc;
  }

Note that if the pointer is already set when retrieve() is called, it clears it and redoes the lookup.

So we could have:

  thread 1 calls operator->.  m_pObject is not set, so it executes
  through to retrieve() after the test on m_pObject.

  thread 2 calls operator->.  m_pObject is not set, so it executes
  through to the start of retrieve().

  thread 1 continues, does the lookup, sets m_pObject and returns.

  thread 2 continues, tests m_pObject, finds it set, and calls reset()
  to clear it.

  thread 1 continues, returning through operator->.  m_pObject is now
  null, so it returns null.

  thread 2 continues with the retrieve, and returns a valid pointer
  from operator->.

I've tentatively tried to fix this by making m_pObject atomic, and changing retrieve() so that it will not change m_pObject if it is already set. I think that's consistent with what's implied by retrieve() being const. ATLAS software seems fine with this change, but this is, however, a potential change of behavior, so i point it out here.

Edited by Scott Snyder

Merge request reports

Pipeline #1706515 passed

Pipeline passed for d3684bce on ssnyder:mtHandle-20200611

Approved by

Merged by Marco ClemencicMarco Clemencic 4 years ago (Jun 18, 2020 8:07am UTC)

Merge details

  • Changes merged into master with f4d7918c (commits were squashed).
  • Deleted the source branch.

Pipeline #1726313 passed

Pipeline passed for f4d7918c on master

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
Please register or sign in to reply
Loading