Add HLT timeout handling and improve handling of other errors

Rafal Bielski requested to merge rbielski/athena:hlt-timeout-handling into master

Implemented handling of soft timeout in the online HLT framework (ATR-16897), improved a few aspects of general error handling (ATR-19248), and fixed a few small things.

Changes:

  • Avoid using ATH_CHECK for calls that can return Athena::Status::TIMEOUT, because the macro prints a FATAL message in this case.
  • Return Athena::Status::TIMEOUT on timeout in the MTCalibPeb test tool/alg.
  • Define hltonl::PSCErrorCode::TIMEOUT in TrigKernel.
  • Fix a few uninitialised members in HltEventLoopMgr (mainly related to timeout-handling).
  • HltEventLoopMgr: on a non-success EventStatus, check whether a timeout occurred and, if so, send the event to a dedicated timeout debug stream.
  • HltEventLoopMgr: use EventID from context instead of the old EventInfo in error print-outs.
  • HLTResultMTByteStreamCnv: make sure failed events go only to the debug streams (remove stream tags of other types), but still save all the HLT results to the debug stream if they are available.
  • TrigOutputHandling tools: check validity of HLTSummary handle before using it.
  • Add a test of the timeout handling in TrigP1Test (job options and ART test shell script).
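
The status handling and stream-tag filtering described in the list above can be sketched roughly as follows. This is a minimal standalone illustration, not the code in this MR: `Status`, `Action`, `StreamTag`, `classify`, `debugOnly` and the stream name `HltTimeout` are all hypothetical stand-ins and do not correspond to the real Athena, Gaudi or eformat types. The idea is to branch on the returned status explicitly, instead of using a check macro that logs FATAL for every non-success value, and to drop all non-debug stream tags for a failed event.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a status code with a dedicated timeout value.
// NOT the real StatusCode / Athena::Status API.
enum class Status { Success, Failure, Timeout };

// Hypothetical description of what the event loop should do next.
enum class Action { Continue, SendToErrorDebugStream, SendToTimeoutDebugStream };

// Hypothetical stream tag: a name plus a type such as "physics" or "debug".
struct StreamTag {
  std::string name;
  std::string type;
};

// Inspect the status explicitly so that a timeout is treated as its own
// case rather than as a generic fatal error.
Action classify(Status s) {
  switch (s) {
    case Status::Success: return Action::Continue;
    case Status::Timeout: return Action::SendToTimeoutDebugStream;
    case Status::Failure: return Action::SendToErrorDebugStream;
  }
  return Action::SendToErrorDebugStream;  // defensive default
}

// For a failed event: keep only tags of type "debug" and append the
// debug stream chosen for this failure.
std::vector<StreamTag> debugOnly(std::vector<StreamTag> tags,
                                 const std::string& debugStreamName) {
  assert(!debugStreamName.empty());
  tags.erase(std::remove_if(tags.begin(), tags.end(),
                            [](const StreamTag& t) { return t.type != "debug"; }),
             tags.end());
  tags.push_back({debugStreamName, "debug"});
  return tags;
}
```

In this sketch a caller would first call `classify` on the status returned by the event processing and, for any result other than `Action::Continue`, pass the event's stream tags through `debugOnly` before building the output.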

Questions already resolved in the discussion below:

  1. Shall we adapt ATH_CHECK to not print FATAL on timeout status?
    Answer: The error policy will be reviewed with the software coordinators later; for now, this MR can be accepted as it is.
  2. I realised it doesn't make much sense to try constructing an HLT result if DecisionSummaryMaker didn't run, and this currently is the case for any timeout/failure. If we want to save as much information as possible to the debug stream, maybe we should run it explicitly in the online framework, and not as an algorithm?
    Answer: The partial summary information obtained by running the summary maker on aborted events would not be useful for offline debugging, so there is no point in running it.

Tagging @fwinkl, @tamartin, @tbold, @wiedenma

Edited by Rafal Bielski