Skip to content
Snippets Groups Projects

Add HLT timeout handling and improve handling of other errors

Merged Rafal Bielski requested to merge rbielski/athena:hlt-timeout-handling into master
All threads resolved!

Implemented handling of soft timeout in the online HLT framework (ATR-16897), improved a few aspects of general error handling (ATR-19248), and fixed a few small things.

Changes:

  • Avoid using ATH_CHECK for calls which can return Athena::Status::TIMEOUT, because it prints out a FATAL message in this case.
  • Return Athena::Status::TIMEOUT on timeout in the MTCalibPeb test tool/alg.
  • Define hltonl::PSCErrorCode::TIMEOUT in TrigKernel
  • Fix a few uninitialised members in HltEventLoopMgr (mainly related to timeout-handling).
  • HltEventLoopMgr: on non-success EventStatus, check if timeout happened and, if yes, send event to a dedicated timeout debug stream.
  • HltEventLoopMgr: use EventID from context instead of the old EventInfo in error print-outs.
  • HLTResultMTByteStreamCnv: make sure failed events go only to the debug streams (remove stream tags of other type), but still save all the HLT results to this stream, if they're available.
  • TrigOutputHandling tools: check validity of HLTSummary handle before using it.
  • Add a test of the timeout handling in TrigP1Test (job options and ART test shell script).

Questions already resolved in the discussion below:

  1. Shall we adapt ATH_CHECK to not print FATAL on timeout status?
    Answer: The error policy will be reviewed with software coordinators later, for now this MR can be accepted as it is.
  2. I realised it doesn't make much sense to try constructing an HLT result if DecisionSummaryMaker didn't run, and this currently is the case for any timeout/failure. If we want to save as much information as possible to the debug stream, maybe we should run it explicitly in the online framework, and not as an algorithm?
    Answer: The partial summary information obtained by running summary maker on aborted events would not be useful for debugging offline, so there is no point running this.

Tagging @fwinkl, @tamartin, @tbold, @wiedenma

Edited by Rafal Bielski

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Changes look fine to me, @rbielski please resolve the open discussion to get this MR approved.

  • Rafal Bielski added 1 commit

    added 1 commit

    • 6b82d8e0 - Fix aux collection name to serialise

    Compare with previous version

  • This merge request affects 7 packages:

    • HLT/Trigger/TrigControl/TrigExamples/TrigExPartialEB
    • HLT/Trigger/TrigControl/TrigKernel
    • HLT/Trigger/TrigControl/TrigServices
    • Trigger/TrigEvent/TrigSteeringEvent
    • Trigger/TrigSteer/TrigHLTResultByteStream
    • Trigger/TrigSteer/TrigOutputHandling
    • Trigger/TrigValidation/TrigP1Test

    Adding @fwinkl ,@wiedenma ,@rbielski as watchers

  • Rafal Bielski resolved all discussions

    resolved all discussions

  • Back to @grancagn for final approval

  • Waiting for the Robot to complete his task before approving, or labels will be reverted back into review.

  • Thanks Rafal - must have missed that other instance

  • :white_check_mark: CI Result SUCCESS

    Athena AthSimulation
    externals :white_check_mark: :white_check_mark:
    cmake :white_check_mark: :white_check_mark:
    make :white_check_mark: :white_check_mark:
    required tests :white_check_mark: :white_check_mark:
    optional tests :white_check_mark: :white_check_mark:

    Full details available at NICOS MR-21931-2019-03-21-18-53
    :warning: Athena: number of compilation errors 0, warnings 1
    :white_check_mark: AthSimulation: number of compilation errors 0, warnings 0
    :pencil: For experts only: Jenkins output [CI-MERGE-REQUEST 35788]

  • Rafal Bielski mentioned in merge request !22094 (merged)

    mentioned in merge request !22094 (merged)

  • Warning seems to be related to !21787 (comment 2458512) and not from this MR.

    Approving as agreed.

    Edited by Sergio Grancagnolo
  • mentioned in commit 1e961818

  • Please register or sign in to reply
    Loading