AthenaServices: Rethrow MetaDataSvc exceptions for early job termination

This is one of the issues that came up in ATEAM-837. Errors raised while MetaDataSvc handles incidents are being swallowed. As a result, jobs reading problematic inputs continue processing with error messages like:

21:32:36 IncidentSvc         ERROR Standard std::exception is caught handling incident FirstInputFile in listener MetaDataSvc
21:32:36 IncidentSvc         ERROR FID "9A80A706-5BAF-D540-87E9-9925520EAA80" is not existing in the catalog ( POOL : "PersistencySvc::UserDatabase::connectForRead" from "PersistencySvc" )

and

21:32:36 Exception: FID "D9BD5B35-1F5E-0948-9C9D-8B4C58F20D3B" is not existing in the catalog ( POOL : "PersistencySvc::UserDatabase::connectForRead" from "PersistencySvc" ) (no backtrace available).
21:32:36 DataHeaderCnv       ERROR createObj - caught exception: AthenaPoolCnvSvc::::ExcCaughtException: Caught exception in StatusCode T_AthenaPoolCustomCnvWithKey<TRANS, PERS>::PoolToDataObject(DataObject*&, const Token*, const string&) [with TRANS = DataHeader; PERS = DataHeader_p6; std::string = std::__cxx11::basic_string<char>] while creating transient objectDataHeader/MetaDataHdr(DataHeader): std::runtime_error: FID "D9BD5B35-1F5E-0948-9C9D-8B4C58F20D3B" is not existing in the catalog ( POOL : "PersistencySvc::UserDatabase::connectForRead" from "PersistencySvc" )
21:32:36 DataHeaderCnv       ERROR createObj failed to get DataObject, Token = [DB=EDACCF45-C609-A64A-9290-DEF0EF06D3E2][CNT=MetaDataHdr(DataHeader)][CLID=4DDBD295-EFCE-472A-9EC8-15CD35A9EB8D][TECH=00000203][OID=00000A83000010D2-0000000000000000]
21:32:36 DataProxy         WARNING accessData: conversion failed for data object 222376821/;00;MetaDataSvc
21:32:36  Returning NULL DataObject pointer
21:32:36 VarHandle(Input...WARNING StoreGate/src/VarHandleBase.cxx:1069 (void*SG::VarHandleBase::typeless_dataPointer_fromProxy(SG::DataProxy*, bool) const): this proxy 0x3b6b8280 has a NULL data object ptr
21:32:36 MetaDataSvc         ERROR Could not get DataHeader, will not read Metadata
21:32:36 MetaDataSvc       WARNING Unable to load MetaData Proxies

The job exits with status code 0 but is eventually killed by the transform because of the error messages.

In this MR, we update the way incidents are handled in MetaDataSvc and re-throw the exceptions. As a result, the same job terminates during initialization with:

IncidentSvc         ERROR Standard std::exception is caught handling incident FirstInputFile in listener MetaDataSvc
IncidentSvc         ERROR FID "940DE9DE-3253-DE4D-9148-B19C41C1878E" is not existing in the catalog ( POOL : "PersistencySvc::UserDatabase::connectForRead" from "PersistencySvc" )
EventSelector       FATAL in sysInitialize(): standard std::exception is caught
EventSelector       ERROR std::exception
ServiceManager      ERROR Unable to initialize service "EventSelector"
AthenaEventLoopMgr  FATAL No valid event selector called EventSelectorAthenaPool/EventSelector
ServiceManager      ERROR Unable to initialize Service: AthenaEventLoopMgr
Py:Athena            INFO leaving with code 33: "failure in initialization"
ApplicationMgr       INFO Application Manager Terminated successfully

I also tried to unify the message levels with the return codes (i.e. ERROR = FAILURE).
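
For readers unfamiliar with the pattern, the difference between swallowing and re-throwing can be sketched in plain C++ as below. This is a minimal illustration only, not the actual MetaDataSvc code: `Policy`, `handleIncident`, and `initialize` are hypothetical names, the error text is shortened, and the return value 1 is a placeholder rather than Athena's real exit code.

```cpp
#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical switch between the old (swallow) and new (rethrow) behaviour.
enum class Policy { Swallow, Rethrow };

// Hypothetical incident handler, NOT the real MetaDataSvc::handle.
// Runs `work`; on failure it logs the error and, depending on the
// policy, either returns normally or propagates the exception.
void handleIncident(const std::string& incident,
                    const std::function<void()>& work,
                    Policy policy) {
  try {
    work();
  } catch (const std::exception& e) {
    std::cerr << "ERROR caught handling incident " << incident
              << ": " << e.what() << '\n';
    if (policy == Policy::Rethrow) {
      throw;  // rethrow the same exception so the caller can stop early
    }
    // Policy::Swallow: the error is only logged and the job carries on.
  }
}

// Hypothetical stand-in for service initialization.
// Returns 0 on success and 1 (placeholder value) on failure.
int initialize(Policy policy) {
  try {
    handleIncident(
        "FirstInputFile",
        [] { throw std::runtime_error("FID is not existing in the catalog"); },
        policy);
  } catch (const std::exception&) {
    return 1;  // initialization fails as soon as the exception escapes
  }
  return 0;  // with a swallowed exception the caller never sees the failure
}
```

With `Policy::Swallow` the call reports success even though the input could not be opened, which mirrors the old behaviour of exiting with status code 0; with `Policy::Rethrow` the failure reaches the caller during initialization.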

cc: @gemmeren @mnowak

Edited by Alaettin Serhan Mete