AthenaPoolCnvSvc: A few improvements to the new SharedWriter

Alaettin Serhan Mete requested to merge amete/athena:master-ATLASDPD-1683 into master

There are two reported issues with the new SharedWriter in derivation production with 22.0.62. This MR attempts to fix or at least improve the situation in that context.

  1. ATLASDPD-1683: When no worker sends events to be written out (i.e. the underlying kernel accepts no events in any worker), the server gets stuck. Now, once we write the MetaData, we check whether there are any clients the server should still be waiting for. If there are none (the reported case), we explicitly terminate the loop.

  2. ATLASDPD-1682: When there is a problem receiving a message from the socket (e.g. connection reset by peer), the server can get stuck. Although the underlying reason for the network problem is not known (and is hard to debug without a reproducer), we now catch the error and attempt to "gracefully" terminate the server/job.
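
For reviewers who want the intended control flow at a glance, the two changes can be sketched roughly as below. This is only an illustration in Python, not the actual AthenaPoolCnvSvc code (which is C++); `run_server`, `receive`, the `(client_id, payload)` message shape and the returned status strings are all hypothetical names made up for this sketch, and the ordering of the steps is simplified.

```python
def run_server(receive, clients):
    """Toy model of the server loop (hypothetical, not the Athena API).

    receive: callable returning (client_id, payload); payload None means
             that client is done. May raise OSError on a socket problem.
    clients: iterable of client ids the server expects to hear from.
    """
    pending = set(clients)
    written = ["MetaData"]  # stand-in for writing the MetaData

    # Item 1: after the MetaData is written, stop if nobody is left to wait for.
    if not pending:
        return "no_clients", written

    while pending:
        try:
            client_id, payload = receive()
        except OSError:
            # Item 2: a receive failure (e.g. connection reset by peer)
            # ends the loop instead of leaving the server stuck.
            return "receive_error", written
        if payload is None:
            pending.discard(client_id)
        else:
            written.append(payload)

    return "finished", written
```

In this toy model the caller would decide what a "graceful" shutdown means for each returned status; the real change handles that inside the service.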

Please don't merge without an explicit approval from @gemmeren.

cc @calpigia @walkerr
