
Use async server-side implementation for WFE in gRPC Frontend

Test failures for JWT Authentication

When I first tested the changes for JWT Authentication in CI, the following tests failed:

  • test_client
    • Reason: the gRPC framework returned the error "Server Threadpool Exhausted". The framework returns this error only for synchronous server RPC implementations, which serve requests from a pool with a fixed (configurable) maximum number of threads. Once all of these threads are in use, further requests fail with this error.
  • test_client_gfal
    • Reason: undetermined. The test output shows the following errors, but the logs do not make the cause obvious:
ERROR with xrootd transfer for file 0/0004, full logs in /dev/shm/bfdf15c4-8a85-491e-98e7-99d2fdb0a74e/00004
ERROR with xrootd transfer for file 0/0000, full logs in /dev/shm/bfdf15c4-8a85-491e-98e7-99d2fdb0a74e/00000
ERROR with xrootd transfer for file 0/0009, full logs in /dev/shm/bfdf15c4-8a85-491e-98e7-99d2fdb0a74e/00009
ERROR with xrootd transfer for file 0/0012, full logs in /dev/shm/bfdf15c4-8a85-491e-98e7-99d2fdb0a74e/00012
ERROR with xrootd transfer for file 0/0002, full logs in /dev/shm/bfdf15c4-8a85-491e-98e7-99d2fdb0a74e/00002
ERROR with xrootd transfer for file 0/0005, full logs in /dev/shm/bfdf15c4-8a85-491e-98e7-99d2fdb0a74e/00005
ERROR with xrootd transfer for file 0/0007, full logs in /dev/shm/bfdf15c4-8a85-491e-98e7-99d2fdb0a74e/00007

Failures also present in the main branch

I noticed that these tests were also failing in the same way on our main branch, so the errors were not caused by the changes for JWT Authentication.

To resolve at least the first problem, I had the idea of using the async callback API on the server side to implement the RPCs for the physics workflow events. The async API manages the size of its threadpool itself, growing and shrinking it as needed, so it cannot produce this error. Looking at the gRPC source code, the "Server Threadpool Exhausted" error only comes from the sync implementation.
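To make the failure mode concrete, here is a toy model in plain Python. It is not gRPC code: `FixedPool`, `PoolExhausted` and `count_rejected` are hypothetical names invented for this sketch, and the model only assumes that a sync server rejects a request when every thread under its cap is already busy.

```python
from dataclasses import dataclass


class PoolExhausted(Exception):
    """Stands in for the framework's "Server Threadpool Exhausted" error."""


@dataclass
class FixedPool:
    """Toy model of a sync server: at most max_threads requests in flight."""

    max_threads: int
    in_flight: int = 0

    def accept(self) -> None:
        # A sync handler occupies one thread for the whole request.
        if self.in_flight >= self.max_threads:
            raise PoolExhausted("Server Threadpool Exhausted")
        self.in_flight += 1

    def finish(self) -> None:
        # Called when a request completes and frees its thread.
        self.in_flight -= 1


def count_rejected(pool: FixedPool, burst: int) -> int:
    """Send `burst` requests that are all still running; count rejections."""
    rejected = 0
    for _ in range(burst):
        try:
            pool.accept()
        except PoolExhausted:
            rejected += 1
    return rejected
```

In this model a burst larger than the cap is partly rejected, while raising the cap above the burst size, or letting the pool size follow the load as the async API does, removes the rejections.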

Errors resolved by using async API, and stress test passes

Indeed this removes the errors for both tests on the main branch:

Errors also resolved by better configuring the sync options, stress test also passes

A note on XRootD/SSI Configuration / number of threads

The XRootD/SSI framework has an async implementation for RPCs. The framework manages the number of request-serving threads, but within a lower and an upper limit:

Defaults
     xrd.sched mint 8 maxt 2048 avlt 512 idle 780

Our configuration

Resume testing for JWT Authentication

Tested with 4096 threads and sync frontend:

TODO: Test JWT Authentication with Async Frontend Implementation

More details in the linked codiMD.

Edited by Konstantina Skovola