Add monitoring for TCP/IP logic, Follow-up from "Support for PyHAL-based write/read, adding function execution requests and support for VCU128 tcp/ip-based firmware"
We need to add monitoring for the registers that hold the internal state of the TCP/IP logic. If the state deviates from the expected state, the monitoring has to determine the reason and report it.
TODO(Petr): Add more information here with registers, etc.
The best source is probably the FEROL/FEROL40 controller.
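A minimal sketch of the state check described above, assuming the register access is available as a callable that takes a register name and returns an integer (for example a thin wrapper around the PyHAL-based read). The function name `find_state_deviations` and the register names in the usage are hypothetical placeholders, not names from the firmware address table.

```python
from typing import Callable, Dict, List


def find_state_deviations(
    read_register: Callable[[str], int],
    expected: Dict[str, int],
) -> List[str]:
    """Return one message per register whose value differs from the expected one."""
    deviations = []
    for name, want in expected.items():
        got = read_register(name)
        if got != want:
            deviations.append(f"{name}: expected 0x{want:08x}, read 0x{got:08x}")
    return deviations


# Usage with a fake register map standing in for the hardware.
# Both register names below are placeholders for illustration only.
fake_registers = {"TCP_STATE": 0x3, "TCP_STREAM_ENABLE": 0x1}
expected_state = {"TCP_STATE": 0x3, "TCP_STREAM_ENABLE": 0x1}
for message in find_state_deviations(fake_registers.__getitem__, expected_state):
    print(message)
```

Passing the read function in as a parameter keeps the check independent of the hardware access layer, so it can be exercised without a board attached.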
One example:
After an attempt to open a connection we need to check the outcome and, if it failed, obtain the reason for the failure. An example is the function catchTcpError:
https://gitlab.cern.ch/cmsos/worksuite/-/blob/baseline_sulfur_16/ferol40/src/common/Ferol40.cc#L904
It reads from eth_10gb_Status_TCP_flags (on DTH it is called TCP_100Gb_Status_flags_SND_probe) and returns the reason why the connection failed.
The function is also called in the monitoring loop to check and diagnose any failure during the TCP data transmission.
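As a rough illustration of what such a diagnosis could look like on the Python side, the sketch below decodes a status-flags value into human-readable reasons. The bit positions and reason texts in `EXAMPLE_FLAG_BITS` are invented placeholders; the real layout has to be taken from catchTcpError and the firmware documentation. The function name `decode_tcp_flags` is also hypothetical.

```python
from typing import Dict, List

# HYPOTHETICAL bit layout, for illustration only. The real meaning of each
# bit must be taken from catchTcpError / the firmware address table.
EXAMPLE_FLAG_BITS: Dict[int, str] = {
    0: "connection reset by peer",
    1: "connection refused",
    2: "ARP reply timeout",
    3: "SYN retransmission limit reached",
}


def decode_tcp_flags(
    value: int, flag_bits: Dict[int, str] = EXAMPLE_FLAG_BITS
) -> List[str]:
    """Translate a status-flags register value into a list of failure reasons."""
    reasons = [text for bit, text in sorted(flag_bits.items()) if value & (1 << bit)]
    # Report bits that are set but not covered by the table instead of hiding them.
    known_mask = sum(1 << bit for bit in flag_bits)
    unknown = value & ~known_mask
    if unknown:
        reasons.append(f"unknown flag bits 0x{unknown:x}")
    return reasons
```

The same function could be called once after the connection attempt and then periodically from the monitoring loop, with an empty list meaning that no failure flag is set.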
--
The following discussion from !37 (merged) should be addressed:
- @dinyar started a discussion: (+11 comments) Same question.