detecting dead servers
Created by: andreh12
As discussed today, we could look at timestamps of flashlists to check if PCs in the run are still alive or not.
For the recent case of the FRLpc, here is an example: /daqexpertflashlists/flashlists/pro/cdaq/JOB_CONTROL/2018/9/24/19/1537817871328.json.gz. The timestamp in the file name (epoch milliseconds) corresponds to Mon Sep 24 21:37:51 CEST 2018.
This file has the following data:
"context" : "http://frlpc40-s2d19-41-01.cms:9999",
...
"timestamp" : "2018-09-24T18:58:37.941695Z",
which is about 39 minutes older than the file timestamp, while for another FRLpc there is:
"context" : "http://frlpc-s1d06-07-01.cms:9999",
...
"timestamp" : "2018-09-24T19:37:46.369696Z",
which is within about 5 seconds of the file timestamp.
The HOST_INFO flashlist seems to be essentially empty for this time.
Alternatively, the DISK_INFO flashlist also has timestamps and shows a similar lack of update for this FRLpc (see /daqexpertflashlists/flashlists/pro/cdaq/DISK_INFO/2018/9/24/19/1537817871328.json.gz).
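
The comparison described above could be sketched roughly as follows. This is only an illustration, not existing code: the function names, the 60 second threshold and the assumption that every row is a JSON object with both a "context" and a "timestamp" key (found at any nesting depth) are all hypothetical, and the exact layout of the flashlist JSON is not assumed beyond those two keys. The timestamp format is taken from the examples above.

```python
import gzip
import json
from datetime import datetime, timedelta, timezone
from pathlib import PurePosixPath

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def file_time_from_path(path):
    """Interpret the file name (e.g. 1537817871328.json.gz) as epoch milliseconds, UTC."""
    millis = int(PurePosixPath(path).name.split(".")[0])
    return EPOCH + timedelta(milliseconds=millis)


def iter_rows(node):
    """Yield every dict carrying both 'context' and 'timestamp', at any depth.

    Assumption: this is how flashlist rows can be recognised; adjust if the
    real JSON layout is known.
    """
    if isinstance(node, dict):
        if "context" in node and "timestamp" in node:
            yield node
        for value in node.values():
            yield from iter_rows(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_rows(item)


def stale_contexts(data, file_time, max_age=timedelta(seconds=60)):
    """Return {context: age} for rows older than max_age relative to file_time.

    max_age is an arbitrary illustrative threshold, not a tuned value.
    """
    stale = {}
    for row in iter_rows(data):
        # Format as seen in the examples: 2018-09-24T18:58:37.941695Z
        row_time = datetime.strptime(
            row["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ"
        ).replace(tzinfo=timezone.utc)
        age = file_time - row_time
        if age > max_age:
            stale[row["context"]] = age
    return stale


def check_file(path, max_age=timedelta(seconds=60)):
    """Load one gzipped flashlist dump and report contexts that stopped updating."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        data = json.load(handle)
    return stale_contexts(data, file_time_from_path(path), max_age)
```

With the two JOB_CONTROL entries quoted above and the file timestamp 1537817871328, this flags only http://frlpc40-s2d19-41-01.cms:9999. The same check could be pointed at the DISK_INFO file, provided its rows carry the same two keys.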