Add timeout and retry to input data local download module
See the discussion in LHCb!5176 (comment 9748939).
Adds a 'reasonable' timeout to the spawned xrdcp process (derived from the file size assuming a 10 MB/s transfer rate, with a minimum of 2 mins) and allows up to 3 (re)tries to download a given URL.
Previously no timeout was applied, so if the xrdcp process was proceeding incredibly slowly (which seems to happen randomly from time to time), the test would just sit there waiting until the test as a whole timed out.
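As a rough illustration of the behaviour described above, here is a minimal sketch and not the actual code in context.py. The function names `download_timeout` and `run_with_retry` are hypothetical. It also assumes the file size is already known in bytes, and that "up to 3 (re)tries" means three attempts in total.

```python
import subprocess
import sys

MIN_TIMEOUT_S = 120.0         # "no less than 2 mins"
ASSUMED_RATE_B_PER_S = 10e6   # "assuming 10MB/s transfer"
MAX_TRIES = 3                 # assumption: 3 attempts in total


def download_timeout(size_bytes):
    """Timeout in seconds for a file of the given size (hypothetical helper)."""
    return max(MIN_TIMEOUT_S, size_bytes / ASSUMED_RATE_B_PER_S)


def run_with_retry(cmd, timeout_s, max_tries=MAX_TRIES):
    """Run cmd, killing it after timeout_s seconds; retry on timeout or failure.

    Returns True as soon as one attempt succeeds, False if all attempts fail.
    """
    for attempt in range(1, max_tries + 1):
        try:
            # subprocess.run kills the child and raises TimeoutExpired
            # if it has not finished within timeout_s seconds.
            subprocess.run(cmd, check=True, timeout=timeout_s)
            return True
        except subprocess.TimeoutExpired:
            print(f"-> timeout ({timeout_s:g} secs) expired.", file=sys.stderr)
        except subprocess.CalledProcessError as err:
            print(f"-> command failed with exit code {err.returncode}.", file=sys.stderr)
        if attempt < max_tries:
            print("-> Will retry download...", file=sys.stderr)
    return False
```

With these assumptions, a call such as `run_with_retry(["xrdcp", url, dest], download_timeout(size))` would give up on a stalled transfer after the computed timeout and start a fresh copy, rather than waiting indefinitely.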
Timeout is set to 5 mins, which I think is reasonable. Running just for testing with a reduced value (60 secs), I saw the expected behaviour:
pcol ~/LHCb/stack/2025/output/MooreOnline > $MOORESCRIPTSROOT/scripts/testbench.py $MOORESCRIPTSROOT/tests/options/CalibMon/Arch.xml --working-dir=calibmon --partition=TESTCALIB --test-file-db-key=2024_data_for_monitoring
14:24:22.206 testbench.py:210 INFO Downloading input data to calibmon/input_data
14:24:22.207 context.py:130 INFO Checking URL mdf:root://eoslhcb.cern.ch//eos/lhcb/wg/rta/WP5/data_24_for_monitoring_25/0000309884/Run_0000309884_20241103-123935-438_VCEB04_0163.mdf
14:24:22.618 context.py:69 INFO -> Full download for root://eoslhcb.cern.ch//eos/lhcb/wg/rta/WP5/data_24_for_monitoring_25/0000309884/Run_0000309884_20241103-123935-438_VCEB04_0163.mdf
14:25:22.926 context.py:81 WARNING -> timeout (60 secs) expired.
14:25:22.927 context.py:75 WARNING -> Will retry download...
14:26:23.651 context.py:81 WARNING -> timeout (60 secs) expired.
14:26:23.651 context.py:75 WARNING -> Will retry download...
14:27:02.981 context.py:130 INFO Checking URL mdf:root://eoslhcb.cern.ch//eos/lhcb/wg/rta/WP5/data_24_for_monitoring_25/0000309884/Run_0000309884_20241103-123935-573_PLEB01_0163.mdf
14:27:03.328 context.py:69 INFO -> Full download for root://eoslhcb.cern.ch//eos/lhcb/wg/rta/WP5/data_24_for_monitoring_25/0000309884/Run_0000309884_20241103-123935-573_PLEB01_0163.mdf
14:27:51.356 context.py:130 INFO Checking URL mdf:root://eoslhcb.cern.ch//eos/lhcb/wg/rta/WP5/data_24_for_monitoring_25/0000309884/Run_0000309884_20241103-123935-644_SCEB10_0163.mdf
14:27:51.825 context.py:69 INFO -> Full download for root://eoslhcb.cern.ch//eos/lhcb/wg/rta/WP5/data_24_for_monitoring_25/0000309884/Run_0000309884_20241103-123935-644_SCEB10_0163.mdf
14:28:34.167 context.py:130 INFO Checking URL mdf:root://eoslhcb.cern.ch//eos/lhcb/wg/rta/WP5/data_24_for_monitoring_25/0000309884/Run_0000309884_20241103-123935-667_SCEB15_0163.mdf
14:28:34.519 context.py:69 INFO -> Full download for root://eoslhcb.cern.ch//eos/lhcb/wg/rta/WP5/data_24_for_monitoring_25/0000309884/Run_0000309884_20241103-123935-667_SCEB15_0163.mdf
14:29:34.556 context.py:81 WARNING -> timeout (60 secs) expired.
14:29:34.556 context.py:75 WARNING -> Will retry download...
14:30:35.591 context.py:81 WARNING -> timeout (60 secs) expired.
14:30:35.592 context.py:75 WARNING -> Will retry download...
<snip>
FYI @msaur: my hope is that this will help clear up the sporadic timeouts seen in the MooreOnline tests.
Edited by Christopher Rob Jones