Tracking CI test job failures
The CI test jobs are failing very frequently, for a variety of reasons. Over a period of about two weeks, 25 `test:s1.2-vu9p-so2` jobs and 15 `test:s1.2-vu13p-so2` jobs were launched. Of these, 20 failed on the vu9p and 9 on the vu13p (failure rates of 80% and 60%, respectively).
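The failure rates follow directly from the job counts above. The following Python sketch shows the calculation so it can be rerun as more jobs accumulate. The `JOBS` dictionary and the `failure_rate` helper are illustrative names only and are not part of the serenity-herd CI.

```python
# Job counts over the ~two-week window described above.
JOBS = {
    "test:s1.2-vu9p-so2": {"launched": 25, "failed": 20},
    "test:s1.2-vu13p-so2": {"launched": 15, "failed": 9},
}


def failure_rate(launched: int, failed: int) -> float:
    """Return the fraction of launched jobs that failed."""
    if launched <= 0:
        raise ValueError("launched must be positive")
    if not 0 <= failed <= launched:
        raise ValueError("failed must be between 0 and launched")
    return failed / launched


# Per-device rates, plus the rate across both test jobs combined.
rates = {name: failure_rate(**counts) for name, counts in JOBS.items()}
overall = failure_rate(
    launched=sum(c["launched"] for c in JOBS.values()),
    failed=sum(c["failed"] for c in JOBS.values()),
)

for name, rate in rates.items():
    print(f"{name}: {rate:.0%} of jobs failed")
print(f"all test jobs: {overall:.1%} of jobs failed")
```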
On the vu9p, failures most commonly occurred either during the Setup transition (programming the FPGA) or during the ConfigureTx transition (where repeated attempts were needed to configure a Firefly; interestingly, FF0.9 was always the culprit). No clear pattern emerges for the vu13p.
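The FF0.9 observation could be checked systematically by scanning job logs for the Firefly retry message. The following Python sketch assumes the message format quoted in the table below and Firefly IDs of the form `FF<n>.<m>`. `count_firefly_retries` is a hypothetical helper, and downloading the job logs from GitLab is not covered.

```python
import re
from collections import Counter

# Assumed format of the retry message, copied from the CI logs quoted in the
# table, e.g. 'Exception thrown when configuring FF0.9 (...). Retrying (attempt 2).'
# Firefly IDs are assumed to look like FF<n>.<m>.
RETRY_RE = re.compile(r"Exception thrown when configuring (FF\d+\.\d+)")


def count_firefly_retries(log_text: str) -> Counter:
    """Count configuration-retry messages per Firefly ID in one job log.

    Hypothetical helper: the caller is responsible for fetching the log text.
    """
    return Counter(RETRY_RE.findall(log_text))


# Usage: aggregate over several job logs (here, the two messages quoted in the table).
example_logs = [
    'Exception thrown when configuring FF0.9 (std::runtime_error, "I2C no acknowledge '
    'received after 0.170084ms (1 attempts)"). Retrying (attempt 2).',
    'Exception thrown when configuring FF0.6 (std::runtime_error, "I2C no acknowledge '
    'received after 0.170084ms (1 attempts)"). Retrying (attempt 2).',
]

totals = Counter()
for text in example_logs:
    totals.update(count_firefly_retries(text))

print(totals.most_common())
```

Summing the counters over all failed vu9p jobs would show whether FF0.9 really dominates, or whether other Fireflies also need retries in jobs that fail for different reasons.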
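The table below lists the most common error types for each device. For each failed job, it also gives the number of failed / passed test iterations at the point the job failed (e.g. 1/3 means one failed and three passed iterations when the job stopped).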
Error | Transition | N failed / passed iterations (vu9p) | N failed / passed iterations (vu13p) | Example job |
---|---|---|---|---|
`» Programming daughter card X0 » An exception of type 'std::runtime_error' was thrown in Command::code(): Start-up failed. Run 'DC0:FPGA Measure "Status Register"' for more information.` | Setup | 1/3, 1/8, 1/4, 1/16, 1/6, 1/0, 1/1, 1/3, 1/11 | | https://gitlab.cern.ch/p2-xware/software/serenity-herd/-/jobs/33290140 |
`An exception of type 'uhal::exception::PCIeCommunicationError' was thrown in Command::code(): Read of 16 bytes at address 0 failed! errno=5, meaning "Input/output error"` | ConfigureTx | | 1/17, 1/12 | https://gitlab.cern.ch/p2-xware/software/serenity-herd/-/jobs/32893847 |
`Exception thrown when configuring FF0.9 (std::runtime_error, "I2C no acknowledge received after 0.170084ms (1 attempts)"). Retrying (attempt 2).` | ConfigureTx | 1/11, 1/5, 1/5, 1/19, 1/10, 1/12, 1/9 | | https://gitlab.cern.ch/p2-xware/software/serenity-herd/-/jobs/33291147 |
`Exception thrown when configuring FF0.6 (std::runtime_error, "I2C no acknowledge received after 0.170084ms (1 attempts)"). Retrying (attempt 2).` | ConfigureRx | | 1/8 | https://gitlab.cern.ch/p2-xware/software/serenity-herd/-/jobs/33088181 |
`Monitoring data missing in 1 objects(s): board.x0` | ConfigureRx | | 1/12, 1/1 | https://gitlab.cern.ch/p2-xware/software/serenity-herd/-/jobs/33264322 |
Timeout failure | | 1/5 | | https://gitlab.cern.ch/p2-xware/software/serenity-herd/-/jobs/33264321 |
`Could not connect to http://herd:3000` | | 1/0, 1/0, 1/0 | 1/0, 1/0, 1/0, 1/0 | https://gitlab.cern.ch/p2-xware/software/serenity-herd/-/jobs/33096752 |
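To keep this table up to date as more jobs fail, failed job logs could be sorted into these categories automatically. The following Python sketch matches a log against an ordered list of signatures taken from the error messages in the table. The category names and `classify_failure` are hypothetical, and the timeout pattern is only a guess, since the exact timeout message is not recorded here.

```python
import re
from typing import Optional

# Ordered (category, pattern) pairs; the first pattern that matches wins.
# Category names are hypothetical labels. Patterns are substrings of the
# messages in the table above, except "timeout", which is a guess because
# the exact timeout log text is not recorded here.
CATEGORIES = [
    ("fpga_startup", re.compile(r"Start-up failed")),
    ("pcie_io_error", re.compile(r"uhal::exception::PCIeCommunicationError")),
    ("firefly_i2c_retry", re.compile(r"Exception thrown when configuring FF\d+\.\d+")),
    ("monitoring_missing", re.compile(r"Monitoring data missing in \d+ objects")),
    ("herd_unreachable", re.compile(r"Could not connect to http://herd:3000")),
    ("timeout", re.compile(r"timeout", re.IGNORECASE)),
]


def classify_failure(log_text: str) -> Optional[str]:
    """Return the first matching category for a failed job's log, or None."""
    for name, pattern in CATEGORIES:
        if pattern.search(log_text):
            return name
    return None


sample = (
    "An exception of type 'uhal::exception::PCIeCommunicationError' was thrown "
    "in Command::code(): Read of 16 bytes at address 0 failed!"
)
print(classify_failure(sample))
```

A log may contain more than one signature (for example, a Firefly retry that later succeeds, followed by a different fatal error), so the order of `CATEGORIES` decides which one is reported. Placing the fatal errors before the retry message is a deliberate choice in this sketch.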