Slow Maven artifact download
GitLab CI jobs spend a lot of time downloading Maven artifacts. This is because Maven downloads artifacts sequentially, with 2 requests per artifact (the jar file and its sha1 checksum), and Nexus takes more than 500ms on average to start each transfer. This adds up to minutes when downloading hundreds of artifacts.
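A back-of-the-envelope sketch of how the per-request latency accumulates. It assumes strictly sequential requests, 2 requests per artifact and a flat 500ms wait per request, and ignores the payload transfer time itself; the artifact count of 300 is hypothetical, not taken from the job linked below.

```python
def sequential_download_time(n_artifacts: int,
                             requests_per_artifact: int = 2,
                             latency_s: float = 0.5) -> float:
    """Estimate seconds spent waiting for transfers to start.

    Assumes requests are issued one after another and each waits
    `latency_s` before data starts flowing (payload time ignored).
    """
    return n_artifacts * requests_per_artifact * latency_s


# Hypothetical job resolving 300 artifacts (jar + sha1 each).
total = sequential_download_time(300)
print(f"{total:.0f} s ~ {total / 60:.1f} min")  # prints "300 s ~ 5.0 min"
```

Halving either the number of round trips or the per-request latency halves this figure, which is why both the S3 response time and the workarounds below matter.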
See for example https://gitlab.cern.ch/db/java-application-servers-cern-sso-integrations/sign-out-certificate-client/-/jobs/2844318
- There is little CPU throttling, so CPU limits probably do not play a significant role
- Looking at a network trace, we see the following sequence:
# Nexus receives request
296 216.296709 10.76.23.1 10.76.7.70 HTTP 68 GET /repository/IT-DB-DAR-Proxy/org/codehaus/mojo/jaxws-maven-plugin/2.5/jaxws-maven-plugin-2.5.pom HTTP/1.1
# Nexus starts connecting to S3
298 216.430684 10.76.7.70 188.184.84.251 DNS 127 Standard query 0x907d A nexus-cern-test-repomgr2.s3.cern.ch.test-repomgr2.svc.cluster.local
# Nexus sends 2 small HTTP requests to S3 (HEAD?)
324 216.461626 10.76.7.70 137.138.121.187 TLSv1.2 871 Application Data
[...]
# Nexus requests file data
330 216.756644 10.76.7.70 137.138.121.187 TLSv1.2 884 Application Data
[...]
# end of file data transfer from S3
343 216.976525 10.76.7.70 137.138.121.187 TCP 66 42616 → 443 [ACK] Seq=2821 Ack=24559 Win=80768 Len=0 TSval=1286071438 TSecr=509778202
# Nexus starts writing response
344 216.977673 10.76.7.70 10.76.23.1 TCP 6756 8081 → 53530 [ACK] Seq=343 Ack=1356 Win=29568 Len=6690 TSval=1286071439 TSecr=2303015695 [TCP segment of a reassembled PDU]
[...]
# Nexus done writing response
348 216.978563 10.76.7.70 10.76.23.1 HTTP/XML 5116 HTTP/1.1 200 OK
Nexus spends about:
- 150ms before it starts requesting data from S3 (this seems to be much shorter for subsequent requests)
- 300ms on 2 requests to S3, presumably for file metadata (the traffic is TLS-encrypted, so we would need to connect to S3 over plain HTTP to see the exact requests)
- 200ms obtaining file content from S3
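The approximate figures above can be cross-checked against the trace timestamps. The sketch below hard-codes the seven annotated packets from the trace (the event labels are our own shorthand for the trace comments) and prints the time between consecutive events. The first two phases add up to about 165ms before the first S3 request, the two presumed metadata requests take about 295ms, the content transfer about 220ms, and writing the response to the client only about 2ms.

```python
# Timestamps (seconds) copied from the trace above; labels are shorthand
# for the comments annotating each packet.
EVENTS = [
    ("request received", 216.296709),
    ("DNS lookup for S3", 216.430684),
    ("first S3 request", 216.461626),
    ("S3 file data request", 216.756644),
    ("S3 transfer done", 216.976525),
    ("response write started", 216.977673),
    ("response write done", 216.978563),
]


def phase_durations_ms(events):
    """Return (from_event, to_event, milliseconds) for consecutive events."""
    return [
        (a, b, (tb - ta) * 1000.0)
        for (a, ta), (b, tb) in zip(events, events[1:])
    ]


for start, end, ms in phase_durations_ms(EVENTS):
    print(f"{start:>24} -> {end:<24} {ms:7.1f} ms")

total_ms = (EVENTS[-1][1] - EVENTS[0][1]) * 1000.0
print(f"total: {total_ms:.1f} ms")
```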
In the end, almost all of the delay comes from talking to the S3 storage backend. Can the S3 response times be improved?
Possible workarounds:
- Can Maven run downloads in parallel?
- Cache Maven artifacts in GitLab CI: https://stackoverflow.com/questions/37785154/how-to-enable-maven-artifact-caching-for-gitlab-ci-runner
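Parallel downloads would help because each request is dominated by waiting on Nexus/S3 rather than by bandwidth, so overlapping requests hides most of that wait. The sketch below only simulates this: `fake_fetch` is a hypothetical stand-in that sleeps for a scaled-down fixed latency, it does not talk to Nexus, and it says nothing about which Maven settings, if any, enable parallel downloads. The real gain also depends on Nexus and S3 serving concurrent requests without each one getting slower.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-request wait, scaled down from the ~500ms seen on Nexus.
LATENCY_S = 0.05


def fake_fetch(path: str) -> str:
    """Hypothetical download: waits a fixed latency, then returns the path."""
    time.sleep(LATENCY_S)
    return path


def download_all(paths, workers: int = 1):
    """Fetch every path with `workers` threads (1 means sequential).

    Returns the results in input order and the elapsed wall-clock seconds.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fake_fetch, paths))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    paths = [f"artifact-{i}.jar" for i in range(20)]
    _, seq = download_all(paths, workers=1)
    _, par = download_all(paths, workers=5)
    print(f"sequential: {seq:.2f} s, 5 workers: {par:.2f} s")
```

With 20 simulated requests, the sequential run waits at least 20 × 50ms = 1s, while 5 workers finish in roughly a fifth of that. Caching the local repository between jobs attacks the same cost from the other side by avoiding most requests altogether.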