Resolve "Update RetrieveJob to support completion report"
Activity
added Scheduler and workflow::In Progress labels
added 2 commits
- Automatically resolved by Jacek Maksymilian Chodak
requested review from @poliverc
I have taken a look at the MR. The changes I made to the naming of the queues should be fine and should not affect this MR (or anything else).
The system tests are failing because, after the successful retrieves, the code no longer deletes the jobs but queues another 10k jobs to be reported as successful, and the tests are not designed to wait for this. When the following test (client_multiple_retrieve.sh) tries to archive 4 files, the drive is still busy working on these success reports; the archive request is queued but not handled in time, so the test times out.
[root@ctacli /]# cta-admin dr ls
library  drive    host     desired  request   status    since  vid     tapepool    vo  files  data    MB/s  session  priority  activity  age  reason
VDSTK11  VDSTK11  tpsrv01  Up       Retrieve  Transfer  599    V01007  ctasystest  vo  10000  153.6M  0.3   9        0         -         13   -
VDSTK12  VDSTK12  tpsrv02  Down     -         Down      1696   -       -           -   -      -       -     -        0         -         5    [cta-taped] INFO Startup
[root@ctacli /]# cta-admin sq
type            tapepool    vo  library  vid     files queued  data queued  oldest  youngest  priority  min age  read max drives  write max drives  cur. mounts  cur. files  cur. data  tapes capacity  files on tapes  data on tapes  full tapes  writable tapes
ArchiveForUser  ctasystest  vo  -        -       4             1.5K         52      52        1         1        1                1                 0            0           0          3.5T            10002           153.6M         0           7
Retrieve        ctasystest  vo  VDSTK11  V01007  10000         153.6M       316     214       1         1        1                1                 1            10000       153.6M     500.0G          10002           153.6M
This behaviour should be conditional and only run with dCache, as this introduces extra work that EOS+CTA doesn't need.
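A minimal sketch of what such a condition could look like, not the code in this MR. The `FinishedRetrieve` struct, the `NextStep` enum, and the `reportingEnabled` switch are hypothetical names introduced only for illustration; the sketch also assumes that a job which needs no completion report carries an empty report URL.

```cpp
#include <string>

// Hypothetical stand-in for a successfully retrieved job.
// This is NOT the CTA RetrieveJob class; the fields are assumptions.
struct FinishedRetrieve {
  std::string fileId;
  std::string reportURL;  // assumed to be empty when no report is requested
};

// The two outcomes discussed in the comment above.
enum class NextStep { DeleteJob, QueueForSuccessReport };

// Decide what happens to a job after a successful retrieve.
// reportingEnabled is an assumed deployment switch (for example, set only
// for dCache instances), so EOS+CTA keeps the old "delete the job" path.
NextStep nextStepAfterSuccess(const FinishedRetrieve& job, bool reportingEnabled) {
  if (reportingEnabled && !job.reportURL.empty()) {
    return NextStep::QueueForSuccessReport;
  }
  return NextStep::DeleteJob;
}
```

With a switch like this, the existing system tests would see no extra queued report jobs as long as reporting stays disabled for EOS+CTA.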
Manual testing at DESY with the dcache-cta-0.11 driver. The reporting was observed as expected:
cta-taped:
Nov 20 11:55:30 dcache-enstore01 cta-taped: LVL="INFO" PID="29657" TID="29657" MSG="In Scheduler::reportRetrieveJobsBatch(): report URL." SubprocessName="maintenanceHandler" fileId="3778984" reportType="CompletionReport" reportURL="eosQuery://dcache-lab007.desy.de:42917/success/00005857DB6B6F504F0B80F11EBB581C2597?archiveid=3778984"
Nov 20 11:55:30 dcache-enstore01 cta-taped: LVL="INFO" PID="29657" TID="29657" MSG="In Scheduler::reportRetrieveJobsBatch(): report successful." SubprocessName="maintenanceHandler" fileId="3778984" reportType="CompletionReport"
dcache:
20 Nov 2023 12:02:55 [cta-datamover-worker-2] [] Request /var/lib/dcache/pools/data/00005857DB6B6F504F0B80F11EBB581C2597 scheduling time: 101 s
20 Nov 2023 12:02:55 [cta-datamover-worker-2] [] Opening /var/lib/dcache/pools/data/00005857DB6B6F504F0B80F11EBB581C2597 for writing from 131.169.98.55:41518
20 Nov 2023 12:03:16 [cta-datamover-worker-2] [] Closing file /var/lib/dcache/pools/data/00005857DB6B6F504F0B80F11EBB581C2597 from 131.169.98.55:41518. Transferred 2.00 GiB in 21.5 s, disk performance 2.05 GiB/s
20 Nov 2023 12:03:22 [cta-datamover-worker-1] [] XROOTD query: /success/00005857DB6B6F504F0B80F11EBB581C2597?archiveid=3778984 from 131.169.98.55:41514
20 Nov 2023 12:03:23 [ForkJoinPool.commonPool-worker-1] [] Files /var/lib/dcache/pools/data/00005857DB6B6F504F0B80F11EBB581C2597 checksum after restore: 1:c6285b0d
20 Nov 2023 12:03:23 [ForkJoinPool.commonPool-worker-1] [] Successful restored from 131.169.98.55:41514 : 00005857DB6B6F504F0B80F11EBB581C2597 : archive id: 3778984
20 Nov 2023 12:03:24 [ForkJoinPool.commonPool-worker-1] [] Staged 00005857DB6B6F504F0B80F11EBB581C2597 from nearline storage.
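When comparing the two logs, it can help to split the `reportURL` value from cta-taped into its parts, so that the path and query can be matched against the `XROOTD query:` line on the dCache side. The following is a rough sketch for that purpose only; `ReportUrlParts` and `splitReportUrl` are hypothetical helper names, not part of CTA or dCache, and the sketch assumes the URL always has the form `scheme://host:port/path?query` (C++17 for `std::optional`).

```cpp
#include <optional>
#include <string>

// Hypothetical helper type: the three pieces of a report URL.
struct ReportUrlParts {
  std::string scheme;     // e.g. "eosQuery"
  std::string authority;  // host and port, e.g. "host:42917"
  std::string query;      // path plus query string, starting with '/'
};

// Split "scheme://authority/path?query" into its parts.
// Returns std::nullopt if the scheme separator or the path is missing.
std::optional<ReportUrlParts> splitReportUrl(const std::string& url) {
  const std::string sep = "://";
  const auto schemeEnd = url.find(sep);
  if (schemeEnd == std::string::npos || schemeEnd == 0) {
    return std::nullopt;  // no scheme present
  }
  const auto authorityStart = schemeEnd + sep.size();
  const auto pathStart = url.find('/', authorityStart);
  if (pathStart == std::string::npos || pathStart == authorityStart) {
    return std::nullopt;  // no authority or no path present
  }
  return ReportUrlParts{url.substr(0, schemeEnd),
                        url.substr(authorityStart, pathStart - authorityStart),
                        url.substr(pathStart)};
}
```

For the URL logged above, the `query` part is `/success/00005857DB6B6F504F0B80F11EBB581C2597?archiveid=3778984`, which is the same string that dCache logs as the XROOTD query.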
added 31 commits
- 39b806d3...7f9cb606 - 12 commits from branch main
- 7f9cb606...4f303404 - 9 earlier commits
- 18ffd411 - added temporary log of report URL
- 7f4d49f2 - removed temporary log of the report URL
- d7288d26 - Update analysis.gitlab-ci.yml
- 1fcb4a1f - Update version.hpp.in
- 5f087c3f - cleaning RetrieveRequest.cpp
- 15eee71b - update comments in OStoreDB.cpp
- 18ce695e - changed instantiation of unique_ptr
- 35bb8ccc - switched from typedef to using, removed this
- ad8af97c - Merge branch '154-update-RetrieveJob-to-support-completion-report' of...
- 27fdb592 - Update ReleaseNotes.md
added 30 commits
- 27fdb592...53feb695 - 2 commits from branch main
- 53feb695...727e4951 - 18 earlier commits
- 1f05870e - Allow temporary failure of cppcheck
- ebd00fe9 - added temporary log of report URL
- ac300931 - removed temporary log of the report URL
- f4e6394b - Update ReleaseNotes.md
- 5ae63e0f - added temporary log of report URL
- 3e7bbb1c - removed temporary log of the report URL
- 53b8ef64 - Update analysis.gitlab-ci.yml
- 32d62891 - Update version.hpp.in
- dbe7dc57 - Update ReleaseNotes.md
- 69827369 - Merge branch '154-update-RetrieveJob-to-support-completion-report' of...
added 1 commit
- ea233226 - Fix CI pipeline not progressing by limiting hostname size
added 7 commits
- ea233226...0931597b - 3 commits from branch main
- 89b50663 - Resolve "Rework catalogue release procedure and deployment path"
- a93a71b9 - Allow temporary failure of cppcheck
- 0427fa9d - Resolve: #154 (closed) - Update RetrieveJob to support completion report
- db3ce502 - Fix CI pipeline not progressing by limiting hostname size
added 2 commits
- f91b3422 - Resolve: #154 (closed) - Update RetrieveJob to support completion report
- 6b3bb528 - Fix CI pipeline not progressing by limiting hostname size
The CI pipeline is not progressing because the hostname is too long; see issues #504 (closed) and #511 for details.