
Resolve "Update RetrieveJob to support completion report"

Summary

Requires manual tests in pre-production

References

Closes #154 (closed)

Merged by Joao Afonso on Jan 18, 2024 4:36pm UTC

Merge details

  • Changes merged into main with ab7ca4f9 (commits were squashed).
  • Deleted the source branch.

Pipeline #6752085 failed

Pipeline failed for ab7ca4f9 on main

Activity

  • Jorge Camarero Vera
  • Jacek Maksymilian Chodak resolved all threads

  • added 2 commits

    • e71bb9bd - changed instantiation of unique_ptr
    • 39b806d3 - switched from typedef to using, removed this

  • Joao Afonso requested review from @poliverc

  • I have taken a look at the MR. The changes I made to the queue naming should be fine and should not affect this MR (or anything else).

    Also, the system tests are failing because, after a successful retrieve, the code no longer deletes the job; instead it queues the job to be reported as successful, which adds another 10k jobs to be processed. The tests are not designed to wait for this, so when the next test (client_multiple_retrieve.sh) tries to archive 4 files, the drive is still busy with these success reports: the archive request is queued but not handled in time, and the test times out.

    [root@ctacli /]# cta-admin dr ls
    library   drive    host desired  request   status since    vid   tapepool vo files   data MB/s session priority activity age reason                   
    VDSTK11 VDSTK11 tpsrv01      Up Retrieve Transfer   599 V01007 ctasystest vo 10000 153.6M  0.3       9        0        -  13 -                        
    VDSTK12 VDSTK12 tpsrv02    Down        -     Down  1696      -          -  -     -      -    -       -        0        -   5 [cta-taped] INFO Startup 
    
    [root@ctacli /]# cta-admin sq
              type   tapepool vo library    vid files queued data queued oldest youngest priority min age read max drives write max drives cur. mounts cur. files cur. data tapes capacity files on tapes data on tapes full tapes writable tapes 
    ArchiveForUser ctasystest vo       -      -            4        1.5K     52       52        1       1               1                1           0          0         0           3.5T          10002        153.6M          0              7 
          Retrieve ctasystest vo VDSTK11 V01007        10000      153.6M    316      214        1       1               1                1           1      10000    153.6M         500.0G          10002        153.6M  

    This behaviour should be conditional and enabled only for dCache, since it introduces extra work that EOS+CTA does not need.

  • Manual testing at DESY with the dcache-cta-0.11 driver: the reporting was observed to work as expected.

    cta-taped:

    Nov 20 11:55:30 dcache-enstore01 cta-taped: LVL="INFO" PID="29657" TID="29657" MSG="In Scheduler::reportRetrieveJobsBatch(): report URL." SubprocessName="maintenanceHandler" fileId="3778984" reportType="CompletionReport" reportURL="eosQuery://dcache-lab007.desy.de:42917/success/00005857DB6B6F504F0B80F11EBB581C2597?archiveid=3778984"
    Nov 20 11:55:30 dcache-enstore01 cta-taped: LVL="INFO" PID="29657" TID="29657" MSG="In Scheduler::reportRetrieveJobsBatch(): report successful." SubprocessName="maintenanceHandler" fileId="3778984" reportType="CompletionReport"

    dcache:

    20 Nov 2023 12:02:55 [cta-datamover-worker-2] [] Request /var/lib/dcache/pools/data/00005857DB6B6F504F0B80F11EBB581C2597 scheduling time: 101 s
    20 Nov 2023 12:02:55 [cta-datamover-worker-2] [] Opening /var/lib/dcache/pools/data/00005857DB6B6F504F0B80F11EBB581C2597 for writing from 131.169.98.55:41518
    20 Nov 2023 12:03:16 [cta-datamover-worker-2] [] Closing file /var/lib/dcache/pools/data/00005857DB6B6F504F0B80F11EBB581C2597 from 131.169.98.55:41518. Transferred 2.00 GiB in 21.5 s, disk performance 2.05 GiB/s
    20 Nov 2023 12:03:22 [cta-datamover-worker-1] [] XROOTD query: /success/00005857DB6B6F504F0B80F11EBB581C2597?archiveid=3778984 from 131.169.98.55:41514
    20 Nov 2023 12:03:23 [ForkJoinPool.commonPool-worker-1] [] Files /var/lib/dcache/pools/data/00005857DB6B6F504F0B80F11EBB581C2597 checksum after restore: 1:c6285b0d
    20 Nov 2023 12:03:23 [ForkJoinPool.commonPool-worker-1] [] Successful restored from 131.169.98.55:41514 : 00005857DB6B6F504F0B80F11EBB581C2597 : archive id: 3778984
    20 Nov 2023 12:03:24 [ForkJoinPool.commonPool-worker-1] [] Staged 00005857DB6B6F504F0B80F11EBB581C2597 from nearline storage.
    
  • added 31 commits

  • added 30 commits

  • added 1 commit

    • ea233226 - Fix CI pipeline not progressing by limiting hostname size

  • added 7 commits

    • ea233226...0931597b - 3 commits from branch main
    • 89b50663 - Resolve "Rework catalogue release procedure and deployment path"
    • a93a71b9 - Allow temporary failure of cppcheck
    • 0427fa9d - Resolve: #154 (closed) - Update RetrieveJob to support completion report
    • db3ce502 - Fix CI pipeline not progressing by limiting hostname size

  • added 2 commits

    • f91b3422 - Resolve: #154 (closed) - Update RetrieveJob to support completion report
    • 6b3bb528 - Fix CI pipeline not progressing by limiting hostname size

  • The CI pipeline is not progressing because the hostname is too long; see issues #504 (closed) and #511 for details.
