
[RTA/DPA BW tests]: Add Sprucing jobs for Turbo and TurCal (and many generalisations/simplifications as this necessitated)

Ross John Hunter requested to merge rjhunter-bw-tests-spruce-pass-job into master

FYI @lugrazet @shunan

Goes with lhcb/Moore!3285 (merged) and lhcb-core/LHCbPR2HD!284 (merged)

Closes #17 (closed)

This huge MR has one aim: adding two new sprucing jobs (on Turbo and TurCal) to the hlt2_and_spruce and spruce_latest bandwidth tests.

The main refactoring this required was to allow multiple STREAM_CONFIGs for each PROCESS; previously the STREAM_CONFIG was determined by the PROCESS. We now have 3 STREAM_CONFIGs for sprucing, and all 3 related tests run in spruce_latest and hlt2_and_spruce.
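As a rough illustration of the one-to-many relationship described above, the sketch below expands each process into one job per stream config. The dictionary, the function name and the process name "spruce" are assumptions made for illustration and are not taken from the diff. Only "wgpass" and "turcal" are named in this MR, so the third sprucing config is left out.

```python
# Hypothetical mapping from PROCESS to its STREAM_CONFIGs. Only "wgpass" and
# "turcal" are named in this MR; the third sprucing config is omitted here.
STREAM_CONFIGS = {
    "spruce": ["wgpass", "turcal"],
}


def expand_jobs(processes):
    """Return one (process, stream_config) pair per job to run."""
    return [
        (process, stream_config)
        for process in processes
        for stream_config in STREAM_CONFIGS[process]
    ]


for process, stream_config in expand_jobs(["spruce"]):
    print(f"would run {process} with stream config {stream_config}")
```

Keeping the pair explicit is what lets stream_config be passed around next to process, rather than being derived from it.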

Descriptions of the changes are hidden in the sections below. I also added a round of comments to the diff to help explain it, given its size.

Simplifications/generalisations made whilst here (tried to factorise them in !404 (closed), but it got too intertwined)
  • Simplify the generation and download of the on-the-fly sprucing config yaml files. (sorts out this discussion in #13)
  • Put the list of lines for TISTOS in the on-the-fly config file. Now need to only generate and download 1 file.
  • Remove nested Gaudi job that works out the rate denominator. Now we have a second job, after the main trigger job, that explicitly does that and only that.
  • The job above is in read_event_numbers.py, which is a renaming of list-event-numbers.py. It now has two modes: one for counting up events in the input file (up to EVTMAX); and the other for storing all event numbers in an output file. The latter is for overlaps.
  • Rename generate_hlt2_fullstream_metadata.py -> generate_spruce_input_configs.py to make the use case clearer.
  • Introduce a tmp/to_eos/ directory into which we put everything that the handler needs to grab and put under current_hlt2_output on eos. This means we don't need to align paths here and in the handler.
  • Trim down the list of helper functions,
  • Remove a load of fancy stuff in run_bandwidth_test_jobs that no-one has run since Ella left,
  • make_bandwidth_test_page.py: no more hard-coded string templates - now build everything with functions using f-strings for cleaner substitutions,
  • make_bandwidth_test_page.py: lots more modularization/factorization of functions - I pointed stuff out in the large diff
  • make_bandwidth_test_page.py: lots more (although probs not total) usage of type hints. This is just for ease of autocompletion and code highlighting. You also get some more clarity for free
  • Removal of last vestiges of hard-coded string paths in the html file names
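The two modes of read_event_numbers.py mentioned in the list above could look roughly like the sketch below. It is not the script itself: the function names, the JSON output format and the convention that a negative evt_max means "no cap" are all assumptions, and a plain iterable of integers stands in for the event loop over the real input file.

```python
import json
from itertools import islice


def count_events(event_numbers, evt_max=-1):
    """Count events in the input, stopping at evt_max if it is non-negative."""
    capped = event_numbers if evt_max < 0 else islice(event_numbers, evt_max)
    return sum(1 for _ in capped)


def store_event_numbers(event_numbers, out_path):
    """Write every event number to a JSON file; return how many were written."""
    numbers = list(event_numbers)
    with open(out_path, "w") as f:
        json.dump(numbers, f)
    return len(numbers)
```

The first mode gives the rate denominator; the second gives the per-event record needed for overlaps.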
Summary of changes required to add the two new sprucing jobs
  • In many places you'll now see stream_config being passed around with process,
  • The new streaming configurations "wgpass" and "turcal" have been introduced for their respective sprucing jobs,
  • message.txt -> message.json for much cleaner reading/writing,
  • make_bandwidth_test_page.py is now called once by the top-level script (e.g. Moore_hlt2_bandwidth.sh) - it has an internal loop over processes. This fixes this follow-up in #13.
  • A few aesthetic clean ups on the html pages, trying to make them easier to read,
  • Each bandwidth test now has a top-level landing html page, and links to the per-test pages within.
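A minimal sketch of building a landing page from small functions and f-strings, in the spirit of the changes above. The function names, the page layout and the file names in the usage line are hypothetical and are not the ones in make_bandwidth_test_page.py.

```python
from html import escape


def link(href, text):
    """Render one anchor tag, escaping both the target and the label."""
    return f'<a href="{escape(href, quote=True)}">{escape(text)}</a>'


def landing_page(test_name, per_test_pages):
    """Build a top-level page that links to each per-test page.

    per_test_pages maps a label (e.g. a stream config) to a page href.
    """
    items = "\n".join(
        f"      <li>{link(href, label)}</li>"
        for label, href in per_test_pages.items()
    )
    return (
        "<html>\n"
        "  <body>\n"
        f"    <h1>{escape(test_name)}</h1>\n"
        "    <ul>\n"
        f"{items}\n"
        "    </ul>\n"
        "  </body>\n"
        "</html>\n"
    )


# Hypothetical file names, for illustration only.
print(landing_page("spruce_latest", {"wgpass": "wgpass.html", "turcal": "turcal.html"}))
```

Composing small functions like this keeps each substitution next to the markup it fills, which is harder to do with one large template string.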
Other improvements which were unblocked as a result of above
  • Error handling should now be more robust: we now save the cumulative error code of each Moore_bandwidth_test.sh call to message.json before making the html page. This means you should now get GitLab failure messages even if the HTML page-maker itself fails.
  • We now have an automated bar chart for all streams to disk that appears on the top-level page.
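The error-handling idea above can be sketched as follows: accumulate the return codes of the individual test calls, then write the result into message.json before any page-making starts. The "error_code" key, the bitwise-OR accumulation and both function names are assumptions for illustration; the real scripts may do this differently.

```python
import json
from pathlib import Path


def cumulative_error_code(return_codes):
    """Combine return codes so the result is 0 only if every call succeeded."""
    code = 0
    for rc in return_codes:
        code |= rc
    return code


def record_error_code(message_path, code):
    """Merge the error code into message.json, keeping any existing fields."""
    path = Path(message_path)
    message = json.loads(path.read_text()) if path.exists() else {}
    message["error_code"] = code  # hypothetical key name
    path.write_text(json.dumps(message, indent=2))
    return message
```

Writing the file before the page-maker runs means a later crash cannot hide an earlier failure.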
Follow-ups
  • I think stream_config can now be a member of FileNameHelper - this would require lots of changes but would be a nice simplification,
  • Line descriptors: still need to use the same streaming configuration as the rest of the test. We can load up the stream config JSON and use it to filter the lines by name to ensure this,
  • hrefs can be handled better in the html maker - we can write a helper in bandwidth_helpers for this,
  • Try to fix error code for make_bandwidth_test_page.py and then use it in the handler for more error checking
  • The spruce stream configs should be ["full", "turbo", "turcal"] - changes required in lots of places, but this would obviate lots of mapping in the pages.
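For the line-descriptor follow-up, one possible shape is sketched below: load the stream config JSON and keep only the descriptors whose line name appears in it. The assumed JSON layout (a mapping from stream name to a list of line names), the function names and the line names in the usage example are hypothetical.

```python
import json


def lines_in_stream_config(stream_config_json):
    """Collect every line name from a {stream: [line names]} JSON string."""
    streams = json.loads(stream_config_json)
    return {line for lines in streams.values() for line in lines}


def filter_descriptors(descriptors, stream_config_json):
    """Keep only the descriptors of lines present in the stream config."""
    allowed = lines_in_stream_config(stream_config_json)
    return {name: desc for name, desc in descriptors.items() if name in allowed}


# Hypothetical line and stream names, for illustration only.
config = json.dumps({"streamA": ["LineOne"], "streamB": ["LineTwo"]})
descriptors = {"LineOne": "desc 1", "LineThree": "desc 3"}
print(filter_descriptors(descriptors, config))
```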

Draft pages with updated numbers can be found at https://rjhunter-bandwidth.web.cern.ch/

TODO:

  • Re-do the test job from scratch and rebuild docs
  • Test the handler changes as thoroughly as you can 😨
Edited by Ross John Hunter
