
Solving the overlap technical problem

Nicole Skidmore requested to merge overlap_nskidmor into master

To go with LHCb!4840 (merged) and LHCbIntegrationTests!93 (merged)

Summary

The technical overlap problem refers to the fact that:

  • If an event fires both line_charm and line_rd, then the charm stream and the rd stream both end up with /Event/line_charm/particles and /Event/line_rd/particles

This is because the current streaming does not open the DstData bank; it simply "passes" it through unchanged.
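A toy Python model of the problem may help. It is not Moore code: the stream names, the line-to-stream mapping and the TES paths are illustrative assumptions. Because the DstData bank is copied as-is, every stream that accepts the event receives the locations written by all fired lines. A stream-aware writer would instead keep only the locations of the lines that belong to that stream.

```python
# Toy model of the technical overlap; stream/line names and paths are hypothetical.
STREAM_LINES = {
    "charm": {"line_charm"},
    "rd": {"line_rd"},
}


def persisted_locations(fired_lines):
    """TES locations written into the DstData bank by the given lines."""
    return {f"/Event/{line}/particles" for line in fired_lines}


def stream_passthrough(fired_lines):
    """Current behaviour: DstData is passed through without being opened."""
    dst = persisted_locations(fired_lines)
    return {
        stream: sorted(dst)
        for stream, lines in STREAM_LINES.items()
        if lines & set(fired_lines)  # event goes to a stream if any of its lines fired
    }


def stream_selective(fired_lines):
    """Desired behaviour: each stream keeps only its own lines' outputs."""
    return {
        stream: sorted(persisted_locations(lines & set(fired_lines)))
        for stream, lines in STREAM_LINES.items()
        if lines & set(fired_lines)
    }


event = ["line_charm", "line_rd"]
print(stream_passthrough(event))
# {'charm': ['/Event/line_charm/particles', '/Event/line_rd/particles'],
#  'rd': ['/Event/line_charm/particles', '/Event/line_rd/particles']}
print(stream_selective(event))
# {'charm': ['/Event/line_charm/particles'], 'rd': ['/Event/line_rd/particles']}
```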

To remove this technical overlap, we need to know exactly what each Hlt2Line requested at run time in the pit. Note that lines have a large degree of selective persistence: extra_outputs, selective raw banks, extras such as the calo objects, persistreco, etc.

This is now achieved by dumping a custom dictionary of line attributes to a JSON file in the HLT2 job. This is enabled with the write_streams_attributes_to_json option, and the file name is set with output_streams_attributes_file, which defaults to line_attribute_dict.json. The file is then handed to the Sprucing job via input_streams_attributes_file.
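The round trip between the two jobs can be sketched in plain Python. The LineAttributes dataclass and the encode/decode helpers below are hypothetical illustrations, not the Moore implementation. Their only grounding is the key names and the nested layout of the example JSON in the Tests section, where each line's attributes are stored as a JSON-encoded string under a stream name.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class LineAttributes:
    # Keys mirror the example JSON in this MR; the default values are assumptions.
    name: str
    extra_outputs: list = field(default_factory=list)
    persistreco: bool = False
    tagging_particles: bool = False
    calo_digits: bool = False
    calo_clusters: bool = False
    pv_tracks: bool = False
    track_ancestors: bool = False
    raw_banks: list = field(default_factory=list)


def encode_streams_attributes(streams):
    """HLT2 side (hypothetical helper): {stream: [LineAttributes, ...]} -> JSON text.

    Each line's attributes are themselves JSON-encoded strings, reproducing
    the nested layout of the example output.
    """
    return json.dumps({
        stream: {line.name: json.dumps(asdict(line)) for line in lines}
        for stream, lines in streams.items()
    })


def decode_streams_attributes(text):
    """Sprucing side (hypothetical helper): JSON text -> {stream: {name: LineAttributes}}."""
    return {
        stream: {name: LineAttributes(**json.loads(encoded))
                 for name, encoded in lines.items()}
        for stream, lines in json.loads(text).items()
    }


# In the real workflow the text would be written to line_attribute_dict.json by
# HLT2 and read back by the Sprucing; here it is round-tripped in memory.
text = encode_streams_attributes({"default": [LineAttributes("Hlt2Linetwo")]})
assert decode_streams_attributes(text)["default"]["Hlt2Linetwo"].raw_banks == []
```

Storing a flat dataclass per line keeps the Sprucing side simple: it can rebuild each line's persistence request with a single constructor call.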

The Sprucing can then remove the overlap: based on this dictionary, it creates a corresponding SpruceLine for every Hlt2Line, and it can still define streams by regexes as before.
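The following sketch shows how regex-based streaming over per-line attributes removes the overlap at the configuration level. The stream regexes, the attribute subset, and the build_spruce_streams / raw_banks_for_stream helpers are assumptions for illustration and do not reproduce the actual Sprucing code. Each stream gets one pass-through entry per matching HLT2 line, and its persistence request is built only from its own lines.

```python
import re

# Per-line attributes as read back from the HLT2 JSON (subset of keys).
LINE_ATTRS = {
    "Hlt2Lineone_extraoutputs": {"extra_outputs": ["LongerTracks", "LongTracks"], "raw_banks": ["Rich"]},
    "Hlt2Linetwo": {"extra_outputs": [], "raw_banks": []},
}

# Hypothetical stream definitions: regexes matched against HLT2 line names.
STREAM_REGEXES = {"streamone": r"Hlt2Lineone", "streamtwo": r"Hlt2Linetwo"}


def build_spruce_streams(line_attrs, stream_regexes):
    """For each stream, create one pass-through entry per matching HLT2 line,
    carrying over only that line's own persistence request."""
    streams = {}
    for stream, pattern in stream_regexes.items():
        rx = re.compile(pattern)
        streams[stream] = {
            name: dict(attrs)  # copy so each spruce line owns its request
            for name, attrs in line_attrs.items()
            if rx.match(name)  # re.match anchors at the start of the name
        }
    return streams


def raw_banks_for_stream(spruce_lines):
    """Raw banks a stream must keep: the union over its own lines only."""
    return sorted({rb for attrs in spruce_lines.values() for rb in attrs["raw_banks"]})


streams = build_spruce_streams(LINE_ATTRS, STREAM_REGEXES)
print({s: raw_banks_for_stream(lines) for s, lines in streams.items()})
# {'streamone': ['Rich'], 'streamtwo': []}
```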

Tests

Test Hlt2Conf.sprucing.test_spruce_turbooverlap

  • Prerequisite: sprucing.test_hlt2_foroverlapcheck, which uses options.write_streams_attributes_to_json to write the output of get_streams_attributes to a JSON file that looks like
{"default": {"Hlt2Lineone_extraoutputs": "{\"name\": \"Hlt2Lineone_extraoutputs\", \"extra_outputs\": [\"LongerTracks\", \"LongTracks\"], \"persistreco\": true, \"tagging_particles\": false, \"calo_digits\": true, \"calo_clusters\": true, \"pv_tracks\": true, \"track_ancestors\": true, \"raw_banks\": [\"Rich\"]}", "Hlt2Linetwo": "{\"name\": \"Hlt2Linetwo\", \"extra_outputs\": [], \"persistreco\": false, \"tagging_particles\": false, \"calo_digits\": false, \"calo_clusters\": false, \"pv_tracks\": false, \"track_ancestors\": false, \"raw_banks\": []}"}}
  • This JSON is loaded in sprucing_tests:spruce_overlap, and the Spruce versions of the HLT2 lines are created from it, including their extra_outputs, with streams defined by regexes
  • Hlt2Conf.sprucing.test_spruce_turbooverlap_check_stream{one,two} then check that only candidates from streamone_lines end up in the streamone stream, and vice versa for streamtwo, along with many other checks on persistreco objects, raw_banks, etc.
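The core of the per-stream check can be sketched as a set difference: collect the line names that appear in a stream's output locations and subtract the lines that belong to that stream. The helpers below are hypothetical, and the assumed location form prefix/<line>/Particles follows the example path in the Next steps list. The actual test scripts read the output files with LHCb tools that are not shown here.

```python
def lines_in_locations(locations, prefix="/Event/HLT2/"):
    """Extract line names from TES paths of the assumed form prefix<line>/Particles."""
    names = set()
    for loc in locations:
        if loc.startswith(prefix):
            names.add(loc[len(prefix):].split("/", 1)[0])
    return names


def foreign_lines(locations, own_lines, prefix="/Event/HLT2/"):
    """Return lines from *other* streams found in this stream's output.

    An empty set means the technical overlap has been removed for this stream.
    """
    return lines_in_locations(locations, prefix) - set(own_lines)


# Example: streamone output that still contains an overlapping streamtwo location.
streamone_output = [
    "/Event/HLT2/Hlt2Lineone_extraoutputs/Particles",
    "/Event/HLT2/Hlt2Linetwo/Particles",
]
print(foreign_lines(streamone_output, ["Hlt2Lineone_extraoutputs"]))
# {'Hlt2Linetwo'}
```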

Next steps

  • All functionality present
  • Clean/refactor
  • Add tests on the number of events for each line: compare the stdout of HLT2 and Sprucing
  • Add tests for things like calodigits
  • Squash commits and rebase
  • Remove print statements
  • Test on a real example (use the BW tests?)
  • Some lines do not produce output particles at a location of the form /Event/HLT2/Hlt2QEE_MDS_BDT_nHits/Particles, so _make_pass_spruceline fails for them. We can simply include line.output_producer in the line dict and skip this algorithm if needed
  • Decide how to deal with the downstream (DaVinci) changes to data access. The consensus so far is not to hide these changes from analysts, but to have them configure things the right way, with warnings if things "look" wrong. Follow up in issue #901
  • Write up explanation in this description
  • Check raw banks (rb) by stream: do any WGs do this? According to https://gitlab.cern.ch/lhcb/Moore/-/blob/master/Hlt/Hlt2Conf/python/Hlt2Conf/settings/hlt2_pp_2024.py?ref_type=heads#L182, no they do not. Added a comment to make sure people are aware
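For the planned per-line event-count test, the comparison step itself is simple once the counts have been extracted from the HLT2 and Sprucing stdout. Parsing is not shown because the log format is not given here. The function name and the HLT2-to-Spruce naming rule are assumptions, so the mapping is passed in explicitly.

```python
def compare_line_counts(hlt2_counts, spruce_counts, to_spruce_name):
    """Return {hlt2_line: (hlt2_n, spruce_n)} for every line whose counts differ.

    to_spruce_name maps an HLT2 line name to its pass-through SpruceLine name
    (the naming convention is an assumption). A line missing from the Sprucing
    counts is reported with spruce_n = None.
    """
    mismatches = {}
    for line, n in hlt2_counts.items():
        m = spruce_counts.get(to_spruce_name(line))
        if m != n:
            mismatches[line] = (n, m)
    return mismatches


def rename(name):
    # Hypothetical convention: Hlt2X -> SpruceX.
    return name.replace("Hlt2", "Spruce", 1)


print(compare_line_counts({"Hlt2Linetwo": 10}, {"SpruceLinetwo": 9}, rename))
# {'Hlt2Linetwo': (10, 9)}
```

A pass-through SpruceLine should select exactly the events its HLT2 line selected, so any non-empty result points to a configuration problem.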
Edited by Nicole Skidmore
