Solving the technical overlap problem
To go with LHCb!4840 (merged) and LHCbIntegrationTests!93 (merged)
Summary
The technical overlap problem refers to the fact that:
- If an event fires both line_charm and line_rd, then both the charm stream and the rd stream end up with /Event/line_charm/particles and /Event/line_rd/particles.
This happens because the current streaming does not open the DstData bank and simply passes it through unchanged.
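A toy model of the problem may help. Here the DstData bank is reduced to a dict keyed by TES location, and streams select events by regexes on the fired line names. route_passthrough and route_filtered are hypothetical helpers for illustration only: the first mimics the current behaviour (the bank is copied whole), the second shows what removing the overlap means.

```python
import re

# Hypothetical toy event: the HLT2 lines that fired, plus a stand-in for the
# DstData bank keyed by TES location (the real bank is packed, not a dict).
event = {
    "fired": {"line_charm", "line_rd"},
    "dstdata": {
        "/Event/line_charm/particles": ["D0"],
        "/Event/line_rd/particles": ["B0"],
    },
}

# Streams are selected by regexes on line names.
streams = {"charm": r"line_charm", "rd": r"line_rd"}


def route_passthrough(event, streams):
    """Current behaviour: DstData is copied whole into every matching stream."""
    out = {}
    for stream, pattern in streams.items():
        if any(re.fullmatch(pattern, line) for line in event["fired"]):
            out[stream] = dict(event["dstdata"])  # nothing is filtered
    return out


def route_filtered(event, streams):
    """Desired behaviour: each stream keeps only the locations of its own lines."""
    out = {}
    for stream, pattern in streams.items():
        mine = {line for line in event["fired"] if re.fullmatch(pattern, line)}
        if mine:
            out[stream] = {
                loc: value
                for loc, value in event["dstdata"].items()
                # "/Event/<line>/particles" -> "<line>"
                if loc.split("/")[2] in mine
            }
    return out
```

With the pass-through version, route_passthrough(event, streams)["charm"] holds both locations; with the filtered version, the charm stream only holds /Event/line_charm/particles. Filtering like this requires knowing which locations belong to which line, which is what the line attributes below provide.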
To remove this technical overlap we need to know exactly what each Hlt2Line requested at run time in the pit. Note that lines can configure their selective persistence in many ways: extra_outputs, selective raw banks, extras such as the calo objects, persistreco...
This is now achieved by dumping a custom line-attributes dict to a JSON file in the HLT2 job, enabled with write_streams_attributes_to_json. The file name is set by output_streams_attributes_file, which defaults to line_attribute_dict.json. This file is handed to the Sprucing job via input_streams_attributes_file.
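A minimal sketch of the JSON round trip, assuming the layout of the example json in the Tests section: one top-level key per HLT2 stream ("default"), mapping each line name to its attributes serialised as a JSON string (so the file is double-encoded). LineAttributes, write_line_attributes and read_line_attributes are hypothetical names, not the Moore code behind write_streams_attributes_to_json.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class LineAttributes:
    # Hypothetical mirror of the per-line persistence settings; field order
    # follows the keys in the example json.
    name: str
    extra_outputs: list = field(default_factory=list)
    persistreco: bool = False
    tagging_particles: bool = False
    calo_digits: bool = False
    calo_clusters: bool = False
    pv_tracks: bool = False
    track_ancestors: bool = False
    raw_banks: list = field(default_factory=list)


def write_line_attributes(streams, path="line_attribute_dict.json"):
    """HLT2 side (sketch): {stream: [LineAttributes]} -> JSON file.

    Each line's attributes are stored as a JSON *string*, reproducing the
    double-encoded layout of the example json.
    """
    payload = {
        stream: {attrs.name: json.dumps(asdict(attrs)) for attrs in lines}
        for stream, lines in streams.items()
    }
    with open(path, "w") as f:
        json.dump(payload, f)


def read_line_attributes(path="line_attribute_dict.json"):
    """Sprucing side (sketch): JSON file -> {stream: {line: LineAttributes}}."""
    with open(path) as f:
        payload = json.load(f)
    # Decode the outer dict, then each inner per-line JSON string.
    return {
        stream: {name: LineAttributes(**json.loads(encoded)) for name, encoded in lines.items()}
        for stream, lines in payload.items()
    }
```

Keeping one self-describing entry per line means the Sprucing job needs nothing from the HLT2 configuration other than this file.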
Sprucing can then remove the overlap: it makes a corresponding SpruceLine for every Hlt2Line based on this dictionary, and can still stream according to regexes as before.
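Schematically, the Sprucing side turns each dictionary entry into a pass line that inherits the HLT2 line's persistency settings, then sorts those lines into streams by regex. The "Spruce" naming convention, the use of re.match (anchored at the start of the HLT2 line name) and both helper functions are assumptions made for this sketch.

```python
import re


def make_spruce_lines(line_attributes):
    """One hypothetical Spruce pass line per Hlt2Line, copying its settings.

    line_attributes: {hlt2_line_name: {"persistreco": ..., "raw_banks": ..., ...}}
    """
    spruce_lines = []
    for hlt2_name, attrs in line_attributes.items():
        line = dict(attrs)  # persistency settings are inherited unchanged
        line["hlt2_line"] = hlt2_name
        # Hypothetical naming convention: swap the Hlt2 prefix for Spruce.
        suffix = hlt2_name[len("Hlt2"):] if hlt2_name.startswith("Hlt2") else hlt2_name
        line["name"] = "Spruce" + suffix
        spruce_lines.append(line)
    return spruce_lines


def assign_streams(spruce_lines, stream_regexes):
    """Each stream collects the Spruce lines whose HLT2 name matches its regex."""
    return {
        stream: [line["name"] for line in spruce_lines if re.match(pattern, line["hlt2_line"])]
        for stream, pattern in stream_regexes.items()
    }
```

Because each stream now only runs the Spruce lines mapped to it, only those lines' outputs are written to that stream, instead of the whole DstData bank.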
Tests
Test Hlt2Conf.sprucing.test_spruce_turbooverlap
- Prerequisite: sprucing.test_hlt2_foroverlapcheck, which uses options.write_streams_attributes_to_json to write the output of get_streams_attributes to a JSON file that looks like
{"default": {"Hlt2Lineone_extraoutputs": "{\"name\": \"Hlt2Lineone_extraoutputs\", \"extra_outputs\": [\"LongerTracks\", \"LongTracks\"], \"persistreco\": true, \"tagging_particles\": false, \"calo_digits\": true, \"calo_clusters\": true, \"pv_tracks\": true, \"track_ancestors\": true, \"raw_banks\": [\"Rich\"]}", "Hlt2Linetwo": "{\"name\": \"Hlt2Linetwo\", \"extra_outputs\": [], \"persistreco\": false, \"tagging_particles\": false, \"calo_digits\": false, \"calo_clusters\": false, \"pv_tracks\": false, \"track_ancestors\": false, \"raw_banks\": []}"}}
- This JSON is picked up and loaded in sprucing_tests:spruce_overlap. The Spruce versions of the HLT2 lines are created from it, including their extra_outputs, and streams are defined by regexes.
- Hlt2Conf.sprucing.test_spruce_turbooverlap_check_stream{one,two} then check that only streamone_lines candidates end up in the streamone stream, and vice versa, along with many other checks on persistreco objects, raw_banks, etc.
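The stream check can be pictured as follows, assuming each stream's output is summarised as a list of TES locations whose path contains the owning line's name (the /Event/Spruce/... paths in the usage below are made up). find_overlap is a hypothetical helper, not the code in test_spruce_turbooverlap_check_stream{one,two}.

```python
def find_overlap(stream_contents, stream_lines):
    """Return (stream, location) pairs where a stream holds another line's data.

    stream_contents: {stream: list of TES locations found in its output}
    stream_lines:    {stream: set of line names that belong to that stream}
    Assumes (hypothetically) that a location's path components include the line name.
    """
    all_lines = set().union(*stream_lines.values())
    problems = []
    for stream, locations in stream_contents.items():
        for loc in locations:
            owners = all_lines.intersection(loc.strip("/").split("/"))
            # Any owning line that is not assigned to this stream is overlap.
            if owners and not owners <= stream_lines[stream]:
                problems.append((stream, loc))
    return problems
```

For a correctly split output, e.g. streamone containing only "/Event/Spruce/Hlt2Lineone_extraoutputs/Particles", the function returns an empty list; any leaked location from a streamtwo line shows up as a (stream, location) pair.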
Next steps
- All functionality present
- Clean/refactor
- Add tests for the number of events for each line: compare the stdout of HLT2 and Sprucing
- Add tests for things like calodigits
- Squash commits and rebase
- Remove print statements
- Test on a real example: use the BW tests?
- Some lines do not produce output particles in the form /Event/HLT2/Hlt2QEE_MDS_BDT_nHits/Particles, and so _make_pass_spruceline fails. We can just include line.output_producer in the line dict and skip this alg if needed
- Decide how to deal with the downstream (DaVinci) changes to data access. Consensus so far is not to hide these changes from analysts, but to get them to configure things the right way and to have warnings if things "look" wrong. Follow up in issue #901
- Write up explanation in this description
- Check about rb by stream: do any WGs do this? From https://gitlab.cern.ch/lhcb/Moore/-/blob/master/Hlt/Hlt2Conf/python/Hlt2Conf/settings/hlt2_pp_2024.py?ref_type=heads#L182, no, they do not. Added a comment to make sure people are aware
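The proposed fix for lines without an /Event/HLT2/<line>/Particles container could look roughly like this: record at HLT2 time whether the line has an output_producer, and only build the pass algorithm when it does. The has_output_producer key, build_spruce_algs and the make_pass_alg callback are all hypothetical; this is not the _make_pass_spruceline implementation.

```python
def build_spruce_algs(line_dict, make_pass_alg):
    """Build pass algorithms only for lines that actually produce particles.

    line_dict: {line_name: attrs}, where attrs["has_output_producer"] is a
    hypothetical flag filled from line.output_producer in the HLT2 job.
    make_pass_alg: callable taking a TES location (hypothetical stand-in).
    """
    algs, skipped = {}, []
    for name, attrs in line_dict.items():
        # Default to True so older dicts without the flag behave as before.
        if attrs.get("has_output_producer", True):
            algs[name] = make_pass_alg(f"/Event/HLT2/{name}/Particles")
        else:
            skipped.append(name)  # no Particles container to pass through
    return algs, skipped
```

Defaulting the flag to True keeps the current behaviour for line dicts written before the flag exists, so only lines explicitly marked as having no output producer are skipped.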