Draft: move RD tau lines across streams
Purpose
Test how big the impact of duplication is in the RD tau lines, where it comes from a combination of multiple candidates, extra persistence for isolation, and duplication of selected events across streams, and explore moving these lines to full as an option.
The rationale is that, if this effect is very big, moving these lines to full would significantly reduce the turbo BW, not only in the modified RD stream but also in the related ones.
See slides in https://indico.cern.ch/event/1394275/ for a summary
Moving the lines would also buy time to optimize the amount of data on disk dedicated to isolation.
cc RD @tfulghes @mramospe @matzeni @jagoodin @mbirch @rquaglia @elsmith @egede
cc RTA-DPA @abertoli @nskidmor @mvesteri @poluekt
Important note 1: this didn't reach consensus in RD and, in particular, was not looked on favorably by the tau proponents and the migration task force. Please regard this as a study and as an option that I consider worth going for, not as a decided next step. I would still like feedback from RTA-DPA on whether this is reasonable from your side, or whether I am underestimating the effect on full.
Important note 2: configurations 1 to 3 don't sit on top of !3053 (merged), so they don't pick up the nice mitigations done there; results 4 do pick them up.
Results 4 - move only hadronic tau lines (after rebase to master)
!3148 (comment 7737457)
Similar to results 3, with slightly different numbers.
Results 3 - move only hadronic tau lines
- Full stream goes from 7.3 to 7.44 GB/s -> seems quite affordable.
- Sum of turbo streams goes from 4.2 to 3.78 GB/s -> not an incredible gain but not bad.
- Of the 420 MB/s that we free from turbo, only 190 come from the reduction in RD and 230 come from freeing up the parasitic duplication -> shows how much useless information is put on disk as of current master -> nice gain, it gets rid of a big chunk of BW due to duplication, which is basically a waste of resources.
- The fraction of events selected by RD and also by Charm, w.r.t. those selected by RD, goes from 30.4% to 28.8% -> not that much of a difference, so it's safe to assume the duplicated BW comes from the large event sizes resulting from multiple candidates + isolation cones (not a very turbo-safe approach).
- Sprucing goes from 1.39 to 1.47 GB/s, and this already accounts for the hadronic tau sprucing lines writing to disk the candidates selected by what are now turbo lines (not the isolation cones). Seems a reasonable number to me.
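The bookkeeping behind the bullets above can be reproduced with a short Python sketch. The numbers are copied from the list; the helper names are illustrative, and summing turbo and sprucing as a proxy for what ends up on disk is my assumption, not something measured here.

```python
# Numbers copied from the Results 3 bullets (GB/s unless noted).
FULL_BEFORE, FULL_AFTER = 7.30, 7.44
TURBO_BEFORE, TURBO_AFTER = 4.20, 3.78
SPRUCING_BEFORE, SPRUCING_AFTER = 1.39, 1.47

# Split of the freed turbo bandwidth quoted above (MB/s).
FREED_FROM_RD = 190
FREED_FROM_DUPLICATION = 230


def delta(before: float, after: float) -> float:
    """Change in bandwidth, rounded to hide floating-point noise."""
    return round(after - before, 3)


def shares(parts: dict) -> dict:
    """Fraction of the total contributed by each part."""
    total = sum(parts.values())
    return {name: value / total for name, value in parts.items()}


full_delta = delta(FULL_BEFORE, FULL_AFTER)
turbo_delta = delta(TURBO_BEFORE, TURBO_AFTER)
sprucing_delta = delta(SPRUCING_BEFORE, SPRUCING_AFTER)

# Assumption: turbo plus sprucing output is a fair proxy for disk usage.
disk_delta = round(turbo_delta + sprucing_delta, 3)

freed = shares({"RD": FREED_FROM_RD, "duplication": FREED_FROM_DUPLICATION})

print(f"full: {full_delta:+.2f} GB/s, turbo: {turbo_delta:+.2f} GB/s, "
      f"sprucing: {sprucing_delta:+.2f} GB/s")
print(f"turbo + sprucing: {disk_delta:+.2f} GB/s")
for name, fraction in freed.items():
    print(f"{name}: {fraction:.0%} of the freed turbo bandwidth")
```

With these inputs, a bit more than half of the freed turbo bandwidth comes from the duplication rather than from the RD stream itself.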
Results 2 - move all tau lines (leptonic and hadronic)
!3148 (comment 7707133) Here all tau lines were moved, so the effect is much bigger. In this configuration the sprucing lines were filtering on topo, so it potentially heavily underestimates the effect on sprucing of getting some of these events to disk, even if only the candidates.
- Full stream goes from 7.3 to 8.12 GB/s -> I don't know if that's OK.
- Sum of turbo streams goes from 4.2 to 3.3 GB/s
- Of the 900 MB/s that we free from turbo, 500 come from the reduction in RD and 400 come from freeing up the parasitic duplication -> gets rid of a big chunk of useless information put on disk as of current master.
- Sprucing goes from 1.39 to 1.54 GB/s, BUT this only accounts for the tau sprucing lines writing to disk the candidates in common with topo, not all the candidates selected by what are now turbo lines, so an additional unaccounted effect might be expected.
Results 1 - on persistreco+extra_outputs for isolation
W.r.t. test 2, in this version I didn't switch off extra_outputs for isolation (not the way to go because they are redundant with persistreco, but I didn't have the time yet to change that). Because of that, comparing test 1 to test 2 gives an idea of how much burden having both isolation cones + persistreco adds w.r.t. only persistreco.
The BW didn't seem to change that much (1.83 -> 1.71 GB/s): even though extra_outputs does add redundant info, the duplication is partially mitigated because persistreco already persists most of the relevant objects. Seems like a nice effect. Also, compression potentially closes the gap even further, IIUC.
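A toy illustration of why the overhead can stay small, in Python. It reads 1.83 GB/s as the value with both persistreco and extra_outputs enabled and 1.71 GB/s as persistreco only. The object IDs and counts are made up for illustration, and the assumption that an object shared between the two requests is written only once is mine; this is not Moore persistence code.

```python
def relative_overhead(baseline: float, with_extra: float) -> float:
    """Fractional increase of with_extra over baseline."""
    return (with_extra - baseline) / baseline


# Measured values quoted above (GB/s).
measured = relative_overhead(1.71, 1.83)

# Toy event: hypothetical object IDs, one unit of size per object.
persistreco_objects = set(range(100))    # what persistreco would save
cone_objects = set(range(80, 107))       # isolation cones: 20 shared, 7 new

# Assumption: shared objects are written once, so the payload is the union.
deduplicated = relative_overhead(
    len(persistreco_objects), len(persistreco_objects | cone_objects)
)

# For contrast: every requested object written separately, no sharing.
naive = relative_overhead(
    len(persistreco_objects), len(persistreco_objects) + len(cone_objects)
)

print(f"measured overhead:  {measured:.1%}")
print(f"toy, deduplicated:  {deduplicated:.1%}")
print(f"toy, naive sum:     {naive:.1%}")
```

The point of the toy is only the contrast between the last two lines: when most cone objects are already covered by persistreco, the extra cost is set by the few new objects, not by the size of the cones.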
Summary
Moving RD's hadronic (hadronic and leptonic) tau lines to full:
- Adds 0.14 (0.82) GB/s to full if sitting on master, but probably less if sitting on !3053 (merged).
- Reduces the sum of turbo streams BW by 0.42 (0.9) GB/s (same comment about !3053 (merged)).
- Sizably reduces the parasitic BW due to duplication of information across streams -> reduces wasted resources.
- Allows for better isolation information at the EOY resprucing -> buys time to optimize the selection of extra outputs, or even to persist only the isolation quantities computed at the resprucing stage.
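One way to compare the two options is the turbo bandwidth freed per unit of bandwidth added to full. The sketch below recomputes it in Python from the stream totals quoted in the results sections (on master, so without the !3053 mitigations). The figure of merit and the names are illustrative, not an agreed metric.

```python
from dataclasses import dataclass

# Baseline stream totals on master (GB/s), from the results sections.
FULL_BASE = 7.30
TURBO_BASE = 4.20


@dataclass(frozen=True)
class Scenario:
    name: str
    full_after: float   # full stream BW after the move (GB/s)
    turbo_after: float  # sum of turbo streams after the move (GB/s)

    @property
    def full_added(self) -> float:
        return self.full_after - FULL_BASE

    @property
    def turbo_freed(self) -> float:
        return TURBO_BASE - self.turbo_after

    @property
    def freed_per_added(self) -> float:
        """Turbo GB/s freed for each GB/s added to full."""
        return self.turbo_freed / self.full_added


SCENARIOS = [
    Scenario("hadronic only", full_after=7.44, turbo_after=3.78),
    Scenario("hadronic + leptonic", full_after=8.12, turbo_after=3.30),
]

for s in SCENARIOS:
    print(f"{s.name}: +{s.full_added:.2f} GB/s full, "
          f"-{s.turbo_freed:.2f} GB/s turbo, "
          f"ratio {s.freed_per_added:.1f}")
```

By this measure, moving only the hadronic lines frees roughly three times more turbo bandwidth per unit added to full than moving all tau lines, with the caveat that sprucing is not included and is underestimated in the second configuration.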
Caveats:
- Adds burden to tape
- Adds complexity.
- Might give the wrong idea that this magically solves the problem of multiplicity entangled with isolation. Efforts are already in place to mitigate that problem though, so I wouldn't fear it that much.
- Other options could be studied, like moving tau lines to other turbo streams such as charm or a dedicated tau stream (?), or changing the approach and combining tau lines into inclusive ones that would go to full. But these studies would take time and effort, and given the time constraints that might not be the best place to put efforts.
- ProbNN, for example, is also expected to mitigate the problems a lot, so we might end up having done this for nothing.