
Reduce RDF graph branching on Selection (only for Filter)

What the title says. Some context and technical details: so far, every Selection has had its own SelWithDefines helper object for the backend, holding an RDF node, a set of defines, optionally a dictionary with systematic variations, and the weight. As a consequence, the RDF graph is branched as soon as a selection is registered (at construction time, or when building the graph for lazy backends), and the new branch will not benefit from columns defined for the parent. This leads to duplication when multiple Selections are created from the same parent.

In most circumstances that does not make a big difference, because most selections have cuts, but @fbury had an interesting case of many selections with the same events but different weights (reweighting N samples to N' targets), which currently leads to a lot of duplication.

The solution is to split SelWithDefines into the "Filter node with extras" part, now called FilterWithDefines, and a small helper object for each Selection (with a reference to the former and a dictionary with the weight for every variation). Two selections can then share a FilterWithDefines, which reduces the cost of the second selection to an additional weight column, similarly to weight-only systematic variations.
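To illustrate the split, here is a minimal pure-Python sketch without any RDF dependency. It is not the actual implementation: the class name `SelectionHelper`, the field names, the generated column names, and this simplified `refine` function are assumptions made for the example. Only the idea (a shared FilterWithDefines plus a per-Selection dictionary of weights) comes from the description above.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Optional

# Global counter so that generated column names never collide (illustrative only)
_column_ids = itertools.count()


@dataclass
class FilterWithDefines:
    """Stand-in for the shared 'Filter node with extras' (hypothetical fields)."""
    cut: Optional[str] = None
    parent: Optional["FilterWithDefines"] = None
    defines: Dict[str, str] = field(default_factory=dict)  # column name -> expression

    def define(self, expr: str) -> str:
        """Define a column on this node, reusing an existing one for the same expression."""
        for name, known in self.defines.items():
            if known == expr:
                return name
        name = f"w{next(_column_ids)}"
        self.defines[name] = expr
        return name


@dataclass
class SelectionHelper:
    """Small per-Selection object: shared node plus one weight column per variation."""
    node: FilterWithDefines
    weights: Dict[str, Optional[str]]  # variation -> weight column (None: unweighted)


def refine(parent: SelectionHelper, cut: Optional[str] = None,
           weight: Optional[str] = None) -> SelectionHelper:
    # Only a cut creates a new node (a branch); weight-only selections share the parent's node
    node = parent.node if cut is None else FilterWithDefines(cut=cut, parent=parent.node)
    if weight is None:
        return SelectionHelper(node, dict(parent.weights))
    weights = {}
    for variation, column in parent.weights.items():
        expr = weight if column is None else f"({column})*({weight})"
        weights[variation] = node.define(expr)
    return SelectionHelper(node, weights)


root = SelectionHelper(FilterWithDefines(), {"nominal": None})
base = refine(root, cut="nJets >= 2")
sel_a = refine(base, weight="weightA")  # same events, different weight
sel_b = refine(base, weight="weightB")  # shares base's node, adds one column
```

In this sketch `sel_a` and `sel_b` point to the same node as `base`, so the only per-selection cost is one extra weight column on that node.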

This should also make it easier to construct selections that do not inherit their parents' weights. The main problem there is how to provide an interface for the user (adding an inheritWeight=True argument to refine would work, I guess), but I'm not sure if this is something anyone has missed so far (I'm afraid the practical case is more "take some but not all weights", which would need a more complex interface).
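As a rough sketch of what such an inheritWeight flag could mean for the weights, assuming each variation is stored as a list of weight factors. The function name `refined_weights`, the representation, and the example weight names are hypothetical and not part of this merge request:

```python
from typing import Dict, List, Optional

# Hypothetical representation: variation name -> list of weight factor expressions
Weights = Dict[str, List[str]]


def refined_weights(parent: Weights, new_factor: Optional[str] = None,
                    inheritWeight: bool = True) -> Weights:
    """Weights of a refined selection; inheritWeight=False drops all parent factors."""
    result = {}
    for variation, factors in parent.items():
        kept = list(factors) if inheritWeight else []
        if new_factor is not None:
            kept.append(new_factor)
        result[variation] = kept
    return result


parent = {"nominal": ["genWeight", "puWeight"], "puUp": ["genWeight", "puWeightUp"]}
inherited = refined_weights(parent, "btagSF")
fresh = refined_weights(parent, "btagSF", inheritWeight=False)
```

A boolean flag like this is all-or-nothing. The "take some but not all weights" case would need something richer, for example selecting factors by name, which this sketch does not attempt.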

Status: tests pass (including the regression tests), performance comparison ongoing. In most cases there should be a small reduction in the RDF graph size and a very small increase in Python memory usage (a few small dictionaries, compared to the typical several GB of JITting memory), and hopefully a larger reduction for the case that triggered this.
