Overlap between selection algorithms

I've created an algorithm that calculates the overlap among all selection algorithms (ParticleRangeFilters and NBodyCombiners). The algorithm is detailed in this Rec branch.

The method extracts a string id from each particle that passes a selection algorithm. The id contains the names of the particle's inputs and the index at which each input is located, e.g. "ParticleRangeFilter_64adf3:5FunctionalParticleMaker_ae3465:8". These ids can then be compared between algorithms to find the number of overlapping outputs. The algorithm fills a 2D histogram with the number of overlaps and saves it to JSON. I divided the overlaps into two categories: subset, where all candidates from one algorithm are contained within another, and partial overlap, where more than half of the candidates are shared but not all.
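The counting and classification described above can be sketched roughly as follows. This is a minimal illustration and not the code from the Rec branch: the function names, the per-event input layout (a dict mapping algorithm name to a set of candidate id strings) and the "A|B" JSON key format are all assumptions made for the example.

```python
import json
from collections import Counter
from itertools import combinations


def count_overlaps(events):
    """Accumulate candidate counts per algorithm and shared counts per pair.

    `events` is an iterable of dicts mapping an algorithm name to the set of
    candidate id strings it selected in that event (hypothetical layout).
    """
    totals = Counter()  # totals[a]: candidates selected by a over all events
    shared = Counter()  # shared[(a, b)]: candidates selected by both a and b
    for outputs in events:
        for name, ids in outputs.items():
            totals[name] += len(ids)
        for a, b in combinations(sorted(outputs), 2):
            shared[(a, b)] += len(outputs[a] & outputs[b])
    return totals, shared


def classify(totals, shared):
    """Label each ordered pair (x, y) as 'subset' or 'partial', if either applies."""
    kinds = {}
    for (a, b), n in shared.items():
        for x, y in ((a, b), (b, a)):
            if n == 0 or totals[x] == 0:
                continue
            if n == totals[x]:
                kinds[(x, y)] = "subset"   # every candidate of x is also in y
            elif 2 * n > totals[x]:
                kinds[(x, y)] = "partial"  # more than half of x is shared, not all
    return kinds


def to_json(shared):
    """Serialise the pairwise counts; JSON keys must be strings."""
    return json.dumps({f"{a}|{b}": n for (a, b), n in shared.items()})


# Toy input with made-up ids, two events and three algorithms.
events = [
    {"A": {"x:1", "x:2"}, "B": {"x:1", "x:2", "x:3"}, "C": {"x:2", "y:1", "y:2"}},
    {"A": {"x:4"}, "B": {"x:4"}, "C": {"x:4", "y:3"}},
]
totals, shared = count_overlaps(events)
kinds = classify(totals, shared)
```

In this toy input, A is a subset of B, while B has only a partial overlap with A, which shows why the classification is kept per ordered pair even though the shared count itself is symmetric.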

From this I've created two CSV files containing, for each detected overlap, the names of the algorithms, the rates, the overlap and timing info:

subset_df_filtered_n100000.csv

partialoverlap_df_filtered_n100000.csv

And a dictionary mapping algorithms to lines, cuts, inputs, etc.: algDict_n100000.json

Some general numbers:

I ran over 100k events.
6850 Filters and Combiners.
12396 total overlaps (this includes the same algorithm overlapping with multiple others).
After filtering, 1124 algorithms overlap: 1015 are subsets of another algorithm and 521 have a partial overlap with another algorithm.
I require >100 overlaps in the subset case and >500 in the partial overlap case.
These algorithms cover ~10% of the HLT2 throughput (all selections together are ~28%). These values were taken while running the overlap algorithm, so there might be some uncertainty.
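As an illustration of the threshold requirement, the filtering step could look something like the sketch below. It assumes that "overlaps" means the number of shared candidates summed over the sample, and the row layout and field names (`alg`, `other`, `kind`, `n_shared`) are hypothetical, not the actual columns of the CSV files.

```python
# Thresholds quoted above: >100 shared candidates for subsets, >500 for partial overlaps.
SUBSET_MIN_SHARED = 100
PARTIAL_MIN_SHARED = 500


def keep_overlap(kind, n_shared):
    """Return True if a detected overlap passes the threshold for its category."""
    minimum = {"subset": SUBSET_MIN_SHARED, "partial": PARTIAL_MIN_SHARED}[kind]
    return n_shared > minimum


# Made-up rows, only to show the behaviour at the thresholds.
rows = [
    {"alg": "A", "other": "B", "kind": "subset", "n_shared": 101},
    {"alg": "C", "other": "D", "kind": "subset", "n_shared": 100},
    {"alg": "E", "other": "F", "kind": "partial", "n_shared": 450},
    {"alg": "G", "other": "H", "kind": "partial", "n_shared": 900},
]
filtered = [row for row in rows if keep_overlap(row["kind"], row["n_shared"])]
```

The looser threshold for subsets reflects the numbers quoted above; whether these values are the right trade-off between statistics and coverage is a separate question.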

Now, fixing this is not so easy. Ideally, we would do this optimization once for the full trigger, but rewriting the trigger lines after initialization is very hard (I tried). Resolving the subsets would mean creating a new algorithm that filters the output of the superset algorithm using only the cuts that differ between the two.
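To make the idea of resolving a subset concrete, here is a toy sketch. It models cuts as simple "variable greater than threshold" requirements on dict-like candidates, which is an assumption made purely for illustration; real selections use the functor framework and the names below (`select`, `differing_cuts`, the cut variables and values) are hypothetical.

```python
def select(candidates, cuts):
    """Keep candidates that pass every 'variable > threshold' cut."""
    return [c for c in candidates if all(c[var] > thr for var, thr in cuts.items())]


def differing_cuts(superset_cuts, subset_cuts):
    """Cuts of the subset algorithm that are absent from, or differ in, the superset."""
    return {
        var: thr
        for var, thr in subset_cuts.items()
        if superset_cuts.get(var) != thr
    }


# Made-up cut values; the subset algorithm is strictly tighter than the superset.
superset_cuts = {"pt": 500, "ipchi2": 4}
subset_cuts = {"pt": 500, "ipchi2": 9, "p": 2000}

candidates = [
    {"pt": 800, "ipchi2": 12, "p": 5000},
    {"pt": 800, "ipchi2": 6, "p": 5000},
    {"pt": 300, "ipchi2": 20, "p": 9000},
    {"pt": 900, "ipchi2": 10, "p": 1500},
]

# Current situation: both algorithms run over the full input.
direct = select(candidates, subset_cuts)

# Proposed: run the superset once, then apply only the extra cuts to its output.
extra = differing_cuts(superset_cuts, subset_cuts)
via_superset = select(select(candidates, superset_cuts), extra)
```

The two results agree only when the subset cuts are at least as tight as the superset cuts on every shared variable, which is what the subset classification is meant to indicate; the saving comes from evaluating the shared cuts once and the extra cuts on a smaller input.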

FYI: @mvesteri

Edited Apr 09, 2025 by Daniel Magdalinski