Follow-up from "Sort clusters prior to correcting for shower overlap"
from @graven: there is still an order dependency in the implementation [of `CaloFutureShowerOverlap`], as the assigned fractions depend on the 'current' cluster energies, and the cluster energies depend on the 'current' fractions... This is the reason I left the explicit `std::sort`, and used `std::partition_point` instead of just using `std::partition` (or `std::stable_partition`). So at least now the problem is self-contained, and does not depend on whatever order the input has. But yes, it remains intrinsically order dependent.
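The difference between "self-contained" and "order independent" can be seen in a toy model. This is not the `CaloFutureShowerOverlap` code; `SharedCell`, `splitSequentially`, `splitCanonically` and the `minEnergy` cut are hypothetical. Each cell shared by two clusters is split in proportion to the clusters' *current* energies, so processing the shared cells in a different order changes the result. Sorting with a full tie-breaking key first (`std::sort` is not stable) makes the result independent of the input order, but it still depends on which canonical order is chosen. By contrast, `std::partition` leaves the relative order inside each group unspecified and `std::stable_partition` keeps the input order, so an order-dependent loop after either of them would still see an input-dependent order. Once the range is sorted it is also partitioned with respect to any energy threshold, which is what makes `std::partition_point` valid on it.

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

// Toy model (hypothetical): a cell shared by clusters a and b.
struct SharedCell {
  int    a, b;   // indices of the two clusters sharing this cell
  double energy; // measured energy of the shared cell
};

// Split the shared cells one after another, using the *current* cluster
// energies for the fractions (energies are assumed positive).
// The result depends on the order of `cells`.
std::vector<double> splitSequentially(std::vector<double> energies,
                                      const std::vector<SharedCell>& cells) {
  for (const auto& c : cells) {
    const double fa = energies[c.a] / (energies[c.a] + energies[c.b]);
    energies[c.a] += fa * c.energy;
    energies[c.b] += (1.0 - fa) * c.energy;
  }
  return energies;
}

// Impose a canonical order first: highest-energy cells first, ties broken by
// cluster indices, so that the input order no longer matters.
std::vector<double> splitCanonically(const std::vector<double>& energies,
                                     std::vector<SharedCell> cells,
                                     double minEnergy) {
  std::sort(cells.begin(), cells.end(),
            [](const SharedCell& x, const SharedCell& y) {
              // descending energy, then ascending (a, b)
              return std::tie(y.energy, x.a, x.b) < std::tie(x.energy, y.a, y.b);
            });
  // The sorted range is partitioned w.r.t. "energy >= minEnergy", so
  // partition_point finds the first cell below the (hypothetical) cut.
  const auto end = std::partition_point(
      cells.begin(), cells.end(),
      [minEnergy](const SharedCell& c) { return c.energy >= minEnergy; });
  cells.erase(end, cells.end());
  return splitSequentially(energies, cells);
}
```

For example, with cluster energies {10, 10, 30} and shared cells {0, 1, 10} and {1, 2, 10}, `splitSequentially` gives cluster 1 an energy of about 18.333 in the listed order and about 18.056 in the reversed order, while `splitCanonically` gives about 18.333 for both input orders.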
from @dgolubko: As for the generalization of the algorithm that @cmarinbe asked about, I have made no real progress on implementing it. What I initially had in mind was to replace cluster pairs with cluster groups (mutually non-overlapping groups of overlapping clusters), and then
- either extend the CaloShowerOverlapTool algorithm to cluster groups (all the formulas seem to generalize from 2 to N clusters). Some things need more care than in the 2-cluster version, though: e.g., the procedure for rejecting clusters whose energy becomes too small or negative during the iterations might need more flexibility, and, because the N-cluster case has more degrees of freedom, a more general convergence condition is needed to stop the iterations.
- or try to reformulate the problem that CaloShowerOverlapTool currently solves with explicit iterations as a binned fit of the energy distribution of N clusters to the measured energies in the cells of the whole cluster group (with measurement errors derived from the cell noise, plus possibly some fudge factors). The cell energy fractions would then become fit parameters, with constraints such as $\sum_i f_{i,c} = 1$ for every cell $c$, where $f_{i,c}$ is the fraction of the energy of cell $c$ assigned to cluster $i$.
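A minimal sketch of the first option, under simplifying assumptions: all names are hypothetical, the fractions are proportional to the previous iteration's cluster energies only (a real tool would also weight by a shower profile), clusters at or below a non-negative `minEnergy` are rejected and stop sharing, and the iteration stops when the largest relative energy change over all clusters falls below `tolerance`. Because every cluster is updated simultaneously from the previous iteration's energies (a Jacobi-style update), this toy also does not depend on the order in which the clusters are listed.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical N-cluster version of the 2-cluster iteration (toy model).
struct SharingResult {
  std::vector<double> energies; // final cluster energies (0 means rejected)
  int  iterations;              // iterations actually performed
  bool converged;               // true if the stopping rule was met
};

// cells[c]   : measured energy of cell c in the cluster group
// members[i] : indices of the cells belonging to cluster i
// minEnergy  : rejection threshold, assumed >= 0
SharingResult shareCells(const std::vector<double>& cells,
                         const std::vector<std::vector<std::size_t>>& members,
                         double minEnergy, double tolerance, int maxIter) {
  const std::size_t n = members.size();
  // Start from the unshared energy of each cluster; reject tiny ones at once.
  std::vector<double> energy(n, 0.0);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t c : members[i]) energy[i] += cells[c];
    if (energy[i] <= minEnergy) energy[i] = 0.0;
  }
  for (int it = 1; it <= maxIter; ++it) {
    // Per-cell normalisation: sum of the energies of all clusters using it.
    std::vector<double> norm(cells.size(), 0.0);
    for (std::size_t i = 0; i < n; ++i)
      for (std::size_t c : members[i]) norm[c] += energy[i];
    // Simultaneous update: every cluster uses the previous energies.
    std::vector<double> next(n, 0.0);
    bool rejectedNow = false;
    for (std::size_t i = 0; i < n; ++i) {
      if (energy[i] <= 0.0) continue; // already rejected
      for (std::size_t c : members[i]) next[i] += cells[c] * energy[i] / norm[c];
      if (next[i] <= minEnergy) { next[i] = 0.0; rejectedNow = true; }
    }
    // N-cluster stopping rule: largest relative change over all clusters.
    double maxChange = 0.0;
    for (std::size_t i = 0; i < n; ++i)
      if (next[i] > 0.0)
        maxChange = std::max(maxChange, std::abs(next[i] - energy[i]) / next[i]);
    energy = next;
    // Do not stop in an iteration that rejected a cluster: the others change.
    if (!rejectedNow && maxChange < tolerance) return {energy, it, true};
  }
  return {energy, maxIter, false};
}
```

For a chain of three clusters with cells {10, 10, 10, 10, 30} and members {0, 1}, {1, 2, 3} and {3, 4}, the iteration converges, the total stays at 70, and listing the clusters in reverse order gives the same energies (in reverse).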
However, both of these 'generalized' solutions involve more combinatorics, and thus a larger computational cost, than the current 2-cluster overlap correction. So, assuming (though this still needs to be better quantified) that the results of the calo reconstruction change only slightly with the cluster ordering, I hope it is acceptable to choose any computationally cheap solution (or even to tolerate some level of 'non-determinism' in the reconstruction).
from @sponce: Just a note on this. When we looked at the Calo sequence with @ahennequ in Paris, he proposed a solution to the original ordering problem (which was actually already known back then). The idea was simply to drop the overlap algorithm and compute the proper energy distribution on the fly during clustering. This had the double advantage of being order-proof and of dropping an algorithm that has O(n^4) complexity, if I remember correctly, and thus takes a long time.
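One possible reading of this idea, as a hedged sketch: the thread does not say how the energy distribution was to be computed, so the 1D geometry, the local-maximum seeding, the `radius` neighbourhood, the names `ProtoClusters` and `clusterWithSharing`, and the use of seed-cell energies as sharing weights are all assumptions (a realistic version would weight by a shower profile). The property that matters is that the weights are fixed before any cell is shared, so no cluster energy is updated while it is still being used to split other cells.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 1D toy, not the actual clustering code.
struct ProtoClusters {
  std::vector<std::size_t> seeds;    // cell index of each seed
  std::vector<double>      energies; // energy collected by each seed
};

ProtoClusters clusterWithSharing(const std::vector<double>& cells,
                                 double seedThreshold, std::size_t radius) {
  auto dist = [](std::size_t a, std::size_t b) { return a > b ? a - b : b - a; };
  ProtoClusters out;
  // 1) Seeds: local maxima above threshold (plateaus resolved to the lower index).
  for (std::size_t c = 0; c < cells.size(); ++c) {
    const bool aboveLeft  = c == 0 || cells[c] > cells[c - 1];
    const bool aboveRight = c + 1 == cells.size() || cells[c] >= cells[c + 1];
    if (cells[c] > seedThreshold && aboveLeft && aboveRight) out.seeds.push_back(c);
  }
  out.energies.assign(out.seeds.size(), 0.0);
  // 2) Share every cell among the seeds within `radius`. The weights are the
  //    measured seed-cell energies, fixed before any sharing, so (up to
  //    rounding) the result does not depend on the visiting order.
  for (std::size_t c = 0; c < cells.size(); ++c) {
    double norm = 0.0;
    for (std::size_t s : out.seeds)
      if (dist(c, s) <= radius) norm += cells[s];
    if (norm <= 0.0) continue; // cell not reached by any seed
    for (std::size_t k = 0; k < out.seeds.size(); ++k)
      if (dist(c, out.seeds[k]) <= radius)
        out.energies[k] += cells[c] * cells[out.seeds[k]] / norm;
  }
  return out;
}
```

For cells {1, 5, 2, 4, 1} with a seed threshold of 3 and a radius of 1, the seeds are cells 1 and 3, and the shared middle cell (energy 2) is split 5:4, giving energies of 64/9 ≈ 7.11 and 53/9 ≈ 5.89 in a single pass.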