Clustering
Multiple things spring to mind when going through the clustering code:
- It's slow when working at low threshold values
- Part of the reason is that, when looping over hits, the loop always runs over all remaining hits beyond the cluster seed hit. It could be cut short if the hits are sorted by increasing column number and the cluster does not span multiple rows (people usually look at only one strip segment). Would that need a strip-specific clustering implementation?
- Even if sorted data is used and we work in one dimension, the check for whether a hit neighbours the cluster hits currently walks through the cluster in the order in which the hits were added. With a sorted list of hits it would make more sense to check the cluster in reverse, as that finds the adjacent hit sooner and thus ends the search earlier with a `return true`.
- Checking whether a hit belongs to a cluster within the constraints sets the split-cluster property if the hit is found to be distant but within range. However, that flag is not reset if another hit is added later that "completes" the cluster. For example, with an allowed distance of 2, checking hits in channels 1, 3 and then 2 leads to a cluster with hits in channels 1, 2 and 3 that is still called a split cluster!
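
To make the points above concrete, here is a rough sketch of what a strip-specific, one-dimensional version could look like. It is not the existing implementation: `StripCluster`, `isNeighbour`, `clusterSortedHits` and `maxDistance` are hypothetical names made up for illustration, and hits are reduced to bare column numbers. It assumes the hits arrive sorted by increasing column, checks the cluster from the back, and derives the split property from the cluster contents instead of storing a flag.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical minimal cluster type (assumption, not the framework's class).
struct StripCluster {
    std::vector<int> columns; // hit columns, kept in increasing order

    // Insert at the sorted position so the order holds for any arrival order.
    void add(int column) {
        columns.insert(std::upper_bound(columns.begin(), columns.end(), column), column);
    }

    // Derived on demand: split if any two consecutive columns are not adjacent.
    // Because nothing is stored, the answer cannot go stale when a gap is filled.
    bool isSplit() const {
        for (std::size_t i = 1; i < columns.size(); ++i) {
            if (columns[i] - columns[i - 1] > 1) return true;
        }
        return false;
    }
};

// Walk the cluster in reverse: with sorted input the most recently added
// (largest) column is the most likely neighbour, so we usually return at once.
bool isNeighbour(const StripCluster& cluster, int column, int maxDistance) {
    for (auto it = cluster.columns.rbegin(); it != cluster.columns.rend(); ++it) {
        const int d = column - *it;
        if (std::abs(d) <= maxDistance) return true;
        if (d > maxDistance) break; // columns further back are even more distant
    }
    return false;
}

// Single pass over sorted hits: a gap larger than maxDistance closes the
// current cluster, so no hit beyond the gap is tested against it.
std::vector<StripCluster> clusterSortedHits(const std::vector<int>& sortedColumns, int maxDistance) {
    assert(std::is_sorted(sortedColumns.begin(), sortedColumns.end()));
    std::vector<StripCluster> clusters;
    for (int column : sortedColumns) {
        if (clusters.empty() || !isNeighbour(clusters.back(), column, maxDistance)) {
            clusters.emplace_back(); // start a new cluster with this hit as seed
        }
        clusters.back().add(column);
    }
    return clusters;
}
```

With this sketch, adding channels 1, 3 and then 2 to a `StripCluster` reports a split cluster after the second hit and an unsplit one after the third. Whether recomputing the property on demand is cheap enough, compared with fixing up a stored flag on every insertion, would need to be checked against real cluster sizes.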