CaloRecGPU: Optimizations
Significant optimizations for cluster growing and especially cluster splitting, reducing the algorithmic execution time by roughly 40% relative to the previous version.
Major changes:
- Cluster growing:
  - Pair creation now prefix-sums over the warp for the two kinds of neighbours, reducing the number of atomic additions. (The measured impact is quite small; the compiler may already have been optimizing this automatically.)
- Cluster splitting:
  - Pair creation now prefix-sums over the warp for the four different kinds of neighbours, reducing the number of atomic additions.
  - Main tag propagation greatly optimized by reworking the propagation and change-checking logic so that each iteration needs just two sets of operations: the propagation itself, then the tag updates. The continue flag is now set during tag propagation, so the tag updates happen at the same time as the stopping criterion is checked. Each iteration therefore needs only two synchronization points instead of the previous four.
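For reviewers unfamiliar with the pair-creation change, the idea of prefix-summing over the warp can be sketched on the CPU as follows. This is a minimal illustration, not the code in this merge request: `reserve_slots`, `kWarpSize` and `pair_counter` are hypothetical names, the 32 lanes of a warp are emulated with a plain loop, and a single output counter is assumed. On the GPU the scan would typically be done with warp shuffles (e.g. `__shfl_up_sync`) and the reservation with one `atomicAdd` per warp.

```cpp
#include <array>
#include <atomic>
#include <cassert>

// Number of lanes in the emulated warp (32 on current NVIDIA GPUs).
constexpr int kWarpSize = 32;

// Hypothetical helper, for illustration only.
// counts[lane] is the number of pairs that lane wants to write.
// Returns, for each lane, the first index it may write to in the global
// pair list. Only ONE atomic addition is issued for the whole warp,
// instead of one per lane (or one per pair).
std::array<int, kWarpSize> reserve_slots(const std::array<int, kWarpSize>& counts,
                                         std::atomic<int>& pair_counter)
{
    std::array<int, kWarpSize> offsets{};

    // Exclusive prefix sum across the lanes: offsets[lane] is the number
    // of pairs requested by all lower-numbered lanes.
    int running = 0;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        assert(counts[lane] >= 0);
        offsets[lane] = running;
        running += counts[lane];
    }

    // 'running' now holds the warp-wide total: reserve it in one go.
    const int base = pair_counter.fetch_add(running);

    // Every lane shifts its local offset by the reserved base index.
    for (int lane = 0; lane < kWarpSize; ++lane) {
        offsets[lane] += base;
    }
    return offsets;
}
```

With several kinds of neighbours (two for growing, four for splitting), the same reservation would presumably be repeated once per kind, so the number of atomic additions per warp stays fixed regardless of how many pairs each lane produces.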
Minor changes:
- The kernel launch parameter optimization service now also accepts/requires a maximum number of threads, for when we know the kernel will not benefit from greater parallelism (e.g. no need for more threads than calorimeter cells when calculating cluster properties).
- Added consistent order to limited neighbour options.
- Added the possibility to create and fill a list of all possible pairs. (Testing showed this was not a feasible approach, but it might nonetheless be useful in the future for other testing.)
- Improved cluster matching procedures and included optional output to diagnose cell differences.
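As an illustration of the maximum-thread limit for the launch parameter optimization, the sketch below caps a preferred launch configuration so that it never spawns more threads than there is work for. The names `LaunchConfig` and `capped_launch` are hypothetical and do not reflect the actual interface of the service; the preferred grid and block sizes are assumed to come from whatever the optimizer would otherwise choose.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical container for a 1D kernel launch configuration.
struct LaunchConfig {
    int grid_size;   // number of blocks
    int block_size;  // threads per block
};

// Hypothetical helper, for illustration only.
// Shrinks the preferred configuration so that it does not launch more
// blocks than are needed to cover 'max_threads' threads.
LaunchConfig capped_launch(int preferred_grid, int preferred_block, int max_threads)
{
    assert(preferred_grid > 0 && preferred_block > 0 && max_threads > 0);

    // A block never needs more threads than the total useful amount.
    const int block = std::min(preferred_block, max_threads);

    // Smallest number of blocks that still covers max_threads (ceiling division).
    const int needed_blocks = (max_threads + block - 1) / block;

    // Keep the optimizer's choice unless it exceeds what is useful.
    const int grid = std::min(preferred_grid, needed_blocks);

    return {grid, block};
}
```

For example, a kernel that handles one calorimeter cell per thread would pass the number of cells as `max_threads`, so a preferred grid sized for full occupancy is reduced to the few blocks actually required.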