Improvements to prefix sum, sorting algorithms and VELO clustering by Arthur.
This MR is a rebased version of !1509 (closed).
Description copied from !1509 (closed):
Introduce a new test and benchmark to compare different implementations of the prefix_sum:
- cpu1: default implementation of host_prefix_sum
- cuda1: Blelloch's scan implementation using 1 element per thread
- cuda2: Blelloch's scan implementation using 4 elements per thread
- cuda3: Blelloch's scan implementation using a single kernel, sliding over the array
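
For readers unfamiliar with Blelloch's scan, the following is a minimal CPU sketch of the up-sweep and down-sweep phases that the cuda variants are based on, checked against a sequential reference. It is not the code from this MR: the function names `exclusive_scan_reference` and `exclusive_scan_blelloch` are hypothetical, an exclusive scan is assumed, and the inner loops only emulate what independent GPU threads would do in each step.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sequential exclusive prefix sum, used as the reference result.
// Hypothetical helper, not the actual host_prefix_sum implementation.
std::vector<std::uint32_t> exclusive_scan_reference(const std::vector<std::uint32_t>& in)
{
  std::vector<std::uint32_t> out(in.size());
  std::uint32_t running = 0;
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = running;
    running += in[i];
  }
  return out;
}

// Blelloch work-efficient exclusive scan, emulated on the CPU.
// Within one stride, the inner-loop iterations touch disjoint elements,
// which is what allows a GPU to assign them to separate threads.
std::vector<std::uint32_t> exclusive_scan_blelloch(const std::vector<std::uint32_t>& in)
{
  if (in.empty()) return {};

  // Pad with zeros to the next power of two so the reduction tree is balanced.
  std::size_t n = 1;
  while (n < in.size()) n <<= 1;
  std::vector<std::uint32_t> data(n, 0);
  for (std::size_t i = 0; i < in.size(); ++i) data[i] = in[i];

  // Up-sweep (reduce): build partial sums up the tree.
  for (std::size_t stride = 1; stride < n; stride <<= 1) {
    for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
      data[i] += data[i - stride];
    }
  }

  // Down-sweep: clear the root, then push prefix sums back down the tree.
  data[n - 1] = 0;
  for (std::size_t stride = n / 2; stride >= 1; stride >>= 1) {
    for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
      const std::uint32_t left = data[i - stride];
      data[i - stride] = data[i];
      data[i] += left;
    }
  }

  // Drop the padding; the first in.size() entries are unaffected by it.
  data.resize(in.size());
  return data;
}
```

A test in the spirit of the one introduced here would run both functions on the same input and require identical output, for example on sizes that are not powers of two.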
Closes #500 (closed)
Implements a new sorting algorithm.
Implements a new VELO clustering algorithm.
Throughput Test on Real Data
The throughput on real data with varying levels of mu shows a 10-15% increase with the HLT1 matching sequences without and with UT (hlt1_pp_matching_no_ut and hlt1_pp_matching).
Checks of D0 and D+ Reconstruction
This MR and 2024-patches have been tested on real data. The dataset is the MEP dumps of Run 297083 with mu=4, which contains a total of 29.5M events. The sequence used was hlt1_pp_matching_and_downstream. The number of D0 and D+ reconstructed and triggered by Hlt1OneTrackMVA || Hlt1TwoTrackMVA is identical between this MR and 2024-patches.