
Improvements to prefix sum, sorting algorithms and VELO clustering by Arthur.

Da Yu Tou requested to merge rebase_ahennequ_scan into 2024-patches

This MR is a rebased version of !1509 (closed).

Description copied from !1509 (closed).

Introduces a new test and benchmark comparing different implementations of prefix_sum:


  • cpu1: default implementation of host_prefix_sum
  • cuda1: Blelloch's scan with 1 element per thread
  • cuda2: Blelloch's scan with 4 elements per thread
  • cuda3: Blelloch's scan in a single kernel that slides over the array
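To make the differences between these variants concrete, here is a host-side C++ sketch. It is not the MR's CUDA code. It assumes an exclusive scan and simulates Blelloch's up-sweep/down-sweep, with each inner-loop iteration standing in for one GPU thread. The function names (exclusive_scan_ref, blelloch_scan, scan_items_per_thread, sliding_scan) and the chunk-and-carry scheme used for the sliding variant are hypothetical and chosen for illustration. The real kernels may differ, for example in shared-memory layout or in how cuda3 carries totals between chunks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sequential reference: exclusive prefix sum, out[i] = in[0] + ... + in[i-1].
std::vector<uint32_t> exclusive_scan_ref(const std::vector<uint32_t>& in) {
  std::vector<uint32_t> out(in.size());
  uint32_t running = 0;
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = running;
    running += in[i];
  }
  return out;
}

// Blelloch work-efficient exclusive scan, simulated on the host.
// Each inner-loop iteration stands in for one GPU thread; on a device,
// every stride level would be separated by a block-wide barrier.
void blelloch_scan(std::vector<uint32_t>& data) {
  const std::size_t n = data.size();
  if (n == 0) return;
  assert((n & (n - 1)) == 0 && "size must be a power of two");
  // Up-sweep: build a reduction tree in place; data[n-1] ends up as the total.
  for (std::size_t stride = 1; stride < n; stride *= 2)
    for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride)
      data[i] += data[i - stride];
  // Down-sweep: clear the root, then push prefixes back down the tree.
  data[n - 1] = 0;
  for (std::size_t stride = n / 2; stride >= 1; stride /= 2)
    for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
      const uint32_t left = data[i - stride];
      data[i - stride] = data[i];
      data[i] += left;
    }
}

// Several items per thread (items = 4 mimics cuda2; items must be > 0):
// each thread reduces its own items serially, the per-thread totals are
// Blelloch-scanned, and each thread rescans its items from its offset.
std::vector<uint32_t> scan_items_per_thread(const std::vector<uint32_t>& in,
                                            std::size_t items) {
  const std::size_t threads = (in.size() + items - 1) / items;
  std::size_t padded = 1;
  while (padded < threads) padded *= 2;  // pad to a power of two with zeros
  std::vector<uint32_t> totals(padded, 0);
  for (std::size_t t = 0; t < threads; ++t)
    for (std::size_t k = t * items; k < std::min(in.size(), (t + 1) * items); ++k)
      totals[t] += in[k];
  blelloch_scan(totals);
  std::vector<uint32_t> out(in.size());
  for (std::size_t t = 0; t < threads; ++t) {
    uint32_t running = totals[t];
    for (std::size_t k = t * items; k < std::min(in.size(), (t + 1) * items); ++k) {
      out[k] = running;
      running += in[k];
    }
  }
  return out;
}

// Hypothetical sliding single-pass variant (in the spirit of cuda3): one block
// walks over the array in power-of-two chunks, scans each chunk, and carries
// the running total into the next chunk.
std::vector<uint32_t> sliding_scan(const std::vector<uint32_t>& in,
                                   std::size_t chunk) {
  std::vector<uint32_t> out(in.size());
  uint32_t carry = 0;
  for (std::size_t base = 0; base < in.size(); base += chunk) {
    const std::size_t len = std::min(chunk, in.size() - base);
    std::vector<uint32_t> tile(chunk, 0);  // last chunk is zero-padded
    std::copy(in.begin() + base, in.begin() + base + len, tile.begin());
    blelloch_scan(tile);
    for (std::size_t i = 0; i < len; ++i) out[base + i] = tile[i] + carry;
    carry += tile[len - 1] + in[base + len - 1];  // add this chunk's total
  }
  return out;
}
```

Take in = {3, 1, 7, 0, 4, 1, 6, 3, 5} as an example. With items = 4 and chunk = 4, both scan_items_per_thread and sliding_scan reproduce exclusive_scan_ref, which gives {0, 3, 4, 11, 11, 15, 16, 22, 25}. Processing several items per thread makes the Blelloch tree 4x shallower, so fewer barrier-separated levels are needed. The sliding form avoids a second kernel launch at the cost of processing chunks one after another.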

Closes #500

Implements a new sorting algorithm.

Implements a new VELO clustering algorithm.

Details are given in https://indico.cern.ch/event/1370609/contributions/5928258/attachments/2847533/4979150/SumSortClustHLT1.pdf

Throughput Test on Real Data

Throughput on real data at varying pileup (mu) increases by 10-15% for the HLT1 matching sequences both without and with the UT (hlt1_pp_matching_no_ut and hlt1_pp_matching).

Checks of D0 and D+ Reconstruction

This MR and 2024-patches have been tested on real data. The dataset consists of the MEP dumps of Run 297083 at mu = 4, 29.5M events in total, processed with the hlt1_pp_matching_and_downstream sequence. The numbers of D0 and D+ candidates reconstructed and triggered by Hlt1OneTrackMVA || Hlt1TwoTrackMVA are identical between this MR and 2024-patches.

