Improvements to prefix sum and sorting algorithms

Arthur Marius Hennequin requested to merge ahennequ_scan into 2024-patches

Introduces a new test and benchmark to compare different implementations of prefix_sum:

  • cpu1: default implementation of host_prefix_sum
  • cuda1: Blelloch's scan implementation using 1 element per thread
  • cuda2: Blelloch's scan implementation using 4 elements per thread
  • cuda3: Blelloch's scan implementation using a single kernel that slides over the array
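
For context, here is a minimal sketch of the two phases of Blelloch's scan (up-sweep and down-sweep), emulated sequentially in C++ and shown next to a plain running-sum reference. It is not the code from this merge request: the function names `exclusive_scan_reference` and `exclusive_scan_blelloch` are hypothetical, and it assumes an exclusive scan with the input padded to a power of two. In a CUDA version, each inner loop iteration would typically be handled by a separate thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference: straightforward sequential exclusive prefix sum.
// (Hypothetical helper, not the host_prefix_sum from this MR.)
std::vector<uint32_t> exclusive_scan_reference(const std::vector<uint32_t>& in) {
  std::vector<uint32_t> out(in.size());
  uint32_t running = 0;
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = running;  // sum of all elements strictly before i
    running += in[i];
  }
  return out;
}

// Blelloch's work-efficient exclusive scan, emulated sequentially.
// Assumption: the input is zero-padded to the next power of two.
std::vector<uint32_t> exclusive_scan_blelloch(const std::vector<uint32_t>& in) {
  if (in.empty()) return {};

  std::size_t n = 1;
  while (n < in.size()) n <<= 1;

  std::vector<uint32_t> data(n, 0);
  for (std::size_t i = 0; i < in.size(); ++i) data[i] = in[i];

  // Up-sweep (reduce): build partial sums in place, tree level by tree level.
  for (std::size_t stride = 1; stride < n; stride <<= 1) {
    for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
      data[i] += data[i - stride];
    }
  }

  // Down-sweep: clear the root, then push prefixes back down the tree.
  data[n - 1] = 0;
  for (std::size_t stride = n >> 1; stride >= 1; stride >>= 1) {
    for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
      const uint32_t left = data[i - stride];
      data[i - stride] = data[i];  // left child receives the parent's prefix
      data[i] += left;             // right child receives prefix + left subtree sum
    }
  }

  data.resize(in.size());  // drop the padding
  return data;
}
```

Both functions should return the same result for any input, which is the kind of equivalence a test comparing the implementations would check.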

Closes #500

Implements a new sorting algorithm.

Implements a new velo clustering algorithm.

Details in https://indico.cern.ch/event/1370609/contributions/5928258/attachments/2847533/4979150/SumSortClustHLT1.pdf

FYI @gligorov @raaij @dovombru @cagapopo

Edited by Arthur Marius Hennequin
