Decode Retina into a sorted container

Daniel Hugo Campora Perez requested to merge dcampora_retina_sorted into master

This MR decodes the Retina clusters into a sorted container right away, reducing the memory pressure of the Velo chain of algorithms.
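As a rough illustration of the idea (not the code in this MR), the sketch below computes the sorted order from the raw words first and then decodes each cluster straight into its final slot, so no intermediate unsorted hit buffer is needed. The `RawCluster` layout, the `decode` bit fields, the `decode_sorted` helper and the choice of `x` as sort key are all hypothetical and chosen only for the example; the real Retina cluster format and sort key differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical raw cluster word; NOT the real Retina cluster format.
struct RawCluster {
  std::uint32_t word;
};

// Hypothetical decoded hit.
struct Hit {
  float x;
  float y;
};

// Hypothetical decoding: low 16 bits hold x, high 16 bits hold y.
Hit decode(RawCluster c) {
  return Hit{static_cast<float>(c.word & 0xFFFFu),
             static_cast<float>(c.word >> 16)};
}

// Decode directly into a sorted container:
// 1. build a permutation ordered by the sort key, read from the raw words;
// 2. decode each raw cluster once, writing it to its final sorted position.
std::vector<Hit> decode_sorted(const std::vector<RawCluster>& raw) {
  std::vector<std::uint32_t> order(raw.size());
  std::iota(order.begin(), order.end(), 0u);
  std::stable_sort(order.begin(), order.end(),
                   [&](std::uint32_t a, std::uint32_t b) {
                     return (raw[a].word & 0xFFFFu) < (raw[b].word & 0xFFFFu);
                   });

  std::vector<Hit> hits(raw.size());
  for (std::size_t i = 0; i < order.size(); ++i) {
    hits[i] = decode(raw[order[i]]);
  }
  return hits;
}
```

Compared with decoding into one buffer and then copying into a second, sorted buffer, this writes each decoded hit only once, which is the kind of saving in memory traffic the description refers to.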

A 32% performance increase is observed in the Velo subsequence. On Ampere, the sequence as a whole gets about 4-5% faster:

NVIDIA GeForce RTX 3090    │████████████████████████████████████████           203.49 kHz (1.03x)
NVIDIA RTX A6000           │███████████████████████████████████████            196.44 kHz (1.02x)
NVIDIA RTX A5000           │████████████████████████████████████               180.01 kHz (1.02x)
NVIDIA GeForce RTX 2080 Ti │███████████████████████████                        139.05 kHz (1.04x)
                           │█████████████████                                  85.36 kHz (1.05x)
AMD EPYC 7502 32-Core      │████                                               20.45 kHz (1.07x)
                           ┼────┴────┼────┴────┼────┴────┼────┴────┼────┴────┼ (1.05x)
                           0         50       100       150       200       250     (1.05x)  

Requires !748 (merged)

Edited by Daniel Hugo Campora Perez
