
[bugfix] Use float2half in CPU version of forward seeding

Daniel Campora Perez requested to merge dcampora_fix_cpu_forward into master

This MR fixes an issue in the CPU version of the forward tracking algorithm, whose triplet seeding used a chi2 stored as a half-precision number. In the default CPU build, half_t is a float, hence 32 bits instead of 16. This MR handles that case by using float2half, which brings the CPU forward tracking efficiency on par with the CUDA version (slightly better, since float is used elsewhere).
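To illustrate why an explicit conversion is needed when half_t is a 32-bit float, here is a minimal sketch of a float-to-binary16 conversion. The helper `float_to_half_bits` is hypothetical and is not the float2half used in this MR; it truncates the mantissa, flushes values below the normal half range to zero, and maps overflow (and NaN) to infinity. How the seeding consumes the 16-bit value is not shown here.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper for illustration only (not the project's float2half).
// Converts a 32-bit float to IEEE 754 binary16 bits by truncating the mantissa.
// Simplifications: subnormal results flush to zero, overflow and NaN become infinity.
std::uint16_t float_to_half_bits(float f) {
  static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");
  std::uint32_t x;
  std::memcpy(&x, &f, sizeof x); // reinterpret the float's bits safely

  const std::uint32_t sign = (x >> 16) & 0x8000u; // sign moves to bit 15
  // Re-bias the exponent from float (127) to half (15).
  const std::int32_t exponent =
    static_cast<std::int32_t>((x >> 23) & 0xFFu) - 127 + 15;
  const std::uint32_t mantissa = (x >> 13) & 0x3FFu; // keep the top 10 mantissa bits

  if (exponent <= 0) return static_cast<std::uint16_t>(sign);            // too small
  if (exponent >= 31) return static_cast<std::uint16_t>(sign | 0x7C00u); // too large
  return static_cast<std::uint16_t>(
    sign | (static_cast<std::uint32_t>(exponent) << 10) | mantissa);
}
```

Code that assumes a 16-bit half_t, for example by storing it in a 16-bit slot, would not get this representation from a plain float without such a conversion step.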

This MR also includes a fix for the sort mechanism in LFTripletKeepBest, which used an oddeven_merge_sort. An inconclusive test suggested that, under rare circumstances, oddeven_merge_sort did not sort the array properly. It is replaced with an insertion sort, which performs better and is less error-prone, and which leads to lower shared memory consumption and a measured speedup of 2.5%.
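For reference, a minimal sketch of an in-place insertion sort over a small array follows. The function name `insertion_sort` and the ascending order are assumptions for illustration; this is not the code in LFTripletKeepBest. Sorting in place needs no scratch buffer, which is consistent with a lower memory footprint.

```cpp
#include <cstddef>

// Hypothetical illustration, not the LFTripletKeepBest implementation.
// Sorts data[0..n) in ascending order, in place, using only operator<.
template<typename T>
void insertion_sort(T* data, std::size_t n) {
  for (std::size_t i = 1; i < n; ++i) {
    const T key = data[i]; // element to insert into the sorted prefix
    std::size_t j = i;
    // Shift larger elements one slot to the right.
    while (j > 0 && key < data[j - 1]) {
      data[j] = data[j - 1];
      --j;
    }
    data[j] = key;
  }
}
```

Because only strictly larger elements are shifted, equal elements keep their relative order, so the sort is stable.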

NVIDIA GeForce RTX 3090 │████████████████████████████████     161.14 kHz (1.00x)
NVIDIA RTX A5000        │█████████████████████████            127.03 kHz (1.02x)
AMD EPYC 7502 32-Core   │███                                  16.31 kHz (0.99x)
                        ┼─┴─┼─┴─┼─┴─┼─┴─┼─┴─┼─┴─┼─┴─┼─┴─┼─┴─┼
                        0   20  40  60  80 100 120 140 160 180 
Edited by Daniel Campora Perez
