[bugfix] Use float2half in CPU version of forward seeding
This MR fixes an issue in the CPU version of the forward tracking algorithm. Its triplet seeding computes a chi2 from a `half` number, but in the default CPU version `half_t` is `float`, so the value keeps 32 bits instead of 16. This MR handles this case by using `float2half`, which brings the CPU forward tracking efficiency on par with the CUDA version (slightly better, due to the use of `float` elsewhere).
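The precision difference matters because two chi2 values that differ in single precision can become equal once rounded to 16 bits, so a comparison between them may select a different triplet on CPU than on GPU. The sketch below emulates that rounding in plain C++. `round_to_half_precision` is a hypothetical helper, not the `float2half` used in the code base, and it ignores subnormals and overflow.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (NOT the project's float2half): rounds a float to the
// nearest value with an 11-bit significand, as IEEE 754 half precision has.
// Assumption: |x| stays within the normal half range (about 6.1e-5 to 65504);
// subnormals and overflow to infinity are not modelled.
float round_to_half_precision(float x) {
  int exponent = 0;
  // x = mantissa * 2^exponent, with |mantissa| in [0.5, 1) (or 0 for x == 0).
  const float mantissa = std::frexp(x, &exponent);
  // Scale the mantissa to 11 integer bits, round to nearest
  // (ties to even in the default rounding mode), then scale back.
  return std::ldexp(std::nearbyint(std::ldexp(mantissa, 11)), exponent - 11);
}
```

For example, `1.0002f` and `1.0f` are distinct floats, but both round to `1.0f` at half precision, because the spacing of half values near 1 is 2^-10.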
This MR also fixes the sort mechanism in `LFTripletKeepBest`, which used an `oddeven_merge_sort`. An inconclusive test showed that, under rare circumstances, `oddeven_merge_sort` would not sort the array properly. It is now replaced by an insertion sort, which performs better and is less error-prone. It also consumes less shared memory and gives a measured speedup of +2.5%.
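A minimal sketch of an insertion sort keyed on chi2, as a generic illustration of the technique now used in `LFTripletKeepBest`. `TripletCandidate` and `insertion_sort_by_chi2` are hypothetical names; the actual kernel's data layout, memory space and number of kept candidates are not shown here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical candidate type: only a chi2 and an index are assumed here.
struct TripletCandidate {
  float chi2;
  int index;
};

// Sorts candidates in place by ascending chi2, so the best ones come first.
// Stable (equal chi2 keeps the original order) and needs no scratch buffer.
void insertion_sort_by_chi2(TripletCandidate* candidates, std::size_t n) {
  for (std::size_t i = 1; i < n; ++i) {
    const TripletCandidate current = candidates[i];
    std::size_t j = i;
    // Shift entries with a larger chi2 one slot to the right.
    while (j > 0 && candidates[j - 1].chi2 > current.chi2) {
      candidates[j] = candidates[j - 1];
      --j;
    }
    candidates[j] = current;
  }
}
```

Sorting `{3.0, 0}, {1.0, 1}, {2.0, 2}, {1.0, 3}` yields the indices `1, 3, 2, 0`. The simple, branch-light inner loop is one reason insertion sort is easy to reason about for the small arrays involved here.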
Measured throughput:

| Device | Throughput | Speedup |
|---|---|---|
| NVIDIA GeForce RTX 3090 | 161.14 kHz | 1.00x |
| NVIDIA RTX A5000 | 127.03 kHz | 1.02x |
| AMD EPYC 7502 32-Core | 16.31 kHz | 0.99x |