
Draft: Use CUDA's dynamic parallelism for dispatching line functions

Arthur Marius Hennequin requested to merge ahennequ_lines2 into 2024-patches

Closes #503

Depends on !1456 (merged)

Throughput is unchanged compared to !1456 (merged), or possibly slightly higher:

NVIDIA GeForce RTX 3090    │█████████████████████████████████████      124.64 kHz
NVIDIA RTX A5000           │█████████████████████████████              99.16 kHz
NVIDIA GeForce RTX 2080 Ti │████████████████████████                   82.53 kHz
AMD EPYC 7502 32-Core      │███                                        10.67 kHz
                           ┼──┴──┼──┴──┼──┴──┼──┴──┼──┴──┼──┴──┼──┴──┼
                           0     20    40    60    80   100   120   140  

TODO:

Edited by Arthur Marius Hennequin
