
Perform parameter scan of __launch_bounds__ and number of threads for every algorithm

CUDA allows __launch_bounds__ to be defined individually for each algorithm. This may affect the performance of the application.

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#launch-bounds
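
As a minimal sketch of what scanning __launch_bounds__ could look like, the example below makes the bound a template parameter and prints the register count the compiler chose for each value. The kernel `scale` and the helper `report` are hypothetical names, not part of the project, and the candidate bounds are arbitrary.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel. MaxThreads is the maximum number of threads per block
// promised to the compiler through __launch_bounds__.
template <int MaxThreads>
__global__ void __launch_bounds__(MaxThreads) scale(float* data, float factor, int n) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= factor;
}

// Hypothetical helper: query what the compiler generated for a given bound.
template <int MaxThreads>
void report() {
  cudaFuncAttributes attr;
  const cudaError_t err = cudaFuncGetAttributes(&attr, scale<MaxThreads>);
  if (err != cudaSuccess) {
    std::printf("bound=%d: %s\n", MaxThreads, cudaGetErrorString(err));
    return;
  }
  std::printf("bound=%d registers/thread=%d maxThreadsPerBlock=%d\n",
              MaxThreads, attr.numRegs, attr.maxThreadsPerBlock);
}

int main() {
  // Arbitrary candidate bounds, chosen only for illustration.
  report<64>();
  report<256>();
  report<1024>();
  return 0;
}
```

For a kernel this small the register count will probably not change between bounds; the effect is expected to show up in register-heavy kernels. Note that launching a kernel with more threads per block than its launch bound fails, so the bound and the block size have to be scanned consistently.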

Similarly, a proper parameter scan should be done over the number of threads per block of every kernel.
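
A scan over the number of threads could be sketched as follows, timing one kernel with CUDA events for several block sizes. The kernel `scale`, the problem size, the candidate list, and the repetition count are all assumptions made for illustration; a real scan would run the project's own kernels on representative input.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for one algorithm of the application.
__global__ void scale(float* data, float factor, int n) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= factor;
}

int main() {
  const int n = 1 << 24;        // assumed problem size
  const int repetitions = 20;   // assumed number of timed launches per setting

  float* d_data = nullptr;
  if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) {
    std::printf("cudaMalloc failed\n");
    return 1;
  }
  cudaMemset(d_data, 0, n * sizeof(float));

  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);

  // Candidate block sizes: multiples of the warp size up to the hardware limit.
  const int candidates[] = {32, 64, 128, 256, 512, 1024};
  for (const int threads : candidates) {
    const int blocks = (n + threads - 1) / threads;

    // Warm-up launch so one-time initialization is not included in the timing.
    scale<<<blocks, threads>>>(d_data, 1.0f, n);

    cudaEventRecord(start);
    for (int r = 0; r < repetitions; ++r) {
      scale<<<blocks, threads>>>(d_data, 1.0f, n);
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    if (cudaGetLastError() != cudaSuccess) {
      std::printf("threads=%d: launch failed\n", threads);
      continue;
    }

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    std::printf("threads=%4d  avg time per launch = %.4f ms\n", threads, ms / repetitions);
  }

  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  cudaFree(d_data);
  return 0;
}
```

The best block size can differ between GPU architectures, so the scan would likely need to be repeated per target device.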

Edited by Daniel Hugo Campora Perez