Define launch bounds instead of hard-coding the number of registers
Launch bounds (https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#launch-bounds) can help the nvcc compiler determine how many registers to allocate for a given kernel.
- Explore removing the maxrregcount directive from CMakeLists.txt and instead specifying launch bounds for all algorithms, then analyze the impact on performance.
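
As a rough illustration of what a per-kernel launch bound could look like, here is a minimal sketch. The kernel `scale_kernel`, the wrapper `scale_on_device`, and the values `kMaxThreadsPerBlock = 256` and `kMinBlocksPerSM = 2` are all hypothetical and are not taken from this project; suitable values would have to be chosen and measured for each algorithm.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical values, chosen only for illustration.
constexpr int kMaxThreadsPerBlock = 256;  // largest block size this kernel is launched with
constexpr int kMinBlocksPerSM = 2;        // desired minimum resident blocks per multiprocessor

// __launch_bounds__ promises nvcc that the kernel is never launched with more
// than kMaxThreadsPerBlock threads per block. The compiler can then derive a
// register limit for this kernel alone, instead of one limit for the whole file.
__global__ void __launch_bounds__(kMaxThreadsPerBlock, kMinBlocksPerSM)
scale_kernel(float* data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Hypothetical host wrapper: copies the data to the device, scales it, copies it back.
cudaError_t scale_on_device(std::vector<float>& host, float factor)
{
    const int n = static_cast<int>(host.size());
    if (n == 0) {
        return cudaSuccess;
    }

    float* dev = nullptr;
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    cudaError_t err = cudaMalloc(&dev, bytes);
    if (err != cudaSuccess) {
        return err;
    }

    err = cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        // The block size must not exceed the bound declared on the kernel,
        // otherwise the launch fails.
        const int blocks = (n + kMaxThreadsPerBlock - 1) / kMaxThreadsPerBlock;
        scale_kernel<<<blocks, kMaxThreadsPerBlock>>>(dev, factor, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) {
        // cudaMemcpy blocks until the kernel has finished.
        err = cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    }

    cudaFree(dev);
    return err;
}
```

To compare against the current maxrregcount setting, one option is to compile with `nvcc -Xptxas -v`, which prints the registers used per kernel, and then benchmark each algorithm with and without the bounds.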