
Make TopMatrixImpl iterative and move to header

Bernhard Manfred Gruber requested to merge topmatrix into master

This MR turns the implementation of TopMatrixImpl from a recursive into an iterative one. This noticeably reduces the register footprint of the function in CUDA.
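For readers unfamiliar with the pattern, here is a minimal sketch of such a recursion-to-iteration rewrite. It is not the actual TopMatrixImpl code: `Affine`, `compose`, `topRecursive` and `topIterative` are hypothetical names, and a 1D affine map stands in for a real transformation matrix. The sketch assumes the top transformation is the product of the per-level transformations along a path from the root.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a transformation matrix:
// the 1D affine map x -> scale * x + offset.
struct Affine {
  long scale;
  long offset;
};

inline bool operator==(const Affine &a, const Affine &b)
{
  return a.scale == b.scale && a.offset == b.offset;
}

// Returns the map that applies `inner` first and `outer` second.
inline Affine compose(const Affine &outer, const Affine &inner)
{
  return {outer.scale * inner.scale, outer.scale * inner.offset + outer.offset};
}

// Recursive form: each level calls itself for the level above it,
// then composes the result with its own local transformation.
inline Affine topRecursive(const std::vector<Affine> &path, std::size_t level)
{
  assert(level < path.size());
  if (level == 0) return path[0];
  return compose(topRecursive(path, level - 1), path[level]);
}

// Iterative form: a single accumulator is updated while walking
// from the root (index 0) down to the requested level.
inline Affine topIterative(const std::vector<Affine> &path, std::size_t level)
{
  assert(level < path.size());
  Affine result = path[0];
  for (std::size_t i = 1; i <= level; ++i) {
    result = compose(result, path[i]);
  }
  return result;
}
```

Both functions compose the transformations in the same order, so they return the same result; the loop only replaces the chain of nested calls with one accumulator. To call such functions from CUDA device code they would additionally need `__host__ __device__` qualifiers and a device-friendly container instead of `std::vector`.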

However, I have also observed that, without several optimizations from other MRs, this implementation might be slightly slower. It might also be worthwhile to compare both implementations on the CPU.

  • Depends on !900 (merged)
  • Benchmark this MR on GPU
  • Benchmark this MR on CPU
Edited by Bernhard Manfred Gruber
