
Delayed selections

Daniel Hugo Campora Perez requested to merge dcampora_delayed_fn_exec_lines into master

This MR implements delayed execution of selection algorithms to improve their performance and scalability. Configurability is unchanged.

  • Selections are executed in the subsequent GatherSelections algorithm, in a single kernel.
  • Performance improves drastically. Up to 100 lines have been tested, with a performance impact of about 6% relative to a single line.
  • All selection algorithm initializations have been moved to the kernel execution.
  • All selection algorithm copies have been moved to GatherSelections.
  • Separable compilation is now supported and enabled by default.
  • An option to compile with / without separable compilation has been added. If separable compilation is disabled, a custom "unity" build is instantiated which joins all source files of the selections library.
  • HIP does not support separable compilation at the moment, and hence must be compiled with separable compilation disabled (set at configuration time automatically).
  • Code-generation of a new file ExternLines.cuh is also necessary (unfortunately) to allow invoking a function defined in a separate compilation unit (see https://forums.developer.nvidia.com/t/consistency-of-functions-pointer/29325/6).
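
The delayed-execution idea in the list above can be sketched on the host side as follows. This is a minimal illustration, not the code in this MR: `Event`, `LineFn`, `DelayedLines`, `gather_selections` and the three example lines are hypothetical names, and a plain loop stands in for the single GPU kernel. Each line only registers a function pointer; nothing is evaluated until the gather step, which runs every line over every event in one pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-event input; real selections read reconstructed objects.
struct Event {
  float pt;
  int n_tracks;
};

// Common signature shared by all lines (an assumption of this sketch).
using LineFn = bool (*)(const Event&);

// Three illustrative lines.
bool high_pt_line(const Event& e) { return e.pt > 2.0f; }
bool multiplicity_line(const Event& e) { return e.n_tracks >= 3; }
bool passthrough_line(const Event&) { return true; }

// Registration only stores the function pointer; no selection runs here.
struct DelayedLines {
  std::vector<LineFn> fns;
  void add(LineFn fn) {
    assert(fn != nullptr);
    fns.push_back(fn);
  }
};

// Gather step: evaluates all registered lines in a single pass.
// Decisions are stored line-major: decisions[line * n_events + event].
std::vector<std::uint8_t> gather_selections(const DelayedLines& lines,
                                            const std::vector<Event>& events) {
  const std::size_t n_events = events.size();
  std::vector<std::uint8_t> decisions(lines.fns.size() * n_events, 0);
  for (std::size_t line = 0; line < lines.fns.size(); ++line) {
    for (std::size_t ev = 0; ev < n_events; ++ev) {
      decisions[line * n_events + ev] = lines.fns[line](events[ev]) ? 1 : 0;
    }
  }
  return decisions;
}
```

On a GPU, the outer loop would plausibly map to blocks and the inner loop to threads of one kernel, which is why adding lines does not add kernel launches. With separable compilation, the function pointers would refer to device functions defined in other compilation units, which is the case the generated ExternLines.cuh is meant to handle.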

Done in collaboration with @ahennequ.

TODO:

  • Optimize performance
  • CPU compatibility
  • Manage lifetime of objects used in selections
  • Bring back monitoring functionality
  • HIP build
  • HIP runs

Performance of hlt1_pp_default:

Device-averaged speedup: 1.0685950709991698
               % change: 6.859507099916984
NVIDIA RTX A5000  speedup (% change): 1.040242194593474 (4.024219459347389%)
NVIDIA RTX A6000  speedup (% change): 1.1261653368244064 (12.616533682440645%)
AMD EPYC 7502 32-Core  speedup (% change): 0.9943645196516324 (-0.5635480348367583%)
NVIDIA GeForce RTX 2080 Ti  speedup (% change): 1.0576478278517496 (5.76478278517496%)
NVIDIA GeForce RTX 3090  speedup (% change): 1.1245554760745868 (12.455547607458684%)
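
As a cross-check, the device-averaged figure above is consistent with the plain arithmetic mean of the five per-device speedups. The sketch below recomputes it from the reported values. The function names are illustrative and not part of the benchmark tooling, and the assumption that the tooling averages this way is inferred from the numbers only.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Per-device speedups exactly as reported above
// (A5000, A6000, EPYC 7502, RTX 2080 Ti, RTX 3090).
std::vector<double> reported_speedups() {
  return {1.040242194593474, 1.1261653368244064, 0.9943645196516324,
          1.0576478278517496, 1.1245554760745868};
}

// Arithmetic mean of the speedups.
double mean_speedup(const std::vector<double>& speedups) {
  assert(!speedups.empty());
  const double sum = std::accumulate(speedups.begin(), speedups.end(), 0.0);
  return sum / static_cast<double>(speedups.size());
}

// Percentage change corresponding to a speedup factor.
double percent_change(double speedup) { return (speedup - 1.0) * 100.0; }
```

Note that the CPU entry (AMD EPYC 7502) is the only device with a speedup below 1, and it is included in the average.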
Edited by Daniel Hugo Campora Perez
