Delayed selections

Merged Daniel Hugo Campora Perez requested to merge dcampora_delayed_fn_exec_lines into master

This MR implements delayed execution of selection algorithms to improve their performance and scalability. Configurability is unchanged.

  • Selections are no longer executed by the individual selection algorithms; they are executed later, in the GatherSelections algorithm, in a single kernel.
  • Scalability improves drastically. Configurations with up to 100 lines have been tested, with a performance impact of about 6% with respect to 1 line.
  • All selection algorithm initializations have been moved to the kernel execution.
  • All selection algorithm copies have been moved to GatherSelections.
  • Separable compilation is now supported and enabled by default.
  • An option to compile with / without separable compilation has been added. If separable compilation is disabled, a custom "unity" build is instantiated which joins all source files of the selections library.
  • HIP does not support separable compilation at the moment, so HIP builds must be compiled with separable compilation disabled (this is set automatically at configuration time).
  • Code-generation of a new file ExternLines.cuh is also necessary (unfortunately) to allow invoking a function defined in a separate compilation unit (see https://forums.developer.nvidia.com/t/consistency-of-functions-pointer/29325/6).

Done in collaboration with @ahennequ.

TODO:

  • Optimize performance
  • CPU compatibility
  • Manage lifetime of objects used in selections
  • Bring back monitoring functionality
  • HIP build
  • HIP runs

Performance of hlt1_pp_default:

Device-averaged speedup: 1.0685950709991698
               % change: 6.859507099916984
NVIDIA RTX A5000  speedup (% change): 1.040242194593474 (4.024219459347389%)
NVIDIA RTX A6000  speedup (% change): 1.1261653368244064 (12.616533682440645%)
AMD EPYC 7502 32-Core  speedup (% change): 0.9943645196516324 (-0.5635480348367583%)
NVIDIA GeForce RTX 2080 Ti  speedup (% change): 1.0576478278517496 (5.76478278517496%)
NVIDIA GeForce RTX 3090  speedup (% change): 1.1245554760745868 (12.455547607458684%)
Edited by Daniel Hugo Campora Perez

Merged by Daniel Hugo Campora Perez (May 12, 2022 1:45pm UTC)

Merge details

  • Changes merged into master with 528c0278.
  • Deleted the source branch.
