
Fix CPU performance

Daniel Campora Perez requested to merge dcampora_fix_cpu_performance into master

This MR fixes the CPU performance and changes the way the CI does CPU runs.

  • The CMAKE_CXX_FLAG variables were declared as CACHE STRING (introduced in !400 (merged)), which prevented the build from picking up the options. This has been reverted.
  • CPU builds are now NUMA-aware. CPU runners should be of type ":cpu:".
  • GPU runs are always pinned to NUMA domain 0 with numactl.
  • Builds that treat warnings as errors have been added as separate jobs. They are not retried, are allowed to fail, and are not relied upon by later CI steps.
  • Updated the default compiler to CUDA 11, so C++17 is now supported in the CI.
  • Updated the default throughput run options to ones that perform better and are more reproducible: -n 500 -t 16 -m 500 -r 1000.
  • Updated readme.md to reflect these changes.
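The NUMA pinning described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the CI's actual script. It reads the node-to-CPU mapping from Linux sysfs (/sys/devices/system/node/nodeN/cpulist) and builds a numactl command line that binds both threads and memory allocations to one node, then appends the default throughput options from this MR. The binary path and the helper names (parse_cpulist, numa_nodes, pinned_command) are hypothetical, and running one process per NUMA node on CPU runners is only an assumption about what "NUMA-aware" means here.

```python
from __future__ import annotations

from pathlib import Path

# Default throughput options stated in this MR.
DEFAULT_THROUGHPUT_OPTS = ["-n", "500", "-t", "16", "-m", "500", "-r", "1000"]


def parse_cpulist(text: str) -> list[int]:
    """Expand a Linux cpulist string such as "0-3,8,10-11" into CPU ids."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty cpulist (e.g. a memory-only node)
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(sysfs: Path = Path("/sys/devices/system/node")) -> dict[int, list[int]]:
    """Map each NUMA node id to its CPUs, as reported by sysfs (Linux only)."""
    nodes: dict[int, list[int]] = {}
    for entry in sysfs.glob("node[0-9]*"):
        cpulist = entry / "cpulist"
        if cpulist.exists():
            nodes[int(entry.name[len("node"):])] = parse_cpulist(cpulist.read_text())
    return dict(sorted(nodes.items()))


def pinned_command(binary: str, node: int, opts: list[str] | None = None) -> list[str]:
    """Build a numactl command binding CPU scheduling and memory to one NUMA node.

    `binary` is a hypothetical path to the throughput executable; `opts`
    defaults to the throughput options listed in this MR.
    """
    run_opts = DEFAULT_THROUGHPUT_OPTS if opts is None else opts
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}", binary, *run_opts]
```

For a GPU run, pinned_command("./throughput_binary", 0) gives a command bound to NUMA domain 0. On a CPU runner one could, under the same assumption, launch [pinned_command("./throughput_binary", n) for n in numa_nodes()] in parallel, so that each process only touches memory local to its own node.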

This is built on top of !404 (merged).

Throughput after this MR:

Quadro RTX 6000                 │██████████████████████████████████████████████   154.17 kHz
GeForce RTX 2080 Ti             │████████████████████████████████████████████     146.78 kHz
Tesla V100-PCIE-32GB            │██████████████████████████████████████████       140.57 kHz
AMD EPYC 7502 32-Core Processor │███                                              13.13 kHz
Intel Xeon E5-2630 v4           │█                                                3.69 kHz
                                ┼──┴──┼──┴──┼──┴──┼──┴──┼──┴──┼──┴──┼──┴──┼──┴──┼
                                0     20    40    60    80   100   120   140   160  
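To put the chart in relative terms, the short Python sketch below computes each device's throughput as a multiple of a chosen baseline. It uses only the kHz values shown in the chart; the names THROUGHPUT_KHZ and relative_to are illustrative and not part of the project.

```python
from __future__ import annotations

# Throughput figures (kHz) as reported in the chart in this MR.
THROUGHPUT_KHZ = {
    "Quadro RTX 6000": 154.17,
    "GeForce RTX 2080 Ti": 146.78,
    "Tesla V100-PCIE-32GB": 140.57,
    "AMD EPYC 7502 32-Core Processor": 13.13,
    "Intel Xeon E5-2630 v4": 3.69,
}


def relative_to(baseline: str, results: dict[str, float]) -> dict[str, float]:
    """Return each device's throughput divided by the baseline device's throughput."""
    base = results[baseline]
    return {name: khz / base for name, khz in results.items()}


if __name__ == "__main__":
    ratios = relative_to("AMD EPYC 7502 32-Core Processor", THROUGHPUT_KHZ)
    for name, ratio in ratios.items():
        print(f"{name:<32} {ratio:6.2f}x")
```

With the EPYC 7502 as baseline, the Quadro RTX 6000 reaches roughly 11.7 times its throughput, while the Xeon E5-2630 v4 reaches about 0.28 times.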
Edited by Daniel Campora Perez
