Fix CPU performance
This MR fixes CPU performance and changes how the CI performs CPU runs.
- The CMAKE_CXX_FLAG variables were declared as CACHE STRING, which prevented the build from picking up the options (introduced in !400). This has been reverted.
- CPU builds are now NUMA-aware. CPU runners should be of type ":cpu:".
- GPU runs are always bound to NUMA domain 0 with numactl.
- Builds with warnings treated as errors have been added as separate builds. They are not retried, are allowed to fail, and are not relied upon by subsequent CI steps.
- Updated the default compiler to CUDA 11, so C++17 is now supported in the CI.
- Updated the default throughput run options to ones that perform better and are more reproducible: -n 500 -t 16 -m 500 -r 1000.
- Updated readme.md to reflect these changes.
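The NUMA binding and the new default options described above can be sketched as follows. This is a minimal sketch, assuming Linux exposes NUMA nodes under /sys/devices/system/node and that a NUMA-aware CPU run starts one process per NUMA domain; the one-process-per-domain split and the binary name are hypothetical and not taken from this MR's CI scripts. It only builds the numactl command lines, binding both CPU execution (--cpunodebind) and memory allocation (--membind) to a single node, and appends the default throughput options.

```python
import os
import re
import shlex

# New default throughput options from this MR.
DEFAULT_THROUGHPUT_OPTIONS = ("-n", "500", "-t", "16", "-m", "500", "-r", "1000")


def numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return the sorted NUMA node ids listed in Linux sysfs, e.g. [0, 1]."""
    try:
        entries = os.listdir(sysfs_root)
    except FileNotFoundError:
        # No NUMA information available: treat the machine as a single domain.
        return [0]
    nodes = []
    for entry in entries:
        match = re.fullmatch(r"node(\d+)", entry)
        if match:
            nodes.append(int(match.group(1)))
    return sorted(nodes) or [0]


def numactl_command(node, binary, options=DEFAULT_THROUGHPUT_OPTIONS):
    """Bind both CPU execution and memory allocation to one NUMA node."""
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}", binary, *options]


def gpu_command(binary):
    # GPU runs are always bound to NUMA domain 0.
    return numactl_command(0, binary)


def cpu_commands(binary):
    # Assumption (hypothetical): a NUMA-aware CPU run starts one process per NUMA domain.
    return [numactl_command(node, binary) for node in numa_nodes()]


if __name__ == "__main__":
    binary = "./throughput_app"  # hypothetical binary name
    print(shlex.join(gpu_command(binary)))
    for command in cpu_commands(binary):
        print(shlex.join(command))
```

On a two-socket machine this prints one numactl line for the GPU run and one line per NUMA node for the CPU run.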
This is built on top of !404.
Throughput after this MR:
- Quadro RTX 6000: 154.17 kHz
- GeForce RTX 2080 Ti: 146.78 kHz
- Tesla V100-PCIE-32GB: 140.57 kHz
- AMD EPYC 7502 32-Core Processor: 13.13 kHz
- Intel Xeon E5-2630 v4: 3.69 kHz
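To compare devices, the reported throughputs can be expressed relative to a reference device. A minimal sketch using only the figures listed above; the helper name and the choice of reference device are illustrative.

```python
# Throughput figures (kHz) as reported in this MR.
THROUGHPUT_KHZ = {
    "Quadro RTX 6000": 154.17,
    "GeForce RTX 2080 Ti": 146.78,
    "Tesla V100-PCIE-32GB": 140.57,
    "AMD EPYC 7502 32-Core Processor": 13.13,
    "Intel Xeon E5-2630 v4": 3.69,
}


def relative_to(reference, table=THROUGHPUT_KHZ):
    """Return each device's throughput as a multiple of the reference device's."""
    base = table[reference]
    return {name: value / base for name, value in table.items()}


if __name__ == "__main__":
    for name, ratio in relative_to("AMD EPYC 7502 32-Core Processor").items():
        print(f"{name:32s} {ratio:6.2f}x")
```

With the AMD EPYC 7502 as reference, the Quadro RTX 6000 comes out at about 11.7x.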
Edited by Daniel Hugo Campora Perez