Added tests for RTX 30 series.

Daniel Campora Perez requested to merge dcampora_add_rtx_30_series into master

This MR adds the RTX 30 series to the list of GPUs to be tested, and adds them to the CI.

Changes:

  • Removed the cudaDeviceSetCacheConfig(cudaFuncCachePreferL1) optimization for the RTX 30 series, since it made those cards much slower. The optimization is now enabled only for the Turing architecture (compute capability 7.5), where it yields a slight speedup.
  • Switched from nvprof to nsys profile, since nvprof is not supported from the 30 series onwards. The breakdown is now calculated on the RTX 3090. A CSV file is generated from the nsys profile output, which is easier to parse. Consequently, extract_algo_breakdown.py has been partially rewritten.
  • Updated readme.md to use the most recent release of LCG, LCG98.
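
To illustrate why a CSV is easier to parse, the following is a minimal sketch of how a per-kernel time breakdown could be computed from a kernel-summary CSV. It is not the code in extract_algo_breakdown.py. The function name kernel_breakdown, the column names "Name" and "Total Time (ns)", and the sample rows are all assumptions made for illustration; the real column names depend on the nsys version and report used, so they are passed in as parameters.

```python
import csv
import io


def kernel_breakdown(csv_text, name_col="Name", time_col="Total Time (ns)"):
    """Return {kernel name: fraction of total GPU time}.

    The default column names are assumptions; adjust them to match the
    header of the CSV actually produced by the profiler.
    """
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row[name_col]
        # Accumulate in case the same kernel appears on several rows.
        totals[name] = totals.get(name, 0.0) + float(row[time_col])

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {name: t / grand_total for name, t in totals.items()}


# Made-up sample input, for illustration only (not real profiler output).
SAMPLE = """Name,Total Time (ns)
kernel_a,3000
kernel_b,1000
"""

if __name__ == "__main__":
    breakdown = kernel_breakdown(SAMPLE)
    # Print kernels from most to least expensive.
    for name, fraction in sorted(breakdown.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {fraction:.1%}")
```

Using csv.DictReader keeps the parsing keyed on header names rather than column positions, so the sketch tolerates extra or reordered columns as long as the two named columns are present.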
Edited by Daniel Campora Perez