Added tests for RTX 30 series.
This MR adds the RTX 30 series to the list of tested GPUs and to the CI.
Changes:
- Removed the `cudaDeviceSetCacheConfig(cudaFuncCachePreferL1)` optimization for the RTX 30 series, since it made them much slower. The optimization is now only enabled for the Turing architecture (compute capability 7.5), where it yields a slight speedup.
- Switched to `nsys profile` instead of `nvprof`, which is not supported on the RTX 30 series onwards. The breakdown is now calculated with the RTX 3090. A CSV, which is easier to parse, is generated from `nsys profile`. Consequently, `extract_algo_breakdown.py` has been partially rewritten.
- Updated `readme.md` to use the most recent LCG release, LCG98.
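Because the breakdown now comes from a CSV rather than `nvprof` text output, its core can be reduced to summing time per kernel and normalising. The following is a minimal sketch, not the actual `extract_algo_breakdown.py`: the function name `kernel_breakdown` is hypothetical, and the column names `Total Time (ns)` and `Name` are assumptions modelled on an nsys kernel-summary report. They may differ between nsys versions, so they are exposed as parameters.

```python
import csv
import io
from collections import defaultdict


def kernel_breakdown(csv_text, time_col="Total Time (ns)", name_col="Name"):
    """Return {kernel name: fraction of total kernel time}, largest first.

    Assumes csv_text is a clean CSV with a header row; the default column
    names are assumptions and may need adjusting for a given nsys version.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Sum times so that repeated entries for the same kernel are merged.
        totals[row[name_col]] += float(row[time_col])

    grand_total = sum(totals.values())
    if grand_total == 0:
        # No rows (or all-zero times): nothing meaningful to report.
        return {}

    ordered = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return {name: t / grand_total for name, t in ordered}


if __name__ == "__main__":
    sample = (
        "Time(%),Total Time (ns),Instances,Average,Minimum,Maximum,Name\n"
        "60.0,600,2,300.0,250,350,kernel_a\n"
        "40.0,400,1,400.0,400,400,kernel_b\n"
    )
    for name, frac in kernel_breakdown(sample).items():
        print(f"{name}: {frac:.1%}")
```

Parsing with `csv.DictReader` keys each row by its header, so the sketch keeps working if nsys reorders or adds columns, as long as the two named columns are present.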
Edited by Daniel Hugo Campora Perez