Added tests for RTX 30 series. (!449) · Merge requests · LHCb / Allen

This MR adds the RTX 30 series GPUs to the list of GPUs to be tested and to the CI.

Changes:

Removed the cudaDeviceSetCacheConfig(cudaFuncCachePreferL1) optimization for the RTX 30 series, since it made those GPUs much slower. The optimization is now enabled only for the Turing architecture (compute capability 7.5), where it yields a slight speedup.
Switched from nvprof to nsys profile, since nvprof is not supported from the 30 series onwards. The breakdown is now calculated on the RTX 3090. A CSV is generated from the nsys profile output, which is easier to parse. Consequently, extract_algo_breakdown.py has been partially rewritten.
Updated readme.md to use the most recent release of LCG, LCG98.
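
To illustrate why a CSV export is easier to parse than nvprof's text output, here is a minimal sketch of a per-kernel time breakdown computed from CSV. It is not the actual extract_algo_breakdown.py: the function name kernel_breakdown, the column names ("Name", "Total Time (ns)") and the sample kernel names are assumptions made for illustration, and the columns that nsys really emits depend on the nsys version and on the report being exported.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample of a per-kernel CSV. Column names and kernel names
# are assumptions; a real nsys export may use different headers.
SAMPLE_CSV = """Name,Total Time (ns)
kernel_a,3000
kernel_b,1000
kernel_a,1000
"""


def kernel_breakdown(csv_text, name_col="Name", time_col="Total Time (ns)"):
    """Return {kernel name: percentage of total time}, largest share first."""
    totals = defaultdict(float)
    # DictReader maps each row to the header names, so no manual column
    # splitting or whitespace handling is needed.
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[name_col]] += float(row[time_col])

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}

    # Sort by accumulated time, descending, then convert to percentages.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return {name: 100.0 * time / grand_total for name, time in ranked}


if __name__ == "__main__":
    for name, pct in kernel_breakdown(SAMPLE_CSV).items():
        print(f"{name}: {pct:.1f}%")
```

Passing the column names as parameters keeps the sketch usable if the real export uses different headers.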
