Update default compiler and CI throughput run options
This MR updates the CI:
- Updated the default compiler to CUDA 11, so C++17 is now supported in the CI.
- Updated the default throughput run options to better-performing, more reproducible ones:
  `-n 500 -t 16 -m 500 -r 1000`
- Updated readme.md to reflect these changes.