Use virtual architecture to set up default CUDA_ARCH

This allows setting CUDA_NVCC_FLAGS to include something like
-code=sm_75, so that nvcc emits binary code for a particular
GPU architecture instead of PTX. This is useful to avoid long
startup times caused by the CUDA runtime compiling the PTX
just in time when the program is first run.
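
As a rough illustration (the exact command line assembled by the
build system is an assumption here, and kernel.cu is a
hypothetical file name), a virtual default architecture combined
with such a flag would amount to an nvcc call along these lines:

    # -arch names the virtual architecture (compute_XX);
    # -code names the real GPU to generate binary code for.
    nvcc -arch=compute_75 -code=sm_75 -c kernel.cu -o kernel.o

nvcc requires -arch to be a virtual architecture whenever -code
is given explicitly, which is why the default needs to be virtual
for the extra flag to be accepted.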