Sets max device connections to number of threads, to a max of 32. Adds V100 to CI.
This MR does two things:
- Sets the number of device connections, through the environment variable `CUDA_DEVICE_MAX_CONNECTIONS`, equal to the number of threads when launching the application with the `CUDA` device target. The maximum is 32.
- Adds the `V100` to the nightly tests.
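
For reviewers, the capping rule can be sketched as follows. This is a minimal illustration and not the code in this MR: the helper names `device_connections` and `cuda_launch_env` are hypothetical, and it assumes the variable is placed in the environment before the application creates its CUDA context.

```python
import os

# Upper bound stated in this MR.
MAX_CONNECTIONS = 32


def device_connections(num_threads: int) -> int:
    """Return the connection count for a thread count, capped at 32."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    return min(num_threads, MAX_CONNECTIONS)


def cuda_launch_env(num_threads: int, base_env=None) -> dict:
    """Build a launch environment with CUDA_DEVICE_MAX_CONNECTIONS set.

    Hypothetical helper: copies the given environment (or os.environ)
    and adds the capped connection count as a string.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_DEVICE_MAX_CONNECTIONS"] = str(device_connections(num_threads))
    return env
```

With this sketch, 8 threads give a value of `"8"`, while 64 threads are capped to `"32"`. The resulting dictionary could be passed as `env=` to `subprocess.run` when starting the application.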