Sets max device connections to the number of threads, up to a maximum of 32. Adds V100 to CI.
This MR does two things:
- Sets the number of device connections to the number of threads when the application is launched with the CUDA device target. This is done through the `CUDA_DEVICE_MAX_CONNECTIONS` environment variable, and the value is capped at 32.
- Adds the V100 to the nightly tests.
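The connection logic in the first item reduces to `min(num_threads, 32)` written into the environment. Below is a minimal sketch in Python, not the code from this MR. The helper name `configure_device_connections`, the optional `env` argument, and the check that rejects a thread count below 1 are hypothetical additions for illustration. The variable must be set before the CUDA context is created, because the driver reads it at initialization.

```python
import os

# Cap stated in this MR.
MAX_DEVICE_CONNECTIONS = 32


def configure_device_connections(num_threads: int, env=None) -> int:
    """Hypothetical helper: set CUDA_DEVICE_MAX_CONNECTIONS to the thread count, capped at 32.

    Call this before any CUDA context is created; the driver only reads the
    variable at initialization time.
    """
    if env is None:
        env = os.environ
    # Assumption (not from the MR): reject non-positive thread counts.
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    connections = min(num_threads, MAX_DEVICE_CONNECTIONS)
    env["CUDA_DEVICE_MAX_CONNECTIONS"] = str(connections)
    return connections
```

For example, `configure_device_connections(os.cpu_count() or 1)` sets the variable to the host's logical CPU count, or to 32 on machines with more cores. Passing a plain dict as `env` lets the logic be checked without modifying the real process environment.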