Fix device detection to align with nvidia-smi
The CUDA calls were using the default device ordering, which is a built-in heuristic. nvidia-smi
orders devices by PCI bus ID, which is more predictable and keeps device numbering consistent between the two.
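As a rough illustration of PCI bus ordering, one way to request it is the `CUDA_DEVICE_ORDER` environment variable, set before the CUDA runtime initializes. The sketch below is not taken from this change: the helper name `request_pci_bus_order` is hypothetical, and it assumes a POSIX system where `setenv` is available.

```cpp
#include <cstdlib>  // setenv (POSIX), std::getenv
#include <string>

// Hypothetical helper, not part of this change.
// Requests PCI bus ID ordering for CUDA device enumeration. It has to run
// before the first CUDA runtime call in the process, because the runtime
// reads CUDA_DEVICE_ORDER when it initializes.
// Returns true if the variable holds "PCI_BUS_ID" afterwards.
bool request_pci_bus_order(bool overwrite = true)
{
  // With overwrite == false, a value already exported by the user is kept.
  if (setenv("CUDA_DEVICE_ORDER", "PCI_BUS_ID", overwrite ? 1 : 0) != 0) {
    return false;
  }
  const char* value = std::getenv("CUDA_DEVICE_ORDER");
  return value != nullptr && std::string{value} == "PCI_BUS_ID";
}
```

Calling `request_pci_bus_order()` at the very start of `main` would make device index N in the application refer to the same GPU that nvidia-smi lists as N, provided `CUDA_VISIBLE_DEVICES` does not remap the devices.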
The monitoring aggregation thread also needed a set_device call when selecting a device other than 0
without using CUDA_VISIBLE_DEVICES, because a newly started host thread otherwise defaults to device 0.
An optimization to increase the number of hardware connections was not being applied when running in production or on MEPs.
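The description does not say how the number of hardware connections is raised. Assuming it refers to the `CUDA_DEVICE_MAX_CONNECTIONS` environment variable (default 8, valid range 1 to 32), a minimal sketch could look like the following. The helper name `request_max_connections` is hypothetical, and `setenv` again assumes a POSIX system.

```cpp
#include <cstdlib>  // setenv (POSIX)
#include <string>

// Hypothetical helper, not part of this change. It ASSUMES that "hardware
// connections" means the CUDA_DEVICE_MAX_CONNECTIONS setting, which must be
// in place before the CUDA runtime initializes.
// Returns false for values outside the range CUDA accepts (1 to 32) or if
// the environment could not be updated.
bool request_max_connections(int n_connections)
{
  if (n_connections < 1 || n_connections > 32) {
    return false;
  }
  const std::string value = std::to_string(n_connections);
  return setenv("CUDA_DEVICE_MAX_CONNECTIONS", value.c_str(), 1) == 0;
}
```

Placing such a call in code shared by every entry point, rather than in one executable, is one way to avoid a run mode silently skipping it.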
Edited by Roel Aaij