Fix testbench HLT1 with multiple GPUs
- The architecture for running with more than 1 GPU was missing;
- CUDA_VISIBLE_DEVICES in the environment was ignored, which prevented the testbench from working on some of the test machines.
Neither the nightly throughput test nor the production setup is affected.
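
For illustration only, here is a minimal sketch of what honouring CUDA_VISIBLE_DEVICES could look like in a test harness. It is not the code changed in this fix: the helper names `visible_devices` and `pick_devices` are hypothetical, and the sketch assumes only the documented CUDA behaviour that the variable holds a comma-separated list and that the visible devices are renumbered from 0.

```python
import os


def visible_devices(env=None):
    """Return the device tokens listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset (no restriction requested)
    and an empty list when it is set but empty (no device visible).
    Hypothetical helper, not part of the testbench.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    # Tokens may be indices or UUIDs, so keep them as strings.
    return [tok.strip() for tok in raw.split(",") if tok.strip()]


def pick_devices(n_requested, env=None):
    """Choose logical device ordinals, capped by the visible list.

    CUDA renumbers the visible devices as 0..k-1, so the ordinals
    returned here are logical, not physical, indices.
    """
    visible = visible_devices(env)
    if visible is None:
        # Unset: assume the caller knows how many devices exist.
        return list(range(n_requested))
    return list(range(min(n_requested, len(visible))))
```

With `CUDA_VISIBLE_DEVICES=2,3`, a request for four GPUs would be capped at the two logical devices 0 and 1 rather than trying to open devices that the environment has hidden.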