Set up GPU profile for time-sliced nodes
We will need to change the GPU setup to use the NVIDIA GPU Operator, passing the appropriate configuration for time slicing and, later, MIG.
version: v1
flags:
  migStrategy: "none"
  failOnInitError: true
  nvidiaDriverRoot: "/"
  plugin:
    passDeviceSpecs: false
    deviceListStrategy: "envvar"
    deviceIDStrategy: "uuid"
  gfd:
    oneshot: false
    noTimestamp: false
    outputFile: /etc/kubernetes/node-feature-discovery/features.d/gfd
    sleepInterval: 60s
sharing:
  timeSlicing:
    renameByDefault: true
    resources:
      - name: nvidia.com/gpu
        replicas: 4
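With replicas: 4 and renameByDefault enabled, each physical GPU should be advertised four times under the renamed resource. As a sketch (assuming the rename takes effect, and for a node with a single T4), the node status would report something like:

```yaml
# illustrative node status fragment, not actual output
status:
  allocatable:
    nvidia.com/gpu.shared: "4"
```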
renameByDefault should give us a new resource type, nvidia.com/gpu.shared,
which will be handier than relying on the annotation. We need to validate this works well; if not, we fall back to the annotation.
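As a sketch of the consumer side, a pod could then request the renamed resource directly (the pod name, container name, and image tag below are illustrative assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: timesliced-test          # illustrative name
spec:
  restartPolicy: Never
  containers:
    - name: cuda-test
      image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04  # any CUDA-capable image
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu.shared: 1   # renamed time-sliced resource
```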
If we name this config t4shared:
helm install nvdp nvdp/nvidia-device-plugin ...
--set-file config.map.t4shared=t4shared-config.yaml
we can then simply label the nodes to get this config:
kubectl label node \
  --overwrite \
  --selector=nvidia.com/gpu.product=T4-?? \
  nvidia.com/device-plugin.config=t4shared
MIG will have a similar method with a different config.
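For reference, the MIG variant could look something like the following sketch (the strategy choice is an assumption to be validated; MIG applies to A100/H100-class GPUs, not T4s):

```yaml
# hypothetical MIG config, installed via the same config.map mechanism
version: v1
flags:
  migStrategy: "single"   # each MIG device advertised as nvidia.com/gpu
  failOnInitError: true
```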
Edited by Ricardo Rocha