Set up GPU profile for time-sliced nodes
We will need to change the GPU setup to use the NVIDIA GPU Operator, passing the appropriate configuration for time slicing and, later, MIG.
version: v1
flags:
  migStrategy: "none"
  failOnInitError: true
  nvidiaDriverRoot: "/"
  plugin:
    passDeviceSpecs: false
    deviceListStrategy: "envvar"
    deviceIDStrategy: "uuid"
  gfd:
    oneshot: false
    noTimestamp: false
    outputFile: /etc/kubernetes/node-feature-discovery/features.d/gfd
    sleepInterval: 60s
sharing:
  timeSlicing:
    renameByDefault: true
    resources:
      - name: nvidia.com/gpu
        replicas: 4
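With replicas: 4 and renameByDefault enabled, each physical GPU should be advertised four times under the renamed resource. As a sketch (assuming the rename takes effect, and for a node with a single T4), the node status would report something like:

```yaml
# illustrative node status fragment, not actual output
status:
  allocatable:
    nvidia.com/gpu.shared: "4"
```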
renameByDefault should give us a new resource type, nvidia.com/gpu.shared,
which will be handier than relying on the annotation. We need to validate this works well; if not, we fall back to the annotation.
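As a sketch of the consumer side, a pod could then request the renamed resource directly (the pod name, container name, and image tag below are illustrative assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: timesliced-test          # illustrative name
spec:
  restartPolicy: Never
  containers:
    - name: cuda-test
      image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04  # any CUDA-capable image
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu.shared: 1   # renamed time-sliced resource
```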
If we name this config t4shared:
helm install nvdp nvdp/nvidia-device-plugin ...
--set-file config.map.t4shared=t4shared-config.yaml
we can then simply label the nodes to get this config:
kubectl label node \
  --overwrite \
  --selector=nvidia.com/gpu.product=T4-?? \
  nvidia.com/device-plugin.config=t4shared
MIG will have a similar method with a different config.
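For reference, the MIG variant could look something like the following sketch (the strategy choice is an assumption to be validated; MIG applies to A100/H100-class GPUs, not T4s):

```yaml
# hypothetical MIG config, installed via the same config.map mechanism
version: v1
flags:
  migStrategy: "single"   # each MIG device advertised as nvidia.com/gpu
  failOnInitError: true
```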
Edited by Ricardo Rocha