Skip to content

Fix nvidia_gpu_enabled breaking helm deployment

Diogo Filipe Tomas Guerra requested to merge urgent-gpu-fix into master
  • Re-add Hack to prelabel the gpu nodes with mig/config:disabled to fix nvidia-device-plugin-daemonset crashloops
  • Drop restartPolicy on seLinux daemonset: Looks like this passed through and restartPolicy: OnFailure is not working. There is conflicting documentation:

[1] https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#pod-template

[2] https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.25/#podspec-v1-core

This is halting cern-magnum metachart from installing the rest of the components.

Closes: #23 (closed)

Edited by Diogo Filipe Tomas Guerra

Merge request reports