Cannot delete Pods after kubelet restart
Kubelet loses track of Pods' volumes (in /var/lib/kubelet/pods/<UID>/{volumes,volume-subpaths}) after it is restarted. This then makes it impossible to delete any Pod on that node -- they are stuck in the Terminating state:
```
Nov 28 20:12:23 dir-ais-prod-multi-1-zone-a-ct6tt6u7fapx-node-6 bash[1871056]: E1128 20:12:23.837409 1871078 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/projected/ce8d5e50-c57d-4508-ab8e-fab23ee053b7-kube-api-access-7jlbm podName:ce8d5e50-c57d-4508-ab8e-fab23ee053b7 nodeName:}" failed. No retries permitted until 2023-11-28 20:14:25.837388924 +0000 UTC m=+14248.956644154 (durationBeforeRetry 2m2s). Error: "UnmountVolume.TearDown failed for volume \"kube-api-access-7jlbm\" (UniqueName: \"kubernetes.io/projected/ce8d5e50-c57d-4508-ab8e-fab23ee053b7-kube-api-access-7jlbm\") pod \"ce8d5e50-c57d-4508-ab8e-fab23ee053b7\" (UID: \"ce8d5e50-c57d-4508-ab8e-fab23ee053b7\") : unlinkat /var/lib/kubelet/pods/ce8d5e50-c57d-4508-ab8e-fab23ee053b7/volumes/kubernetes.io~projected/kube-api-access-7jlbm: device or resource busy"
```
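The `unlinkat ... device or resource busy` error suggests the projected volume's tmpfs is still mounted at that path, so cleanup is trying to remove a directory that is still a mount point. A quick way to check on the node (a sketch; the path is the one from the log above, substitute your own pod UID):

```sh
# Check whether the volume directory is still a mount point.
# Path taken from the log line above -- substitute your own pod's UID.
findmnt /var/lib/kubelet/pods/ce8d5e50-c57d-4508-ab8e-fab23ee053b7/volumes/kubernetes.io~projected/kube-api-access-7jlbm
# If this prints a tmpfs entry, unlinkat fails with EBUSY because the
# directory is still mounted, not because files inside it are held open.
```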
I've been able to reproduce this on v1.21 as well as v1.27 templates (I haven't tried others, but it's possible the same thing happens there too). Rough reproduction steps below.
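A minimal sketch of how I trigger it, assuming a systemd-managed kubelet (the unit name and pod name here are just examples):

```sh
# 1. Start a pod on the node. Any pod works, since every pod gets a
#    projected kube-api-access volume.
kubectl run busybox --image=busybox --restart=Never -- sleep 3600

# 2. On the node the pod landed on, restart kubelet
#    (assumes a systemd unit named "kubelet"; adjust for your setup).
systemctl restart kubelet

# 3. Try to delete the pod -- it gets stuck in Terminating, and the
#    kubelet log shows the UnmountVolume.TearDown errors above.
kubectl delete pod busybox
kubectl get pod busybox   # stays in Terminating indefinitely
```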