Investigate issues with stargz with kubeconfig-based authentication

Context: https://github.com/containerd/stargz-snapshotter/issues/1989

After the issue with cri_keychain authentication was discovered, we moved to kubeconfig-based authentication.

The issue:

Create a cluster with kubeconfig-based authentication and label a node for stargz.

Outcome:

The node becomes NotReady. Restarting kubelet doesn't help. The stargz-snapshotter and containerd services are both running, but kubelet keeps exiting and auto-restarting.

$ kubectl get no
NAME                                                STATUS     ROLES    AGE   VERSION
digaponc-stargz-upgrade-008-dodwsqzwamqt-master-0   Ready      master   12m   v1.33.1
digaponc-stargz-upgrade-008-dodwsqzwamqt-node-0     NotReady   <none>   10m   v1.33.1
$ systemctl status stargz-snapshotter.service 
● stargz-snapshotter.service - stargz snapshotter
     Loaded: loaded (/etc/systemd/system/stargz-snapshotter.service; enabled; preset: disabled)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf, 50-keep-warm.conf
     Active: active (running) since Mon 2025-07-07 09:42:03 UTC; 2min 13s ago

$ systemctl status containerd.service 
● containerd.service - containerd container runtime
     Loaded: loaded (/etc/systemd/system/containerd.service; enabled; preset: disabled)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf, 50-keep-warm.conf
     Active: active (running) since Mon 2025-07-07 09:41:11 UTC; 3min 9s ago

$ systemctl status kubelet
● kubelet.service - Kubelet via Hyperkube (System Container)
     Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; preset: disabled)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf, 50-keep-warm.conf
     Active: activating (auto-restart) (Result: exit-code) since Mon 2025-07-07 09:44:21 UTC; 6s ago

kubelet logs (newest entry first):

Jul 07 09:45:05 digaponc-stargz-upgrade-008-dodwsqzwamqt-node-0 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Jul 07 09:45:05 digaponc-stargz-upgrade-008-dodwsqzwamqt-node-0 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Jul 07 09:45:05 digaponc-stargz-upgrade-008-dodwsqzwamqt-node-0 podman[11893]: 2025-07-07 09:45:05.876090329 +0000 UTC m=+0.166543879 container died f0aabaf4684fae369418439cc933092c44f1613e579317f994d40889f49389f4 (image=registry.cern.ch/kubernetes/hyperkube:v1.33.1-rancher1, name>
Jul 07 09:45:05 digaponc-stargz-upgrade-008-dodwsqzwamqt-node-0 bash[11893]: E0707 09:45:05.873019   11907 run.go:72] "command failed" err="failed to run Kubelet: validate service connection: validate CRI v1 image API for endpoint \"unix:///run/containerd-stargz-grpc/containerd-st>
Jul 07 09:45:05 digaponc-stargz-upgrade-008-dodwsqzwamqt-node-0 kubelet[11905]: E0707 09:45:05.873019   11907 run.go:72] "command failed" err="failed to run Kubelet: validate service connection: validate CRI v1 image API for endpoint \"unix:///run/containerd-stargz-grpc/containerd>
Jul 07 09:45:05 digaponc-stargz-upgrade-008-dodwsqzwamqt-node-0 bash[11893]: I0707 09:45:05.872075   11907 log.go:25] "Connecting to image service" endpoint="unix:///run/containerd-stargz-grpc/containerd-stargz-grpc.sock"
Jul 07 09:45:05 digaponc-stargz-upgrade-008-dodwsqzwamqt-node-0 bash[11893]: I0707 09:45:05.872062   11907 log.go:25] "Validated CRI v1 runtime API"
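
The logs show kubelet failing while validating the image service at the stargz socket, even though the stargz-snapshotter unit reports as running. One possible first check, sketched below, is whether that socket exists on the node and accepts connections at all. The helper name `probe_unix_socket` and its return values are hypothetical and made up for illustration. The socket path is the one from the kubelet log. A result of "accepting" only means something is listening on the socket. It does not confirm that the CRI v1 image API is actually served there.

```python
import os
import socket
import stat

# Socket path copied from the kubelet log above.
STARGZ_SOCKET = "/run/containerd-stargz-grpc/containerd-stargz-grpc.sock"


def probe_unix_socket(path: str, timeout: float = 2.0) -> str:
    """Classify a Unix socket path (hypothetical helper).

    Returns one of: "missing", "not-a-socket", "refused", "accepting".
    """
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if not stat.S_ISSOCK(mode):
        return "not-a-socket"
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        try:
            s.connect(path)
        except OSError:
            # The socket file exists but nothing accepted the connection.
            return "refused"
    return "accepting"


if __name__ == "__main__":
    print(STARGZ_SOCKET, "->", probe_unix_socket(STARGZ_SOCKET))
```

Run on the affected node with enough privileges to read /run, this would separate "the socket is not there or nothing is listening" from "the socket is reachable but the API validation fails", which point to different places to look next.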
Edited by Diana Gaponcic