Tiller pending start due to Network Plugin not ready

From: https://cern.service-now.com/service-portal?id=ticket&n=INC3366228

openstack coe cluster show --fit 79bbdf4b-5cad-48e4-bc95-2316c5f42f0d
+----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field                | Value                                                                                                                                                                                    |
+----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| status               | CREATE_COMPLETE                                                                                                                                                                          |
| health_status        | None                                                                                                                                                                                     |
| cluster_template_id  | 7bc29005-b782-4e70-8593-0de4e8ed9e67                                                                                                                                                     |
| node_addresses       | ['188.184.75.68', '188.184.72.28', '188.184.103.95', '188.184.96.37', '188.185.120.127']                                                                                                 |
| uuid                 | 79bbdf4b-5cad-48e4-bc95-2316c5f42f0d                                                                                                                                                     |
| stack_id             | 81dd77ec-616d-4061-b758-222e8b4a2351                                                                                                                                                     |
| status_reason        | None                                                                                                                                                                                     |
| created_at           | 2023-01-27T10:08:22+00:00                                                                                                                                                                |
| updated_at           | 2023-01-27T10:14:17+00:00                                                                                                                                                                |
| coe_version          | v1.25.3-cern.0                                                                                                                                                                           |
| labels               | {'oidc_enabled': 'false', 'kubecontroller_options': '--feature-gates=', 'logging_producer': 'lhcb-cert-dirac', 'kube_tag': 'v1.25.3-cern.0', 'snapshot_controller_enabled': 'true',      |
|                      | 'containerd_tarball_sha256': 'b665b43e652517aac30d68ce7062d4171897476df57fe1cffb48ac8305ac2221', 'cgroup_driver': 'cgroupfs', 'oidc_issuer_url':                                         |
|                      | 'https://auth.cern.ch/auth/realms/cern', 'admission_control_list': 'ExtendedResourceToleration,NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSecond |
|                      | s,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota,Priority', 'use_podman': 'true', 'helm_client_tag': 'v2.16.6', 'grafana_admin_passwd': 'xxx', 'ingress_controller': |
|                      | 'false', 'manila_version': 'v0.3.0', 'cvmfs_csi_enabled': 'true', 'autoscaler_tag': 'v1.22.0', 'monitoring_enabled': 'true', 'metrics_server_enabled': 'true', 'kubeapi_options': '--    |
|                      | feature-gates=CSIMigration=true', 'etcd_tag': 'v3.4.13', 'ignition_version': '3.3.0', 'nginx_ingress_controller_tag': 'v1.0.4', 'cephfs_csi_enabled': 'true', 'eos_enabled': 'true',     |
|                      | 'oidc_username_claim': 'cern_upn', 'cephfs_csi_version': 'cern-csi-1.0-3', 'traefik_ingress_controller_tag': '2.5.4', 'influx_grafana_dashboard_enabled': 'false',                       |
|                      | 'containerd_tarball_url': 'https://s3.cern.ch/cri-containerd-release/cri-containerd-cni-1.6.9-1-linux-amd64.tar.gz', 'container_runtime': 'containerd', 'calico_ipv4pool_ipip':          |
|                      | 'Always', 'cloud_provider_tag': 'v1.24.5', 'heapster_enabled': 'false', 'cern_chart_version': '0.12.0', 'tiller_enabled': 'true', 'cern_chart_enabled': 'true', 'oidc_groups_prefix':    |
|                      | 'cern_egroup:', 'manila_enabled': 'true', 'ip_family_policy': 'single_stack', 'nvidia_gpu_tag': '35-5.16.13-200.fc35.x86_64-470.82.00', 'logging_installer': 'helm', 'calico_tag':       |
|                      | 'v3.24.5', 'nvidia_gpu_enabled': 'false', 'cloud_provider_enabled': 'true', 'oidc_username_prefix': 'cern_uid:', 'calico_ipv4pool': '10.100.0.0/16', 'kube_csi_enabled': 'true',         |
|                      | 'kube_csi_version': 'cern-csi-1.0-2', 'kubelet_options': '--feature-gates= --system-reserved=memory=500Mi --resolv-conf=/run/systemd/resolve/resolv.conf', 'manila_csi_enabled': 'true', |
|                      | 'container_infra_prefix': 'registry.cern.ch/magnum/', 'cvmfs_csi_version': 'v2.0.0', 'tiller_tag': 'v2.16.6', 'oidc_groups_claim': 'cern_roles', 'coredns_tag': '1.8.7',                 |
|                      | 'heat_container_agent_tag': 'train-stable-6'}                                                                                                                                            |
| labels_overridden    | {'ingress_controller': 'traefik'}                                                                                                                                                        |
| labels_skipped       | {}                                                                                                                                                                                       |
| labels_added         | {'grafana_admin_passwd': 'xxx', 'logging_producer': 'lhcb-cert-dirac', 'monitoring_enabled': 'true'}                                                                                    |
| faults               |                                                                                                                                                                                          |
| keypair              | id_rsa                                                                                                                                                                                   |
| api_address          | https://188.185.124.135:6443                                                                                                                                                             |
| master_addresses     | ['188.185.124.135']                                                                                                                                                                      |
| create_timeout       | 60                                                                                                                                                                                       |
| node_count           | 5                                                                                                                                                                                        |
| discovery_url        | https://discovery.etcd.io/16cae0c125f1ac8ca6418aeacd07c572                                                                                                                               |
| master_count         | 1                                                                                                                                                                                        |
| container_version    | 1.12.6                                                                                                                                                                                   |
| name                 | lhcb-cert-dirac                                                                                                                                                                          |
| master_flavor_id     | m2.medium                                                                                                                                                                                |
| flavor_id            | m2.medium                                                                                                                                                                                |
| health_status_reason | {}                                                                                                                                                                                       |
| project_id           | ea414adb-cb40-45d0-b86c-c59a08b1a9f6                                                                                                                                                     |
+----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Using a public template provided by us:

| 7bc29005-b782-4e70-8593-0de4e8ed9e67 | kubernetes-1.25.3-1       |

The cluster seems to be completely missing the installation of the Helm chart:

helm list -A
NAME	NAMESPACE	REVISION	UPDATED	STATUS	CHART	APP VERSION

The journal on the master node, however, shows that the cern-magnum ConfigMap and the install Job were created:

Jan 27 10:11:40 lhcb-cert-dirac-cpgpndilx6am-master-0 podman[1644]: Writing File: /srv/magnum/kubernetes/helm/cern-magnum.yaml
Jan 27 10:11:40 lhcb-cert-dirac-cpgpndilx6am-master-0 podman[1644]: Finished running cern-chart
Jan 27 10:11:40 lhcb-cert-dirac-cpgpndilx6am-master-0 podman[1644]: START: install-helm-modules
Jan 27 10:11:40 lhcb-cert-dirac-cpgpndilx6am-master-0 podman[1644]: Waiting for Kubernetes API...
Jan 27 10:11:40 lhcb-cert-dirac-cpgpndilx6am-master-0 podman[1644]: Applying /srv/magnum/kubernetes/helm/cern-magnum.yaml.
Jan 27 10:11:40 lhcb-cert-dirac-cpgpndilx6am-master-0 podman[1644]: configmap/cern-magnum-config created
Jan 27 10:11:40 lhcb-cert-dirac-cpgpndilx6am-master-0 podman[1644]: job.batch/install-cern-magnum-job created
Jan 27 10:11:40 lhcb-cert-dirac-cpgpndilx6am-master-0 podman[1644]: Finished running install-helm-modules
Jan 27 10:11:40 lhcb-cert-dirac-cpgpndilx6am-master-0 podman[1644]: Starting to run admin role binding creation for user
Jan 27 10:11:40 lhcb-cert-dirac-cpgpndilx6am-master-0 podman[1644]: [2023-01-27 10:11:40,253] (heat-config) [INFO] deploy_stderr

The ConfigMap and the Job are indeed there:

ka get cm,job
NAMESPACE         NAME                                           DATA   AGE
default           configmap/kube-root-ca.crt                     1      3h9m
kube-node-lease   configmap/kube-root-ca.crt                     1      3h9m
kube-public       configmap/kube-root-ca.crt                     1      3h9m
kube-system       configmap/extension-apiserver-authentication   6      3h10m
kube-system       configmap/k8s-keystone-auth-policy             1      3h9m
kube-system       configmap/kube-root-ca.crt                     1      3h9m
magnum-tiller     configmap/cern-magnum-config                   2      3h9m
magnum-tiller     configmap/kube-root-ca.crt                     1      3h9m

NAMESPACE       NAME                                COMPLETIONS   DURATION   AGE
magnum-tiller   job.batch/install-cern-magnum-job   0/1           3h9m       3h9m
kn magnum-tiller describe job install-cern-magnum-job
Name:             install-cern-magnum-job
Namespace:        magnum-tiller
Selector:         controller-uid=6158e89d-e970-4072-adce-03b9536f521c
Labels:           controller-uid=6158e89d-e970-4072-adce-03b9536f521c
                  job-name=install-cern-magnum-job
Annotations:      batch.kubernetes.io/job-tracking: 
Parallelism:      1
Completions:      1
Completion Mode:  NonIndexed
Start Time:       Fri, 27 Jan 2023 11:11:40 +0100
Pods Statuses:    0 Active (0 Ready) / 0 Succeeded / 6 Failed
Pod Template:
  Labels:           controller-uid=6158e89d-e970-4072-adce-03b9536f521c
                    job-name=install-cern-magnum-job
  Service Account:  tiller
  Containers:
   config-helm:
    Image:      registry.cern.ch/magnum/helm-client:v3.2.0
    Port:       <none>
    Host Port:  <none>
    Command:
      bash
    Args:
      /opt/magnum/install-cern-magnum.sh
    Environment:
      HELM_HOME:         /helm_home
      TILLER_NAMESPACE:  magnum-tiller
      HELM_TLS_ENABLE:   true
    Mounts:
      /etc/helm from helm-client-certs (rw)
      /opt/magnum/ from install-cern-magnum-config (rw)
  Volumes:
   install-cern-magnum-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      cern-magnum-config
    Optional:  false
   helm-client-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  helm-client-secret
    Optional:    false
Events:          <none>

Tiller seems to be stalled:

ka get po
NAMESPACE       NAME                             READY   STATUS              RESTARTS   AGE
kube-system     k8s-keystone-auth-hshmj          1/1     Running             0          3h11m
magnum-tiller   tiller-deploy-5bc784966d-qrdst   0/1     ContainerCreating   0          3h11m
kn magnum-tiller describe po tiller-deploy-5bc784966d-qrdst
Name:           tiller-deploy-5bc784966d-qrdst
Namespace:      magnum-tiller
Priority:       0
Node:           lhcb-cert-dirac-cpgpndilx6am-master-0/188.185.124.135
Start Time:     Fri, 27 Jan 2023 11:11:38 +0100
Labels:         app=helm
                name=tiller
                pod-template-hash=5bc784966d
Annotations:    <none>
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  ReplicaSet/tiller-deploy-5bc784966d
Containers:
  tiller:
    Container ID:   
    Image:          registry.cern.ch/magnum/tiller:v2.16.6
    Image ID:       
    Ports:          44134/TCP, 44135/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Liveness:       http-get http://:44135/liveness delay=1s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:44135/readiness delay=1s timeout=1s period=10s #success=1 #failure=3
    Environment:
      TILLER_NAMESPACE:    magnum-tiller
      TILLER_HISTORY_MAX:  0
      TILLER_TLS_VERIFY:   1
      TILLER_TLS_ENABLE:   1
      TILLER_TLS_CERTS:    /etc/certs
    Mounts:
      /etc/certs from tiller-certs (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-czw6t (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  tiller-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  tiller-secret
    Optional:    false
  kube-api-access-czw6t:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              node-role.kubernetes.io/master=
Tolerations:                 :NoSchedule op=Exists
                             :NoExecute op=Exists
                             CriticalAddonsOnly op=Exists
Events:
  Type     Reason           Age                       From     Message
  ----     ------           ----                      ----     -------
  Warning  NetworkNotReady  2m20s (x5702 over 3h12m)  kubelet  network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized

Calico is installed with the cern-magnum chart anyway, and the network CNI is not needed here, is it? @rbritoda
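
One way to approach this question: the kubelet starts pods that set `hostNetwork: true` even before the CNI plugin is initialized, while any other pod stays in ContainerCreating with the NetworkNotReady event shown above. The sketch below is only a suggestion for checking which case applies. It assumes `kubectl` access to the cluster and that the deployment is called `tiller-deploy` (inferred from the ReplicaSet name); the helper name `cni_not_ready` is hypothetical.

```shell
# Hypothetical helper: reads `kubectl describe pod` output on stdin and
# succeeds if the kubelet reported that the CNI plugin is not ready.
cni_not_ready() {
  grep -q 'NetworkPluginNotReady'
}

# Usage against the live cluster (requires cluster access, not run here):
#
#   # 1. Confirm the pod is blocked on the CNI plugin.
#   kubectl -n magnum-tiller describe pod tiller-deploy-5bc784966d-qrdst \
#     | cni_not_ready && echo "Tiller is waiting for the CNI plugin"
#
#   # 2. Check whether Tiller runs on the host network (assumed deployment
#   #    name: tiller-deploy). Empty output means hostNetwork is not set,
#   #    so the pod cannot start until a CNI plugin is installed.
#   kubectl -n magnum-tiller get deploy tiller-deploy \
#     -o jsonpath='{.spec.template.spec.hostNetwork}'
#
#   # 3. Check whether any CNI DaemonSet (e.g. Calico) exists yet.
#   kubectl get daemonsets --all-namespaces
```

If step 2 prints nothing and step 3 shows no Calico DaemonSet, that would suggest a circular dependency: Tiller waits for the CNI, while the CNI is installed by a chart that needs Tiller.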

Edited by Diogo Filipe Tomas Guerra