Tiller pending start due to Network Plugin not ready
From: https://cern.service-now.com/service-portal?id=ticket&n=INC3366228
openstack coe cluster show --fit 79bbdf4b-5cad-48e4-bc95-2316c5f42f0d
+----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| status | CREATE_COMPLETE |
| health_status | None |
| cluster_template_id | 7bc29005-b782-4e70-8593-0de4e8ed9e67 |
| node_addresses | ['188.184.75.68', '188.184.72.28', '188.184.103.95', '188.184.96.37', '188.185.120.127'] |
| uuid | 79bbdf4b-5cad-48e4-bc95-2316c5f42f0d |
| stack_id | 81dd77ec-616d-4061-b758-222e8b4a2351 |
| status_reason | None |
| created_at | 2023-01-27T10:08:22+00:00 |
| updated_at | 2023-01-27T10:14:17+00:00 |
| coe_version | v1.25.3-cern.0 |
| labels | {'oidc_enabled': 'false', 'kubecontroller_options': '--feature-gates=', 'logging_producer': 'lhcb-cert-dirac', 'kube_tag': 'v1.25.3-cern.0', 'snapshot_controller_enabled': 'true', |
| | 'containerd_tarball_sha256': 'b665b43e652517aac30d68ce7062d4171897476df57fe1cffb48ac8305ac2221', 'cgroup_driver': 'cgroupfs', 'oidc_issuer_url': |
| | 'https://auth.cern.ch/auth/realms/cern', 'admission_control_list': 'ExtendedResourceToleration,NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSecond |
| | s,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota,Priority', 'use_podman': 'true', 'helm_client_tag': 'v2.16.6', 'grafana_admin_passwd': 'xxx', 'ingress_controller': |
| | 'false', 'manila_version': 'v0.3.0', 'cvmfs_csi_enabled': 'true', 'autoscaler_tag': 'v1.22.0', 'monitoring_enabled': 'true', 'metrics_server_enabled': 'true', 'kubeapi_options': '-- |
| | feature-gates=CSIMigration=true', 'etcd_tag': 'v3.4.13', 'ignition_version': '3.3.0', 'nginx_ingress_controller_tag': 'v1.0.4', 'cephfs_csi_enabled': 'true', 'eos_enabled': 'true', |
| | 'oidc_username_claim': 'cern_upn', 'cephfs_csi_version': 'cern-csi-1.0-3', 'traefik_ingress_controller_tag': '2.5.4', 'influx_grafana_dashboard_enabled': 'false', |
| | 'containerd_tarball_url': 'https://s3.cern.ch/cri-containerd-release/cri-containerd-cni-1.6.9-1-linux-amd64.tar.gz', 'container_runtime': 'containerd', 'calico_ipv4pool_ipip': |
| | 'Always', 'cloud_provider_tag': 'v1.24.5', 'heapster_enabled': 'false', 'cern_chart_version': '0.12.0', 'tiller_enabled': 'true', 'cern_chart_enabled': 'true', 'oidc_groups_prefix': |
| | 'cern_egroup:', 'manila_enabled': 'true', 'ip_family_policy': 'single_stack', 'nvidia_gpu_tag': '35-5.16.13-200.fc35.x86_64-470.82.00', 'logging_installer': 'helm', 'calico_tag': |
| | 'v3.24.5', 'nvidia_gpu_enabled': 'false', 'cloud_provider_enabled': 'true', 'oidc_username_prefix': 'cern_uid:', 'calico_ipv4pool': '10.100.0.0/16', 'kube_csi_enabled': 'true', |
| | 'kube_csi_version': 'cern-csi-1.0-2', 'kubelet_options': '--feature-gates= --system-reserved=memory=500Mi --resolv-conf=/run/systemd/resolve/resolv.conf', 'manila_csi_enabled': 'true', |
| | 'container_infra_prefix': 'registry.cern.ch/magnum/', 'cvmfs_csi_version': 'v2.0.0', 'tiller_tag': 'v2.16.6', 'oidc_groups_claim': 'cern_roles', 'coredns_tag': '1.8.7', |
| | 'heat_container_agent_tag': 'train-stable-6'} |
| labels_overridden | {'ingress_controller': 'traefik'} |
| labels_skipped | {} |
| labels_added | {'grafana_admin_passwd': 'xxx', 'logging_producer': 'lhcb-cert-dirac', 'monitoring_enabled': 'true'} |
| faults | |
| keypair | id_rsa |
| api_address | https://188.185.124.135:6443 |
| master_addresses | ['188.185.124.135'] |
| create_timeout | 60 |
| node_count | 5 |
| discovery_url | https://discovery.etcd.io/16cae0c125f1ac8ca6418aeacd07c572 |
| master_count | 1 |
| container_version | 1.12.6 |
| name | lhcb-cert-dirac |
| master_flavor_id | m2.medium |
| flavor_id | m2.medium |
| health_status_reason | {} |
| project_id | ea414adb-cb40-45d0-b86c-c59a08b1a9f6 |
+----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Using a public template provided by us:
| 7bc29005-b782-4e70-8593-0de4e8ed9e67 | kubernetes-1.25.3-1 |
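The labels above come from that public template; to double-check what it ships with, the template itself can be inspected (same openstack CLI with the magnum plugin already used above):
openstack coe cluster template show 7bc29005-b782-4e70-8593-0de4e8ed9e67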
The cluster seems to be completely missing the helm chart installation:
helm list -A
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
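Side note: with tiller_enabled=true and helm_client_tag v2.16.6 the releases are managed by Tiller (Helm v2), which by default stores them as ConfigMaps labelled OWNER=TILLER in its namespace, so a quick sanity check (plain kubectl instead of the ka/kn aliases used below) is:
kubectl -n magnum-tiller get configmaps -l OWNER=TILLER
If that also comes back empty, nothing was ever released.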
The heat-container-agent logs on the master nonetheless show that install-helm-modules ran and created the install job:
Jan 27 10:11:40 lhcb-cert-dirac-cpgpndilx6am-master-0 podman[1644]: Writing File: /srv/magnum/kubernetes/helm/cern-magnum.yaml
Jan 27 10:11:40 lhcb-cert-dirac-cpgpndilx6am-master-0 podman[1644]: Finished running cern-chart
Jan 27 10:11:40 lhcb-cert-dirac-cpgpndilx6am-master-0 podman[1644]: START: install-helm-modules
Jan 27 10:11:40 lhcb-cert-dirac-cpgpndilx6am-master-0 podman[1644]: Waiting for Kubernetes API...
Jan 27 10:11:40 lhcb-cert-dirac-cpgpndilx6am-master-0 podman[1644]: Applying /srv/magnum/kubernetes/helm/cern-magnum.yaml.
Jan 27 10:11:40 lhcb-cert-dirac-cpgpndilx6am-master-0 podman[1644]: configmap/cern-magnum-config created
Jan 27 10:11:40 lhcb-cert-dirac-cpgpndilx6am-master-0 podman[1644]: job.batch/install-cern-magnum-job created
Jan 27 10:11:40 lhcb-cert-dirac-cpgpndilx6am-master-0 podman[1644]: Finished running install-helm-modules
Jan 27 10:11:40 lhcb-cert-dirac-cpgpndilx6am-master-0 podman[1644]: Starting to run admin role binding creation for user
Jan 27 10:11:40 lhcb-cert-dirac-cpgpndilx6am-master-0 podman[1644]: [2023-01-27 10:11:40,253] (heat-config) [INFO] deploy_stderr
The ConfigMap and the install job are indeed there:
ka get cm,job
NAMESPACE NAME DATA AGE
default configmap/kube-root-ca.crt 1 3h9m
kube-node-lease configmap/kube-root-ca.crt 1 3h9m
kube-public configmap/kube-root-ca.crt 1 3h9m
kube-system configmap/extension-apiserver-authentication 6 3h10m
kube-system configmap/k8s-keystone-auth-policy 1 3h9m
kube-system configmap/kube-root-ca.crt 1 3h9m
magnum-tiller configmap/cern-magnum-config 2 3h9m
magnum-tiller configmap/kube-root-ca.crt 1 3h9m
NAMESPACE NAME COMPLETIONS DURATION AGE
magnum-tiller job.batch/install-cern-magnum-job 0/1 3h9m 3h9m
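Since the job never completed, the next step is to look at the pods it spawned and their logs (they may already have been cleaned up, in which case only events are left):
kubectl -n magnum-tiller get pods -l job-name=install-cern-magnum-job
kubectl -n magnum-tiller logs -l job-name=install-cern-magnum-job --tail=100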
kn magnum-tiller describe job install-cern-magnum-job
Name: install-cern-magnum-job
Namespace: magnum-tiller
Selector: controller-uid=6158e89d-e970-4072-adce-03b9536f521c
Labels: controller-uid=6158e89d-e970-4072-adce-03b9536f521c
job-name=install-cern-magnum-job
Annotations: batch.kubernetes.io/job-tracking:
Parallelism: 1
Completions: 1
Completion Mode: NonIndexed
Start Time: Fri, 27 Jan 2023 11:11:40 +0100
Pods Statuses: 0 Active (0 Ready) / 0 Succeeded / 6 Failed
Pod Template:
Labels: controller-uid=6158e89d-e970-4072-adce-03b9536f521c
job-name=install-cern-magnum-job
Service Account: tiller
Containers:
config-helm:
Image: registry.cern.ch/magnum/helm-client:v3.2.0
Port: <none>
Host Port: <none>
Command:
bash
Args:
/opt/magnum/install-cern-magnum.sh
Environment:
HELM_HOME: /helm_home
TILLER_NAMESPACE: magnum-tiller
HELM_TLS_ENABLE: true
Mounts:
/etc/helm from helm-client-certs (rw)
/opt/magnum/ from install-cern-magnum-config (rw)
Volumes:
install-cern-magnum-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: cern-magnum-config
Optional: false
helm-client-certs:
Type: Secret (a volume populated by a Secret)
SecretName: helm-client-secret
Optional: false
Events: <none>
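The 6 failed pods match the default Job backoffLimit of 6, so the job has given up retrying. The empty Events list is expected: events are only retained for about an hour by default and the job is already 3h old. Whatever events are still around in the namespace can be listed with:
kubectl -n magnum-tiller get events --sort-by=.lastTimestamp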
Tiller seems to be stalled:
ka get po
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system k8s-keystone-auth-hshmj 1/1 Running 0 3h11m
magnum-tiller tiller-deploy-5bc784966d-qrdst 0/1 ContainerCreating 0 3h11m
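Only k8s-keystone-auth is actually running, and there are no Calico pods in kube-system at all. Assuming the chart uses the usual Calico component names, that can be confirmed with:
kubectl -n kube-system get daemonset calico-node
kubectl -n kube-system get deployment calico-kube-controllers
Both should come back NotFound if the cern-magnum chart (which carries Calico) was never installed.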
kn magnum-tiller describe po tiller-deploy-5bc784966d-qrdst
Name: tiller-deploy-5bc784966d-qrdst
Namespace: magnum-tiller
Priority: 0
Node: lhcb-cert-dirac-cpgpndilx6am-master-0/188.185.124.135
Start Time: Fri, 27 Jan 2023 11:11:38 +0100
Labels: app=helm
name=tiller
pod-template-hash=5bc784966d
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/tiller-deploy-5bc784966d
Containers:
tiller:
Container ID:
Image: registry.cern.ch/magnum/tiller:v2.16.6
Image ID:
Ports: 44134/TCP, 44135/TCP
Host Ports: 0/TCP, 0/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Liveness: http-get http://:44135/liveness delay=1s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:44135/readiness delay=1s timeout=1s period=10s #success=1 #failure=3
Environment:
TILLER_NAMESPACE: magnum-tiller
TILLER_HISTORY_MAX: 0
TILLER_TLS_VERIFY: 1
TILLER_TLS_ENABLE: 1
TILLER_TLS_CERTS: /etc/certs
Mounts:
/etc/certs from tiller-certs (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-czw6t (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
tiller-certs:
Type: Secret (a volume populated by a Secret)
SecretName: tiller-secret
Optional: false
kube-api-access-czw6t:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: node-role.kubernetes.io/master=
Tolerations: :NoSchedule op=Exists
:NoExecute op=Exists
CriticalAddonsOnly op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning NetworkNotReady 2m20s (x5702 over 3h12m) kubelet network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
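The "cni plugin not initialized" message from the kubelet/containerd means no CNI configuration has been loaded from /etc/cni/net.d on the node yet. A quick check on the master (assuming the default core login user of the Fedora CoreOS image) would be:
ssh core@188.185.124.135 'ls -l /etc/cni/net.d/'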
Calico is installed by the cern-magnum chart anyway, and the CNI network should not be needed here, should it? @rbritoda
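One way to check whether the pod really depends on the CNI (a sketch, not verified on this cluster): a pod can only start before the network plugin is ready if it runs with hostNetwork, which is presumably why k8s-keystone-auth is Running while tiller-deploy is stuck. The tiller deployment can be checked with:
kubectl -n magnum-tiller get deploy tiller-deploy -o jsonpath='{.spec.template.spec.hostNetwork}'
Empty output means hostNetwork is not set, i.e. the tiller pod does need the CNI to get a pod IP.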