I forgot to post the pod status earlier:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-7f66766f7f-nd8sx 1/1 Running 1 74m
kube-system calico-node-2ctb4 1/1 Running 0 74m
kube-system calico-node-7fx7n 1/1 Running 0 74m
kube-system calico-node-bvct7 1/1 Running 1 74m
kube-system calico-node-fjwjp 0/1 CrashLoopBackOff 23 74m
kube-system calico-node-xth2k 1/1 Running 0 74m
kube-system calico-typha-6bfd55df7-ptc7d 1/1 Running 0 74m
kube-system kube-apiserver-karbon-rgs-pa-k8-cluster-staging-e77682-k8s-master-0 3/3 Running 0 77m
kube-system kube-apiserver-karbon-rgs-pa-k8-cluster-staging-e77682-k8s-master-1 3/3 Running 0 77m
kube-system kube-proxy-ds-dsd5v 1/1 Running 0 74m
kube-system kube-proxy-ds-gnng4 1/1 Running 0 74m
kube-system kube-proxy-ds-ph68q 1/1 Running 0 74m
kube-system kube-proxy-ds-tf4ml 1/1 Running 0 74m
kube-system kube-proxy-ds-whbpl
Hi Igor,
The operation is timing out. You’ll have to check if there is enough bandwidth between sites to pull the images.
You can also check the logs for the pod calico-node-fjwjp to see whether it pulled the image and, if it did, why Calico is crashing.
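Something along these lines should show whether the image was pulled and why the container keeps restarting (pod name taken from the listing above; adjust if yours differs):
kubectl -n kube-system describe pod calico-node-fjwjp                      # the Events section will show image pull errors, if any
kubectl -n kube-system logs calico-node-fjwjp -c calico-node --previous    # logs from the last crashed attempt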
Hi,
Yes, the bandwidth is just fine … I did some basic testing and all K8s-based VMs initialised just fine. It’s just weird that this particular pod can’t initialise the Calico network, hence the Karbon deployment fails. The Karbon cluster is not removed automatically though, so there is a chance to look around.
For the pod calico-node-fjwjp:
kube-system calico-node-fjwjp 0/1 CrashLoopBackOff 327 19h
It’s constantly restarting, as one would expect, since the readiness state is never reached.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 12m (x2224 over 19h) kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Failed to stat() nodename file: stat /var/lib/calico/nodename: no such file or directory
Warning BackOff 2m46s (x3945 over 19h) kubelet Back-off restarting failed container
Full output from pod describe:
Name: calico-node-fjwjp
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: karbon-rgs-pa-k8-cluster-staging-e77682-k8s-worker-0/10.20.25.73
Start Time: Sun, 17 Oct 2021 11:47:36 +0000
Labels: controller-revision-hash=547955649b
k8s-app=calico-node
pod-template-generation=1
Annotations: scheduler.alpha.kubernetes.io/critical-pod:
Status: Running
IP: 10.20.25.73
IPs:
IP: 10.20.25.73
Controlled By: DaemonSet/calico-node
Init Containers:
upgrade-ipam:
Container ID: docker://025878de4f3ab420bdc8d572c1037ff591c892f32b1607c1f60f523c398db8de
Image: quay.io/karbon/cni:v3.14.0
Image ID: docker-pullable://quay.io/karbon/cni@sha256:cc951ccd15aa8c94b1b3eec673e434853f3bf8c2deb83bdb4a3f934c68e0e8ae
Port: <none>
Host Port: <none>
Command:
/opt/cni/bin/calico-ipam
-upgrade
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 17 Oct 2021 11:47:45 +0000
Finished: Sun, 17 Oct 2021 11:47:45 +0000
Ready: True
Restart Count: 0
Environment:
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
Mounts:
/host/opt/cni/bin from cni-bin-dir (rw)
/var/lib/cni/networks from host-local-net-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-x5lvc (ro)
install-cni:
Container ID: docker://455ed002c1d8450e362fca773854f54000022d29a11401c3943d00d691060827
Image: quay.io/karbon/cni:v3.14.0
Image ID: docker-pullable://quay.io/karbon/cni@sha256:cc951ccd15aa8c94b1b3eec673e434853f3bf8c2deb83bdb4a3f934c68e0e8ae
Port: <none>
Host Port: <none>
Command:
/install-cni.sh
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 17 Oct 2021 11:47:47 +0000
Finished: Sun, 17 Oct 2021 11:47:47 +0000
Ready: True
Restart Count: 0
Environment:
CNI_CONF_NAME: 10-calico.conflist
CNI_NETWORK_CONFIG: <set to the key 'cni_network_config' of config map 'calico-config'> Optional: false
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CNI_MTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
SLEEP: false
Mounts:
/host/etc/cni/net.d from cni-net-dir (rw)
/host/opt/cni/bin from cni-bin-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-x5lvc (ro)
flexvol-driver:
Container ID: docker://68f392f6d3bde62f14185fb50c6b4109982bd63ac060ccbadc18522e84fdc60b
Image: quay.io/karbon/pod2daemon-flexvol:v3.14.0
Image ID: docker-pullable://quay.io/karbon/pod2daemon-flexvol@sha256:e5f2c2b9e67ec463ef5b538b8bf10453cc6a6538f7288a4760ee925c51498e7d
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 17 Oct 2021 11:47:51 +0000
Finished: Sun, 17 Oct 2021 11:47:51 +0000
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/host/driver from flexvol-driver-host (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-x5lvc (ro)
Containers:
calico-node:
Container ID: docker://96fa1881578bd5bae774a6f25ffc108882413ef44acb6c8e450cf6b38345aa8d
Image: quay.io/karbon/node:v3.14.0
Image ID: docker-pullable://quay.io/karbon/node@sha256:1a643541c4d76ea412dde19454bfada5a7e03e7cbb51ddf76def9baf84bdad7c
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Mon, 18 Oct 2021 07:41:15 +0000
Finished: Mon, 18 Oct 2021 07:42:24 +0000
Ready: False
Restart Count: 327
Requests:
cpu: 250m
Liveness: exec [/bin/calico-node -felix-live] delay=10s timeout=1s period=10s #success=1 #failure=6
Readiness: exec [/bin/calico-node -felix-ready -bird-ready] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
DATASTORE_TYPE: kubernetes
FELIX_TYPHAK8SSERVICENAME: <set to the key 'typha_service_name' of config map 'calico-config'> Optional: false
WAIT_FOR_DATASTORE: true
NODENAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
CLUSTER_TYPE: k8s,bgp
IP: autodetect
CALICO_IPV4POOL_IPIP: Never
IP_AUTODETECTION_METHOD: interface=eth.*
FELIX_IPINIPMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
CALICO_IPV4POOL_CIDR: 172.20.0.0/16
CALICO_ADVERTISE_CLUSTER_IPS: 172.19.0.0/16
CALICO_DISABLE_FILE_LOGGING: true
FELIX_DEFAULTENDPOINTTOHOSTACTION: ACCEPT
FELIX_IPV6SUPPORT: false
FELIX_LOGSEVERITYSCREEN: info
FELIX_HEALTHENABLED: true
FELIX_PROMETHEUSGOMETRICSENABLED: false
FELIX_PROMETHEUSMETRICSENABLED: true
Mounts:
/lib/modules from lib-modules (ro)
/run/xtables.lock from xtables-lock (rw)
/var/lib/calico from var-lib-calico (rw)
/var/run/calico from var-run-calico (rw)
/var/run/nodeagent from policysync (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-x5lvc (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
var-run-calico:
Type: HostPath (bare host directory volume)
Path: /var/run/calico
HostPathType:
var-lib-calico:
Type: HostPath (bare host directory volume)
Path: /var/lib/calico
HostPathType:
xtables-lock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
cni-bin-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/hyperkube/opt/cni/bin
HostPathType:
cni-net-dir:
Type: HostPath (bare host directory volume)
Path: /etc/cni/net.d
HostPathType:
host-local-net-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/cni/networks
HostPathType:
policysync:
Type: HostPath (bare host directory volume)
Path: /var/run/nodeagent
HostPathType: DirectoryOrCreate
flexvol-driver-host:
Type: HostPath (bare host directory volume)
Path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
HostPathType: DirectoryOrCreate
calico-node-token-x5lvc:
Type: Secret (a volume populated by a Secret)
SecretName: calico-node-token-x5lvc
Optional: false
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: :NoSchedule op=Exists
:NoExecute op=Exists
CriticalAddonsOnly op=Exists
node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/network-unavailable:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 14m (x2224 over 19h) kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Failed to stat() nodename file: stat /var/lib/calico/nodename: no such file or directory
Warning BackOff 4m24s (x3945 over 19h) kubelet Back-off restarting failed container
Hi Igor,
I suggest you open a ticket with support so they can investigate why this pod is crashing (I saw calico-node-bvct7 crash once too). The init containers complete successfully, but the calico-node container keeps crashing because it cannot find /var/lib/calico/nodename. Usually this sort of issue is related to network or performance problems.
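A minimal check, assuming you can SSH to the worker node from the bastion (node name and path taken from the describe output above):
# on the worker node (10.20.25.73): calico-node normally writes this file at startup
ls -l /var/lib/calico/
cat /var/lib/calico/nodename
# if the directory is empty, the previous container logs (kubectl logs ... --previous, as above) usually show why startup never got that far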
Hi Jose,
Yes, that’s fine - I just need to figure out how to raise the support ticket, as I’ve never had the pleasure of using it in the past.
Yes, it seems something is off with that particular worker node (10.20.25.73) and the pods running on it - the problem shows up in communication via the kubelet, not just in the Calico pods:
igor.stankovic@rgs-pa-bastion-1:~$ kubectl -n kube-system logs -f kube-proxy-ds-whbpl
Error from server: Get "https://10.20.25.73:10250/containerLogs/kube-system/kube-proxy-ds-whbpl/kube-proxy?follow=true": dial tcp 10.20.25.73:10250: i/o timeout
igor.stankovic@rgs-pa-bastion-1:~$
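A quick way to narrow this down, assuming you can run commands both from the bastion and on the worker itself (IP and port taken from the error above):
# from the bastion: is the kubelet port reachable at all?
nc -vz -w 5 10.20.25.73 10250
curl -k https://10.20.25.73:10250/healthz    # an HTTP error (e.g. 401) means the port is reachable; a timeout points to a network/firewall problem
# on the worker: is kubelet actually listening on 10250?
ss -tlnp | grep 10250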
We tried restarting the kubelet and docker, then did a full recycle of the VM node, but it’s still the same.
It would be interesting to hear back from support.
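For reference, the restarts mentioned above would look roughly like this on the worker (assuming standard systemd unit names; Karbon images may name the units differently):
sudo systemctl restart docker
sudo systemctl restart kubelet
sudo systemctl status kubelet --no-pager    # check that it stays active and is not flapping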