I have a Dev Nutanix Cluster that I use to host Kubernetes clusters. Today I added a large Kubernetes cluster to it (I intended to add it to my Prod Nutanix Cluster, but accidentally added it to my Dev Nutanix Cluster).
I started to get a lot of alerts from 2 of the 3 Kubernetes clusters hosted on the Dev Nutanix Cluster. As soon as the install was complete, I deleted the large Kubernetes cluster from the Dev Nutanix Cluster.
I figured that since the resources have all been freed up, the cluster would be fine. But I am still getting errors when using `kubectl`.
The two most common are:
Error from server: etcdserver: request timed out
Unable to connect to the server: dial tcp 10.62.12.91:443: connectex: No connection could be made because the target machine actively refused it.
10.62.12.91 is my control plane host.
I have tried to reboot both the etcd server and my control plane host. That did not fix the issue.
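The second error says the connection was actively refused, which generally means the host answered but nothing was accepting connections on port 443 at that moment. A timeout would instead point at the host or the network path not responding. Below is a minimal Python sketch for telling those cases apart from a workstation. The helper name `probe_tcp` and the 3-second default timeout are assumptions made up for illustration; they are not part of Karbon or `kubectl`.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connection and classify the result.

    Returns "open", "refused", "timeout", or "error: <details>".
    probe_tcp is a hypothetical helper name used only for this sketch.
    """
    try:
        # create_connection performs the TCP handshake and raises on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host reachable, but no process is listening on the port.
        return "refused"
    except socket.timeout:
        # No answer within the timeout: host down, or traffic dropped.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. no route to host or a name resolution failure.
        return f"error: {exc}"


# Example call, using the control plane address from the error above:
# print(probe_tcp("10.62.12.91", 443))
```

A result of "refused" would match the `kubectl` error and suggests looking at whether the API server process on the control plane host is actually running, rather than at networking.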
I loaded up the Dev Nutanix Cluster in Prism Element and it does not show any errors (only a few warnings about unsupported snapshot features and the like).
It shows that there is plenty of Memory, CPU and Disk Space available.
Karbon was showing several error alerts, but now just has a header saying: Alerts: Prometheus failed to fetch alerts
How can I go about fixing my Kubernetes cluster?