Karbon errors, but I can't see anything wrong in Prism Element

  • 18 June 2021
  • 1 reply


I have a Dev Nutanix Cluster that I use to host Kubernetes clusters. Today I added a large Kubernetes cluster to it (I intended to add it to my Prod Nutanix Cluster, but accidentally added it to my Dev Nutanix Cluster).

I started to get a lot of alerts from 2 of the 3 Kubernetes clusters hosted on the Dev Nutanix Cluster.  As soon as the install was complete, I deleted the large Kubernetes cluster from the Dev Nutanix Cluster.

I figured that since the resources had all been freed up, the cluster would be fine, but I am still getting errors when using `kubectl`.

The two most common are:

Error from server: etcdserver: request timed out


Unable to connect to the server: dial tcp connectex: No connection could be made because the target machine actively refused it.

The target of that connection is my control plane host.
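One way to narrow down the second error is to check whether anything is listening on the API server port at all. "Actively refused" typically means the host answered but rejected the TCP connection (usually because no process is listening on that port), whereas an unreachable host more often times out. The sketch below is a hypothetical helper, not part of Karbon or `kubectl`; the host and port you pass in are assumptions you need to take from your own kubeconfig.

```python
import socket


def check_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a single TCP connection attempt to host:port.

    Returns:
      "open"    - something accepted the connection
      "refused" - the host answered but nothing accepted on that port
                  (the condition behind "actively refused it")
      "timeout" - no answer within `timeout` seconds
      "error"   - any other socket error (e.g. no route to host)
    """
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"
```

Usage: `check_tcp("<control-plane-host>", 6443)`. Port 6443 is only the upstream kube-apiserver default and is an assumption here; the `server:` URL in your kubeconfig shows the port your cluster actually uses. A result of `"refused"` would point at the API server process on the control plane host not running, which fits an etcd problem that keeps it from starting.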

I have tried to reboot both the etcd server and my control plane host.  That did not fix the issue.

I loaded the Dev Nutanix Cluster in Prism Element and it does not show any errors (only a few warnings about unsupported snapshot features and similar).

It shows that there is plenty of Memory, CPU and Disk Space available.

Karbon was showing several error alerts, but now it just shows a header saying "Alerts: Prometheus failed to fetch alerts".

How can I go about fixing my Kubernetes cluster?



Best answer by JoseNutanix 25 June 2021, 09:15





Hi @Vaccano

I see you have restarted your instances without success. I'm not sure whether you followed this order:

  • Shut down the VMs
  • Start etcd first (wait at least 2 minutes)
  • Start control plane (wait at least 1 minute)
  • Start the workers
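If you restart clusters like this often, the order above can be sketched as a small Python function. This is a minimal sketch under stated assumptions: `power_off` and `power_on` are hypothetical placeholders for however you control the VMs (Prism Element UI, API, or CLI), not a Karbon or Nutanix API, and the reply does not specify a shutdown order, so the sketch simply powers everything off in list order.

```python
import time
from typing import Callable, Iterable


def ordered_restart(
    etcd: Iterable[str],
    control_plane: Iterable[str],
    workers: Iterable[str],
    power_off: Callable[[str], None],  # hypothetical VM power-off hook
    power_on: Callable[[str], None],   # hypothetical VM power-on hook
    sleep: Callable[[float], None] = time.sleep,
    etcd_wait_s: float = 120,          # "wait at least 2 minutes"
    control_plane_wait_s: float = 60,  # "wait at least 1 minute"
) -> None:
    """Shut down all VMs, then start etcd, control plane, and workers in order."""
    etcd, control_plane, workers = list(etcd), list(control_plane), list(workers)

    # 1. Shut down every VM (no particular shutdown order is assumed)
    for vm in etcd + control_plane + workers:
        power_off(vm)

    # 2. Start etcd first, then give it time to form quorum
    for vm in etcd:
        power_on(vm)
    sleep(etcd_wait_s)

    # 3. Start the control plane once etcd is up, then wait again
    for vm in control_plane:
        power_on(vm)
    sleep(control_plane_wait_s)

    # 4. Start the workers last
    for vm in workers:
        power_on(vm)
```

The waits are minimums taken from the steps above; waiting longer, or checking that etcd and the API server respond before moving on, is the safer choice.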

If this doesn’t solve the problem, I suggest you open a support case.