Solved

Prism Central Deployment fails (Prometheus)

  • August 18, 2025
  • 1 reply
  • 177 views

Hi there,

Trying to deploy Prism Central on Nutanix Community Edition, I'm seeing the same symptoms as this thread: Error during deploy Prism Central
 

Encountered Exception in post_deployment step: Failed to enable micro services infrastructure on PC: deploy msp:Error deploying addons: failed to deploy monitoring addon: failed to deploy and verify kube-prometheus: failed to verify kube-prometheus: Operation timed out: failed to verify kube-prometheus: expecting 1 available replica of k8s prometheus in ntnx-system namespace. Currently running: 0
 


I have tried multiple versions of PC with the same issue.

Looking inside the appliance, I can see that Prometheus is failing because it's waiting for its persistent volume to be provisioned.

# Check the STS status

kubectl get sts -n ntnx-system prometheus-k8s

NAME             READY   AGE
prometheus-k8s   0/1     8h

# Find the Pod name

kubectl get po -n ntnx-system | grep prometheus-k8s

prometheus-k8s-0 0/2 Pending 0 8h

# Describe the pod (snipped)

kubectl describe po -n ntnx-system prometheus-k8s-0

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 20m (x496 over 8h) default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
nutanix@NTNX-10-10-200-31-A-PCVM:~$

# Check the PV Claims

kubectl get pvc -n ntnx-system

NAME                                 STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
prometheus-k8s-db-prometheus-k8s-0   Pending                                      silver         8h

# Describe the PVC (snipped)

kubectl describe pvc -n ntnx-system prometheus-k8s-db-prometheus-k8s-0

Name: prometheus-k8s-db-prometheus-k8s-0
Namespace: ntnx-system
StorageClass: silver
Status: Pending
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ProvisioningFailed 20m (x125 over 8h) csi.nutanix.com_ntnx-10-10-200-31-a-pcvm_e2ef0343-d205-47d8-beb2-cd277479e2c5 failed to provision volume with StorageClass "silver": rpc error: code = Internal desc = NutanixVolumes: failed to create volume: pvc-26a1d40b-85cb-4c37-888f-389c0b7f0c66, err: NutanixVolumes: failed to create REST client, error: Max retries done: Failed to authenticate GetVersions(): 401 Authorization Error - HTTP Response Code : 401
Normal Provisioning 4m3s (x129 over 8h) csi.nutanix.com_ntnx-10-10-200-31-a-pcvm_e2ef0343-d205-47d8-beb2-cd277479e2c5 External provisioner is provisioning volume for claim "ntnx-system/prometheus-k8s-db-prometheus-k8s-0"
Normal ExternalProvisioning 83s (x2102 over 8h) persistentvolume-controller waiting for a volume to be created, either by external provisioner "csi.nutanix.com" or manually created by system administrator

It looks like the CSI controller is having authentication issues:

Failed to authenticate GetVersions(): 401 Authorization Error - HTTP Response Code : 401
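
For anyone digging further, the CSI controller logs should show the same 401 in more detail. A rough way to find them on the PCVM (the exact pod and container names are my assumption and may differ between releases):

# Find the CSI controller pod (name varies by release)
kubectl get po -n ntnx-system | grep -i csi

# Tail its logs and search for the 401 (substitute the pod name found above)
kubectl logs -n ntnx-system <csi-controller-pod> --all-containers --tail=200 | grep -i "401"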


If I look at the storage class configuration, I can see the secret in use:

kubectl get sc silver -o yaml

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    ntnxClusterRef: 00063c6c-198a-2907-510e-00a09802c300
  creationTimestamp: "2025-08-18T14:33:47Z"
  name: silver
  resourceVersion: "943"
  uid: d57479a3-eaf5-4231-8263-ab590d4b6978
parameters:
  csi.storage.k8s.io/controller-expand-secret-name: ntnx-csi-secret-sbwhh
  csi.storage.k8s.io/controller-expand-secret-namespace: ntnx-system
  csi.storage.k8s.io/controller-publish-secret-name: ntnx-csi-secret-sbwhh
  csi.storage.k8s.io/controller-publish-secret-namespace: ntnx-system
  csi.storage.k8s.io/fstype: ""
  csi.storage.k8s.io/node-publish-secret-name: ntnx-csi-secret-sbwhh
  csi.storage.k8s.io/node-publish-secret-namespace: ntnx-system
  csi.storage.k8s.io/provisioner-secret-name: ntnx-csi-secret-sbwhh
  csi.storage.k8s.io/provisioner-secret-namespace: ntnx-system
  dataServiceEndPoint: dsip.00063c6c-198a-2907-510e-00a09802c300.prism-central.cluster.local:3260
  description: ""
  flashMode: DISABLED
  isSegmentedIscsiNetwork: "false"
  storageContainer: NutanixManagementShare
  storageType: NutanixVolumes
provisioner: csi.nutanix.com
reclaimPolicy: Delete
volumeBindingMode: Immediate

Resolving the Data Services endpoint from within Prism Central correctly returns my Prism Element Data Services IP:

ping dsip.00063c6c-198a-2907-510e-00a09802c300.prism-central.cluster.local

PING dsip.00063c6c-198a-2907-510e-00a09802c300.prism-central.cluster.local (10.10.200.10) 56(84) bytes of data.
64 bytes from prism-data-services.<mydomain_was_here> (10.10.200.10): icmp_seq=1 ttl=64 time=0.921 ms
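
ICMP alone doesn't prove the iSCSI port is reachable, so as an extra check (assuming nc is available on the PCVM) the port from the StorageClass can be probed too:

# Check the iSCSI data services port from the StorageClass (assumes nc is installed)
nc -zv dsip.00063c6c-198a-2907-510e-00a09802c300.prism-central.cluster.local 3260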

The secret contains a cert and an endpoint:

# Get the secret (snipped)

kubectl get secret -n ntnx-system ntnx-csi-secret-sbwhh -o yaml

apiVersion: v1
data:
  cert: <base64 cert was here>
  endpoint: cGVpcC4wMDA2M2M2Yy0xOThhLTI5MDctNTEwZS0wMGEwOTgwMmMzMDAucHJpc20tY2VudHJhbC5jbHVzdGVyLmxvY2FsOjk0NDA=
kind: Secret
metadata:
  creationTimestamp: "2025-08-18T14:33:47Z"
  name: ntnx-csi-secret-sbwhh
  namespace: ntnx-system
  resourceVersion: "942"
  uid: 45701a4c-8052-46d7-af3c-c69436fdd392
type: Opaque
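
The cert field can be inspected as well; assuming it is a PEM-encoded X.509 certificate (which I haven't confirmed), openssl can show its subject and validity:

# Decode the cert field and show subject/expiry (assumes it is a PEM X.509 cert)
kubectl get secret -n ntnx-system ntnx-csi-secret-sbwhh -o jsonpath='{.data.cert}' | base64 -d | openssl x509 -noout -subject -dates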

The endpoint resolves to the local Prism Central IP:

# Decode the base64 endpoint

echo cGVpcC4wMDA2M2M2Yy0xOThhLTI5MDctNTEwZS0wMGEwOTgwMmMzMDAucHJpc20tY2VudHJhbC5jbHVzdGVyLmxvY2FsOjk0NDA= | base64 -d

peip.00063c6c-198a-2907-510e-00a09802c300.prism-central.cluster.local:9440

# Resolve it

nslookup peip.00063c6c-198a-2907-510e-00a09802c300.prism-central.cluster.local
Server: 127.0.0.1
Address: 127.0.0.1#53

Name: peip.00063c6c-198a-2907-510e-00a09802c300.prism-central.cluster.local
Address: 10.10.200.20

Where is the authentication error coming from? Is it when the CSI controller accesses the local endpoint, or when it reaches the remote Data Services IP?
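
One rough way to narrow that down is to probe the endpoint from the secret over HTTPS and see what answers; this only confirms which service responds on 9440 and is not necessarily the call the CSI driver makes:

# Probe the endpoint from the secret (illustrative only, not the CSI driver's actual API call)
curl -k -s -o /dev/null -w '%{http_code}\n' https://peip.00063c6c-198a-2907-510e-00a09802c300.prism-central.cluster.local:9440/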

 

As the secret is generated automatically, I must be missing something or have something misconfigured that is preventing authentication.

Any ideas?

Best answer by MAHDTech

Not sure what it was, but I ended up rebuilding the entire PE cluster and re-deploying, and the error is gone.

This topic has been closed for replies.

1 reply

  • Author
  • Voyager
  • 1 reply
  • Answer
  • August 19, 2025

Not sure what it was, but I ended up rebuilding the entire PE cluster and re-deploying, and the error is gone.