Hi there,
I’m trying to deploy Prism Central on Nutanix Community Edition and I’m seeing the same symptoms as in the thread "Error during deploy Prism Central":
Encountered Exception in post_deployment step: Failed to enable micro services infrastructure on PC: deploy msp:Error deploying addons: failed to deploy monitoring addon: failed to deploy and verify kube-prometheus: failed to verify kube-prometheus: Operation timed out: failed to verify kube-prometheus: expecting 1 available replica of k8s prometheus in ntnx-system namespace. Currently running: 0
I have tried multiple versions of PC with the same issue.
Looking inside the appliance, I can see that Prometheus is failing because it is waiting for its persistent volume to be provisioned:
# Check the STS status
kubectl get sts -n ntnx-system prometheus-k8s
NAME READY AGE
prometheus-k8s 0/1 8h
# Find the Pod name
kubectl get po -n ntnx-system | grep prometheus-k8s
prometheus-k8s-0 0/2 Pending 0 8h
# Describe the pod (snipped)
kubectl describe po -n ntnx-system prometheus-k8s-0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 20m (x496 over 8h) default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
# Check the PV Claims
kubectl get pvc -n ntnx-system
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
prometheus-k8s-db-prometheus-k8s-0 Pending silver 8h
# Describe the PVC (snipped)
kubectl describe pvc -n ntnx-system prometheus-k8s-db-prometheus-k8s-0
Name: prometheus-k8s-db-prometheus-k8s-0
Namespace: ntnx-system
StorageClass: silver
Status: Pending
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ProvisioningFailed 20m (x125 over 8h) csi.nutanix.com_ntnx-10-10-200-31-a-pcvm_e2ef0343-d205-47d8-beb2-cd277479e2c5 failed to provision volume with StorageClass "silver": rpc error: code = Internal desc = NutanixVolumes: failed to create volume: pvc-26a1d40b-85cb-4c37-888f-389c0b7f0c66, err: NutanixVolumes: failed to create REST client, error: Max retries done: Failed to authenticate GetVersions(): 401 Authorization Error - HTTP Response Code : 401
Normal Provisioning 4m3s (x129 over 8h) csi.nutanix.com_ntnx-10-10-200-31-a-pcvm_e2ef0343-d205-47d8-beb2-cd277479e2c5 External provisioner is provisioning volume for claim "ntnx-system/prometheus-k8s-db-prometheus-k8s-0"
Normal ExternalProvisioning 83s (x2102 over 8h) persistentvolume-controller waiting for a volume to be created, either by external provisioner "csi.nutanix.com" or manually created by system administrator
It looks like the CSI controller is having authentication issues:
Failed to authenticate GetVersions(): 401 Authorization Error - HTTP Response Code : 401
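For anyone wanting to see only the failing events without the full describe output, something like the following might help. It is a minimal sketch: the helper name `pvc_warnings` is made up for illustration, and it relies only on `kubectl get events` with a field selector on the claim name and event type.

```shell
# Hypothetical helper: list only Warning events for one PVC.
# $1 = namespace, $2 = PVC name.
pvc_warnings() {
  kubectl get events -n "$1" \
    --field-selector "involvedObject.kind=PersistentVolumeClaim,involvedObject.name=$2,type=Warning" \
    --sort-by=.lastTimestamp
}

# Example (values taken from the output above):
# pvc_warnings ntnx-system prometheus-k8s-db-prometheus-k8s-0
```

Events expire after a while, so on a deployment that has been stuck for a long time this may show fewer entries than the counters in the describe output suggest.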
If I look at the storage class configuration, I can see the secret in use:
kubectl get sc silver -o yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    ntnxClusterRef: 00063c6c-198a-2907-510e-00a09802c300
  creationTimestamp: "2025-08-18T14:33:47Z"
  name: silver
  resourceVersion: "943"
  uid: d57479a3-eaf5-4231-8263-ab590d4b6978
parameters:
  csi.storage.k8s.io/controller-expand-secret-name: ntnx-csi-secret-sbwhh
  csi.storage.k8s.io/controller-expand-secret-namespace: ntnx-system
  csi.storage.k8s.io/controller-publish-secret-name: ntnx-csi-secret-sbwhh
  csi.storage.k8s.io/controller-publish-secret-namespace: ntnx-system
  csi.storage.k8s.io/fstype: ""
  csi.storage.k8s.io/node-publish-secret-name: ntnx-csi-secret-sbwhh
  csi.storage.k8s.io/node-publish-secret-namespace: ntnx-system
  csi.storage.k8s.io/provisioner-secret-name: ntnx-csi-secret-sbwhh
  csi.storage.k8s.io/provisioner-secret-namespace: ntnx-system
  dataServiceEndPoint: dsip.00063c6c-198a-2907-510e-00a09802c300.prism-central.cluster.local:3260
  description: ""
  flashMode: DISABLED
  isSegmentedIscsiNetwork: "false"
  storageContainer: NutanixManagementShare
  storageType: NutanixVolumes
provisioner: csi.nutanix.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
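All four secret parameters point at the same name here. A quick way to confirm that, and to catch a storage class that mixes secrets, is to list the distinct secret names. This is a small sketch, assuming the YAML is piped in from `kubectl get sc silver -o yaml`; `sc_secret_names` is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: read StorageClass YAML on stdin and print each
# distinct value of the csi.storage.k8s.io/*-secret-name parameters.
# The matching *-secret-namespace keys are deliberately not matched.
sc_secret_names() {
  awk '$1 ~ /^csi\.storage\.k8s\.io\/.*-secret-name:$/ { print $2 }' | sort -u
}

# Example:
# kubectl get sc silver -o yaml | sc_secret_names
```

With the storage class shown above this should print a single name; more than one line would mean different CSI operations authenticate with different secrets.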
From within Prism Central, the Data Services endpoint resolves correctly to my Prism Element Data Services IP:
ping dsip.00063c6c-198a-2907-510e-00a09802c300.prism-central.cluster.local
PING dsip.00063c6c-198a-2907-510e-00a09802c300.prism-central.cluster.local (10.10.200.10) 56(84) bytes of data.
64 bytes from prism-data-services.<mydomain_was_here> (10.10.200.10): icmp_seq=1 ttl=64 time=0.921 ms
The secret contains a cert and endpoint
# Get the secret (snipped)
kubectl get secret -n ntnx-system ntnx-csi-secret-sbwhh -o yaml
apiVersion: v1
data:
  cert: <base64 cert was here>
  endpoint: cGVpcC4wMDA2M2M2Yy0xOThhLTI5MDctNTEwZS0wMGEwOTgwMmMzMDAucHJpc20tY2VudHJhbC5jbHVzdGVyLmxvY2FsOjk0NDA=
kind: Secret
metadata:
  creationTimestamp: "2025-08-18T14:33:47Z"
  name: ntnx-csi-secret-sbwhh
  namespace: ntnx-system
  resourceVersion: "942"
  uid: 45701a4c-8052-46d7-af3c-c69436fdd392
type: Opaque
The endpoint is resolving to the local Prism Central IP
# Decode the base64 endpoint
echo cGVpcC4wMDA2M2M2Yy0xOThhLTI5MDctNTEwZS0wMGEwOTgwMmMzMDAucHJpc20tY2VudHJhbC5jbHVzdGVyLmxvY2FsOjk0NDA= | base64 -d
peip.00063c6c-198a-2907-510e-00a09802c300.prism-central.cluster.local:9440
# Resolve it
nslookup peip.00063c6c-198a-2907-510e-00a09802c300.prism-central.cluster.local
Server: 127.0.0.1
Address: 127.0.0.1#53
Name: peip.00063c6c-198a-2907-510e-00a09802c300.prism-central.cluster.local
Address: 10.10.200.20
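The decode-and-resolve steps above can be wrapped into two small helpers so they are easy to repeat against another secret. This is only a sketch: `decode_endpoint` and `endpoint_host` are names invented for illustration, and the commented usage assumes `getent` is available on the PC VM.

```shell
# Hypothetical helpers for the "endpoint" field of the CSI secret.

# Decode a base64 value (as stored under .data in a Secret) to plain text.
decode_endpoint() {
  printf '%s' "$1" | base64 --decode
}

# Strip the trailing :port from a host:port string.
endpoint_host() {
  printf '%s\n' "${1%:*}"
}

# Example, using the secret name from the storage class:
# enc="$(kubectl get secret -n ntnx-system ntnx-csi-secret-sbwhh -o jsonpath='{.data.endpoint}')"
# ep="$(decode_endpoint "$enc")"
# getent hosts "$(endpoint_host "$ep")"
```

Keeping the host and port separate makes it easier to compare what the name resolves to with the address the CSI driver is actually expected to reach on port 9440.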
Where is the authentication error coming from: when it accesses the local endpoint, or the remote Data Services IP?
As the secret is generated automatically, I must be missing something or have something misconfigured that is preventing authentication.
Any ideas?