Hi there,
We are currently deploying (and testing) Karbon as the K8s orchestration platform for all our Nutanix platforms (worldwide). My installation attempts from Prism Central have failed.
As a side note, we are using a site-to-site VPN from the centralised Prism Central, and I can reach the remote K8s VLAN of the distant Nutanix deployment. The reachability tests were done in both directions with a small test VM on the K8s VLAN and via the private CIDR, using the same encrypted VPN channel.
This is what I’m getting in karbon_core.out (PCVM):
2021-10-13T21:17:12.687Z ssh.go:153: [DEBUG] [k8s_cluster=RGS-PA-K8-STAGING] On 10.20.25.130:22 executing: docker plugin inspect nutanix
2021-10-13T21:17:12.825Z ssh.go:166: [WARN] [k8s_cluster=RGS-PA-K8-STAGING] Run cmd failed: Failed to run command: on host(10.20.25.130:22) cmd(docker plugin inspect nutanix) error: "Process exited with status 1", output: "Error: No such plugin: nutanix\n[]\n"
2021-10-13T21:17:12.825Z sshutils.go:44: [ERROR] [k8s_cluster=RGS-PA-K8-STAGING] Unable to run commands [docker plugin inspect nutanix] on 10.20.25.130: "Failed to configure with SSH: Failed to run command: on host(10.20.25.130:22) cmd(docker plugin inspect nutanix) error: \"Process exited with status 1\""
2021-10-13T21:17:12.825Z install.go:55: [INFO] [k8s_cluster=RGS-PA-K8-STAGING] Unable to check if the dvp is enabled on: 10.20.25.130 failed to check the dvp status: Failed to configure with SSH: Failed to run command: on host(10.20.25.130:22) cmd(docker plugin inspect nutanix) error: "Process exited with status 1"
2021-10-13T21:17:12.825Z install.go:67: [INFO] [k8s_cluster=RGS-PA-K8-STAGING] Installing ntnx dvp on host: 10.20.25.130
2021-10-13T21:17:12.825Z install.go:150: [DEBUG] [k8s_cluster=RGS-PA-K8-STAGING] Command: mkdir -p /etc/docker-plugin-certs && /home/nutanix/docker_plugin/create_plugin_from_tar.sh '/home/nutanix/docker_plugin/dvp.tar.gz' 'nutanix' '10.20.1.10' '10.20.1.50' '' '' 'RGS-PA-K8'
2021-10-13T21:17:12.993Z ssh.go:138: [DEBUG] [k8s_cluster=RGS-PA-K8-STAGING] Copying /etc/docker-plugin-certs/key to 10.20.25.130:22
2021-10-13T21:17:13.157Z ssh.go:138: [DEBUG] [k8s_cluster=RGS-PA-K8-STAGING] Copying /etc/docker-plugin-certs/cert to 10.20.25.130:22
2021-10-13T21:17:13.262Z ssh.go:138: [DEBUG] [k8s_cluster=RGS-PA-K8-STAGING] Copying /etc/docker-plugin-certs/ca.pem to 10.20.25.130:22
2021-10-13T21:17:13.367Z ssh.go:138: [DEBUG] [k8s_cluster=RGS-PA-K8-STAGING] Copying /var/nutanix/host_upgrade/preupgrade-docker-plugin-certs.sh to 10.20.25.130:22
I1013 14:21:57.379644 1 forwarder.go:328] Forwarding MetricDataSampleList to CFS: timestamp_usecs:1634159700000000 entity_type_name:"acs_stats_table" metric_list:"pc_cluster_uuid" metric_list:"k8s_cluster_uuid" metric_list:"last_val_karbon_version" metric_list:"last_val_cluster_name" metric_list:"last_val_cluster_prefix" metric_list:"last_val_k8s_version" metric_list:"last_val_os_flavor" metric_list:"last_val_etcd_cluster_uuid" metric_list:"last_val_etcd_members_count" metric_list:"last_val_per_etcd_cpu" metric_list:"last_val_per_etcd_mem" metric_list:"last_val_master_deploy_type" metric_list:"last_val_masters_count" metric_list:"last_val_per_master_cpu" metric_list:"last_val_per_master_mem" metric_list:"last_val_workers_count" metric_list:"last_val_per_worker_cpu" metric_list:"last_val_per_worker_mem" metric_list:"last_val_logging_state" metric_list:"last_val_logging_version" metric_list:"last_val_fluentbit_version" metric_list:"last_val_elasticsearch_version" metric_list:"last_val_elasticsearh_image" metric_list:"last_val_kibana_version" metric_list:"last_val_kibana_image" metric_list:"last_val_proxy_used" num_dimensions:2 ]
I1013 14:21:57.385927 1 forwarder.go:339] Received response from PutMetricDataArg:
2021-10-13T21:22:08.925Z sshutils.go:44: [ERROR] [k8s_cluster=RGS-PA-K8-STAGING] Unable to run commands [mkdir -p /etc/docker-plugin-certs && /home/nutanix/docker_plugin/create_plugin_from_tar.sh '/home/nutanix/docker_plugin/dvp.tar.gz' 'nutanix' '10.20.1.10' '10.20.1.50' '' '' 'RGS-PA-K8'] on 10.20.25.15: "Operation timed out"
2021-10-13T21:22:08.925Z install.go:177: [ERROR] [k8s_cluster=RGS-PA-K8-STAGING] Error installing ntnx dvp with err: Operation timed out []
2021-10-13T21:22:08.925Z etcd_scale.go:40: [ERROR] [k8s_cluster=RGS-PA-K8-STAGING] Failed to install the ntnx dvp on etcd node: 10.20.25.15 with err: failed to deploy the ntnx dvp: Operation timed out
2021-10-13T21:22:08.925Z node_pool_create.go:337: [ERROR] [k8s_cluster=RGS-PA-K8-STAGING] Failure in init vm callback: failed to deploy the ntnx dvp: Operation timed out
2021-10-13T21:22:08.925Z node_pool_create.go:251: [DEBUG] [k8s_cluster=RGS-PA-K8-STAGING] Cleaning up failed VM "rgs-pa-k8-staging-d2d032-etcd-2" and its entities
It’s a bit unclear to me where the failure happened. If the DVP plug-in is not present, the next command sequence is part of the installation, but it's not clear what caused the installation to fail.
As a note, I did try the DVP plugin from a test VM and could mount the storage container as a volume without problems. That VM is on the same K8s VLAN where the installation failed.
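To narrow down where the time goes, here is a minimal Python sketch that measures the gap between the install command being logged and the next ERROR entry. It assumes karbon_core.out lines of the form `timestamp file.go:line: [LEVEL] message`. The helper names (`parse`, `seconds_to_first_error`) are my own and not part of Karbon. It also assumes the timeout belongs to the command logged just before it, which may not hold here, since the command is logged for 10.20.25.130 while the timeout is reported for 10.20.25.15.

```python
import re
from datetime import datetime

# Matches lines such as:
#   2021-10-13T21:17:12.825Z install.go:150: [DEBUG] [k8s_cluster=...] message
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\s+"
    r"(?P<src>\S+):\s+\[(?P<level>[A-Z]+)\]\s+(?P<msg>.*)$"
)


def parse(line):
    """Return (timestamp, source, level, message), or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
    return ts, m.group("src"), m.group("level"), m.group("msg")


def seconds_to_first_error(lines, start_marker="create_plugin_from_tar.sh"):
    """Seconds between the first line containing start_marker and the next ERROR line.

    Assumption: the next ERROR after the marker is caused by that command.
    """
    start = None
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue  # skip lines in other formats (e.g. the forwarder.go entries)
        ts, _src, level, msg = parsed
        if start is None and start_marker in msg:
            start = ts
        elif start is not None and level == "ERROR":
            return (ts - start).total_seconds()
    return None


# Two lines taken from the excerpt above.
SAMPLE = [
    "2021-10-13T21:17:12.825Z install.go:150: [DEBUG] [k8s_cluster=RGS-PA-K8-STAGING] "
    "Command: mkdir -p /etc/docker-plugin-certs && "
    "/home/nutanix/docker_plugin/create_plugin_from_tar.sh "
    "'/home/nutanix/docker_plugin/dvp.tar.gz' 'nutanix' '10.20.1.10' '10.20.1.50' '' '' 'RGS-PA-K8'",
    "2021-10-13T21:22:08.925Z etcd_scale.go:40: [ERROR] [k8s_cluster=RGS-PA-K8-STAGING] "
    "Failed to install the ntnx dvp on etcd node: 10.20.25.15 with err: "
    "failed to deploy the ntnx dvp: Operation timed out",
]

if __name__ == "__main__":
    print(seconds_to_first_error(SAMPLE))
```

On these two lines the gap is roughly five minutes. That would suggest the install script hung until a timeout fired, not that it failed quickly, although the log alone does not say which step inside the script was blocked.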
Thanks for any feedback!!
Igor