The cause here is that manually updating the CVM's netmask does not update the value of 'external_subnet' in Zeus. This prevents the Data Services IP from communicating with the FSVMs, which in turn prevents the zpools from being mounted.
Note: The proper way to update CVM IP and/or subnet mask configuration is documented here.
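If you want to confirm the mismatch before fixing it, here is a minimal Python sketch (standard library only, not a Nutanix tool) that compares the netmask actually set on a CVM against the external_subnet value Zeus still holds. The sample values are hypothetical, chosen to match the scenario in this thread:

# Minimal sketch: does the CVM's IP/netmask land in the same network Zeus expects?
import ipaddress

def subnets_match(cvm_ip: str, cvm_netmask: str, zeus_external_subnet: str) -> bool:
    """Return True if the CVM's IP/netmask falls in the network recorded in Zeus."""
    cvm_net = ipaddress.ip_network(f"{cvm_ip}/{cvm_netmask}", strict=False)
    zeus_net = ipaddress.ip_network(zeus_external_subnet, strict=False)
    return cvm_net == zeus_net

# Example: netmask edited by hand on the CVM, Zeus never updated (hypothetical values).
print(subnets_match("172.16.1.32", "255.255.254.0", "172.16.1.0/255.255.255.0"))  # False
print(subnets_match("172.16.1.32", "255.255.254.0", "172.16.0.0/255.255.254.0"))  # True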
For this node I'm not able to open a case because the maintenance agreement has expired. This is a PoC asset.
I would recommend upgrading the cluster to AOS 5.10.7 or 5.11.1, because restarting Genesis on a two-node cluster running AOS 5.10.6 could lead to cluster instability.
After the upgrade, try to re-deploy the File Server using Prism and let us know.
Meanwhile I am searching for an alternate solution.
Regs.
However, during the manual upgrade I get the error below.
2020-04-14 12:03:24 WARNING preupgrade_checks.py:815 Skipping replication factor check since cluster is stopped
2020-04-14 12:03:25 INFO multihome_utils.py:146 Cluster does not have multi homed CVMs
2020-04-14 12:03:25 ERROR preupgrade_checks.py:163 Cannot upgrade two node cluster when cluster has a leader fixed. Current leader svm id: 4. Try again after some time , Please refer KB 6396
2020-04-14 12:03:25 INFO preupgrade_checks.py:978 Cluster is stopped, skipping under-replication test
2020-04-14 12:03:25 INFO preupgrade_checks.py:1849 Skipping version compatibility test
2020-04-14 12:03:25 WARNING preupgrade_checks.py:772 Cluster has less than 3 nodes. Downtime possible
2020-04-14 12:03:25 ERROR cluster_upgrade.py:352 Failure in pre-upgrade tests, errors Cannot upgrade two node cluster when cluster has a leader fixed. Current leader svm id: 4. Try again after some time , Please refer KB 6396 Signature validation Error for version 5.10.7 on svm 172.16.1.32. Error: Failed to verify NOS installer signature on svm 172.16.1.32, Please refer KB 6108
2020-04-14 12:03:25 ERROR cluster:1867 Failed to perform cluster upgrade
2020-04-14 12:03:25 ERROR cluster:2815 Operation failed
I've checked the MD5; it's correct.
The ERROR message:
Cannot upgrade two node cluster when cluster has a leader fixed…
means that the cluster is under-replicated.
Curator is responsible for kicking off replication for all extent groups that are not adequately replicated. A Curator full scan is needed to replicate the under-replicated data.
Solution:
Refer to KB 2826. Wait for cluster data to be rebalanced across nodes and Current Fault Tolerance to show 1.
Once the Curator scan has completed, run the pre-upgrade check again. It could take a couple of scans, depending on the number of under-replicated egroups.
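If you want to script the waiting, here is a rough Python sketch of the idea: keep polling until Current Fault Tolerance reaches 1, then re-run the pre-upgrade check. The get_current_fault_tolerance() helper is hypothetical; substitute whatever you use to read the value (Prism, ncli, or the REST API):

# Rough sketch: poll until Current Fault Tolerance reaches 1 before retrying the upgrade.
import time

def get_current_fault_tolerance() -> int:
    # Hypothetical stand-in: read the value from Prism, ncli, or the REST API.
    raise NotImplementedError("replace with a real lookup of Current Fault Tolerance")

def wait_for_fault_tolerance(target: int = 1, poll_seconds: int = 300) -> None:
    """Block until Current Fault Tolerance reaches the target (may take several Curator scans)."""
    while True:
        ft = get_current_fault_tolerance()
        if ft >= target:
            print(f"Fault tolerance is {ft}; safe to re-run the pre-upgrade check.")
            return
        print(f"Fault tolerance is {ft}; data still under-replicated, checking again later.")
        time.sleep(poll_seconds)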
Regs.
Antonio
Hi @AntonioG
Sorry, it just upgraded successfully.
But the file server deployment still fails with the same error.
I need more specific information regarding your cluster; could you please add the following:
Screenshot of the Create File Server Screen from Prism
From any CVM, please provide the output from the following commands:
Have you tried with a mathematically valid subnet?
You specified your network as 172.16.1.0 / 255.255.254.0
The network address cannot be x.x.1.0 with this subnet mask. The “1” in your third octet sets that octet's last bit (00000001), but with 255.255.254.0 that bit is a host bit, not a network bit: the mask is functionally aaaaaaaa.bbbbbbbb.cccccccX.XXXXXXXX (the X positions being host bits), and a valid network address must have all of those bits set to 0.
172.16.1.0 can only be the network address of a 255.255.255.0 subnet or smaller.
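You can confirm the arithmetic with Python's standard ipaddress module; this is just an illustration of the point above, using the addresses from this thread:

# 172.16.1.0 is NOT a valid network address for a 255.255.254.0 (/23) mask...
import ipaddress

try:
    ipaddress.ip_network("172.16.1.0/255.255.254.0")  # strict=True by default
except ValueError as e:
    print(e)  # "... has host bits set"

# ...because under /23 it is just a host inside the 172.16.0.0/23 network:
print(ipaddress.ip_network("172.16.1.0/255.255.254.0", strict=False))  # 172.16.0.0/23

# With a /24 mask it is a legitimate network address:
print(ipaddress.ip_network("172.16.1.0/255.255.255.0"))  # 172.16.1.0/24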
I recognize that correcting this configuration would mean shutting down the cluster and running the IP reconfiguration script. It could also mean some other adjustments to the network; I'm not sure what's going on there.
I wouldn’t be at all surprised if this is why your network validation fails in the creation process.
Please also correct the following FATAL alert that was reported:
FAIL: CVM is not uplinked to any 10Gbps nics on bridge/vSwitch br0.
Node 172.16.1.32:
FAIL: CVM is not uplinked to any 10Gbps nics on bridge/vSwitch br0.
Refer to KB 1584 (http://portal.nutanix.com/kb/1584) for details on 10gbe_check
This will give the vSwitch on the affected host (172.16.1.32) a 10 Gb uplink, as the AHV Networking Best Practices recommend.
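For reference, here is a rough Python sketch (run on the AHV host) of what that NCC check is complaining about: list the ports attached to br0 with ovs-vsctl and read each physical NIC's link speed with ethtool. The ethN naming and tool availability are assumptions, and this is only an illustration, not the actual 10gbe_check:

# Rough sketch: report the link speed of each physical uplink attached to br0.
import re
import subprocess

def br0_uplink_speeds() -> dict:
    ports = subprocess.check_output(["ovs-vsctl", "list-ports", "br0"], text=True).split()
    speeds = {}
    for port in ports:
        if not port.startswith("eth"):
            continue  # skip bonds/internal ports; physical NICs assumed to be named ethN
        out = subprocess.run(["ethtool", port], capture_output=True, text=True).stdout
        match = re.search(r"Speed:\s*(\d+)Mb/s", out)
        speeds[port] = int(match.group(1)) if match else None  # None if link down/unknown
    return speeds

if __name__ == "__main__":
    speeds = br0_uplink_speeds()
    print(speeds)
    if not any(s and s >= 10000 for s in speeds.values()):
        print("No 10Gbps uplink on br0 -- matches the NCC FAIL above.")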
Regs,
Antonio
Hi @JeremyJ, you're right. This is wrong.
First of all, this node was originally configured with a 255.255.255.0 subnet. Then our office changed the network to a /23 mask.
So I ran the cluster reconfiguration and changed the Zeus network address from 172.16.1.0/255.255.255.0 to 172.16.1.0/255.255.254.0.
I'll find a maintenance window and change the Zeus external network address to 172.16.0.0/255.255.254.0 later.
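As a quick sanity check of that planned value, the network containing the CVM IP under a 255.255.254.0 mask is indeed 172.16.0.0/23 (a small Python illustration, using the one CVM IP visible in this thread):

# Sanity check: which /23 network does the CVM IP actually belong to?
import ipaddress

cvm_ip = "172.16.1.32"  # the CVM IP visible in this thread
net = ipaddress.ip_network(f"{cvm_ip}/255.255.254.0", strict=False)
print(cvm_ip, "->", net)  # 172.16.1.32 -> 172.16.0.0/23, i.e. 172.16.0.0/255.255.254.0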
Thank you.
@AntonioG As I told you, this is a non-production cluster, just for PoC and internal testing in my office, so I only used 2x 1 Gb interfaces for connectivity.