Question

Cluster creation seems fine but fails to start


Badge +1
Hello all,

I'm trying to create a new multi-node cluster, but it fails at 33%. The three nodes indicate 100% with a green check in front of them. When I look in genesis.out, it states 'No valid Stargate IP addresses to forward to' (not sure if this is the main issue). When I check connectivity, every node can ping the other nodes' hypervisors and CVMs. I can't find a solution online. Further details: when I run a genesis status, only these services are running:

Cassandra
Foundation
Genesis
Scavenger
Secure_file_sync
SSL_terminator
Zookeeper

When I run a cluster status, only these services are UP:
Zeus
Scavenger
SSLTerminator
SecureFileSync

Medusa reports an error (Cassandra gossip failed)

and at the end it says: INFO cluster:2597 Success!

A single-node cluster can be created and works fine, so it looks like the problem has something to do with the three hypervisors or CVMs communicating with each other. I'm running the latest Community Edition, 20180501, on bare metal: DL380 G5 systems with 200GB SSD and 500GB SAS disks, which are detected during installation.

Does anyone have a suggestion for how to resolve this?

Regards,

Lennaert


20 replies

Userlevel 6
Badge +16
Can you run:

code:
nodetool -h localhost ring
Badge +1
When I run this on my first node, I get 'Error connection to remote JMX agent!' followed by a lot of java messages. On my second node, I get a message ' Down Forwarding' (with the IP of my third node). On my third node, I get the same as on my first 'Error connection to remote JMX agent!'.

Conclusion:
Node 1: 'Error connection to remote JMX agent!'
Node 2: ' Down Forwarding'
Node 3: 'Error connection to remote JMX agent!'

Thanks for your quick response. Does this ring any bells for you?

Kind Regards,

Lennaert
Userlevel 6
Badge +16
Do you have NTP set up?
Badge +1
Yes I did, to a public NTP server (Google 216.239.35.0). Tried without NTP but had the same issue.
Userlevel 6
Badge +16
Can you try:

code:
cluster restart_genesis
Badge +1
For your info, this is the complete message I get from the nodetool command:

[screenshot not preserved]

When I run the restart command, I get the following results on all three nodes, followed by the status INFO cluster:2597 Success!

[screenshot not preserved]
Userlevel 6
Badge +16
Can you ping google.com from all the CVMs?
Badge +1
Good test. The IP does not seem to be valid. Can the cluster start fail just because no time server is reachable?
Userlevel 6
Badge +16
Yes, if the time is not set correctly, the cluster doesn't work. Time needs to be synced across all nodes.
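
To make "synced" concrete, here is a minimal Python sketch that compares clock readings collected from each node and reports the largest difference. The function names, the node labels, and the 5-second tolerance are assumptions for illustration only, not Nutanix-documented values; how the readings are collected (for example by running `date +%s` on each CVM) is left open.

```python
from typing import Dict


def max_clock_skew(epoch_by_node: Dict[str, float]) -> float:
    """Return the largest difference, in seconds, between any two node clocks.

    epoch_by_node maps a node label to the Unix time read on that node.
    """
    if not epoch_by_node:
        raise ValueError("no clock readings supplied")
    readings = list(epoch_by_node.values())
    return max(readings) - min(readings)


def clocks_in_sync(epoch_by_node: Dict[str, float], tolerance_s: float = 5.0) -> bool:
    """True when every node clock is within tolerance_s of every other one.

    tolerance_s is an illustrative threshold, not an official limit.
    """
    return max_clock_skew(epoch_by_node) <= tolerance_s
```

Feeding it one reading per CVM shows at a glance whether a node has drifted, which is easier to judge than comparing `date` output by eye.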
Userlevel 6
Badge +16
Something like this might help you: https://domalab.com/configure-nutanix-time-server-using-command-line/
Badge +1
Can I set the IP per CVM through the command line?
Userlevel 6
Badge +16
I think this would cause too many problems. I would actually suggest reinstalling the cluster with static IPs. And remember that Nutanix CE needs an NTP server and an internet connection to work properly.
Badge +1
That was my idea too, so I did a fresh cluster create. I tested the connectivity to the internet, which works. I can ping the time server from all CVMs, but the cluster does not come up :-(

The cluster creation web page gets stuck at 33% (initializing cluster), and after a refresh the page no longer loads (on all three node IPs).
Userlevel 5
Badge +9
Hi,
Does cluster status still report the same services up/down as before?
Badge +1
Yes it does, like in the screenshot. If I try to stop the cluster, it fails. It seems like some sort of communication issue between the CVMs, but they can ping each other and the hypervisors.
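
A successful ping only proves ICMP reachability, while services such as Cassandra talk to each other over TCP, so a blocked or closed port would not show up in a ping test. Below is a minimal Python sketch for checking whether a TCP port on another node accepts connections. The function name and the default timeout are assumptions for illustration; the host and port you pass in must come from your own setup, since no specific Nutanix port numbers are implied here.

```python
import socket


def tcp_reachable(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    A refused connection, an unreachable host, or a timeout all count
    as "not reachable" and return False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

Calling `tcp_reachable(cvm_ip, port)` from each CVM against every other CVM gives a small reachability matrix; a node that answers ping but returns False here points at the service or a filter in between rather than at basic networking.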
Badge +1
Here is something that could be part of the issue. I need to use some tricks since I have a P400 controller, which uses cciss instead of hpsa. This works fine for the installer: it recognizes my disks, and I set my SSD as a rotational disk within the rc.local boot file. When I look inside the CVMs, the disks are all marked as rotational, and I get permission denied when I try to change that. Maybe it's not part of the issue, but I'm sharing it just in case.
Userlevel 5
Badge +9
Hi,
Can you destroy the cluster with 'cluster destroy', or does it report any issues (e.g. cannot shut down any services)? If it works and you then try to create the cluster again with 'cluster create', do you get any new warnings?
Badge +1
I cannot destroy the cluster, since it gives errors. What I do then is run the installer to get a fresh start and try to create a new cluster. Every time, it only gets to 33% (initializing the cluster). A single-node cluster works fine, so it is something between the nodes. I don't think it is the storage, since a single node just works.
Badge +1
For your info, I first run ./cleanup.sh
Userlevel 4
Badge +12
Hi Lennaert,

Did you manage to resolve this issue? I've run into this now, and I've tried ./cleanup.sh several times along with checking the NTP services and name servers. I'm getting extremely frustrated at how hard it can be to set up an NCE cluster.

Cheers,

Jon