Solved

TLS Handshake failure on Check Login step

  • 10 January 2024
  • 19 replies
  • 179 views

Or, “How does the ‘Check Login’ step actually work?”

I consistently get a TLS error on any attempt to connect to the new VM from CALM. Every time. This is a blocker for me, as I can’t do much to the VMs during provisioning without this connectivity, and I can’t/shouldn’t do everything in the cloud-init step. 
 
I am able to ssh into the new VMs successfully using the same credentials, so I know that the credential is correct, but I don’t think the credential is the issue - the error is happening before the system tries to use it. The screenshots below are from two different blueprint deployments.
Script execution has failed with error: tls: first record does not look like a tls handshake.
Could not connect to the service running at 1.2.3.4.

From a higher level, how does CALM initiate the connection to the new VM when starting the “check login” step? The TLS handshake issue implies a mismatch of HTTP/S protocols, and/or mismatched TLS 1.1/1.2/etc versions. But this is supposed to be an SSH connection.
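The error text reads like what a TLS client reports when the first bytes it receives are not a TLS record, for example when it reaches an SSH server or a plain-HTTP proxy instead. Below is a minimal sketch of that distinction, assuming only the well-known leading bytes of each protocol. The function name `classify_first_bytes` is hypothetical and is not part of Calm.

```python
def classify_first_bytes(data: bytes) -> str:
    """Guess which protocol a peer is speaking from the first bytes it sent."""
    # SSH servers open with an identification string such as "SSH-2.0-OpenSSH_7.4".
    if data.startswith(b"SSH-"):
        return "ssh"
    # A TLS record starts with content type 0x16 (handshake) and major version 0x03.
    if len(data) >= 2 and data[0] == 0x16 and data[1] == 0x03:
        return "tls"
    # A plain-HTTP response (for example from a proxy) starts with "HTTP/".
    if data.startswith(b"HTTP/"):
        return "http"
    return "unknown"
```

A client that expects "tls" but gets "ssh" or "http" back is talking to a different service than it thinks, which points at the network path rather than at the credential.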

I increased the delay before starting the Check Login step to 5mins (300 seconds), with the same result.

Thanks in advance for any help!

Best answer by JoseNutanix 10 January 2024, 21:48

Userlevel 4
Badge +5

Hi mcascone,

We don’t use TLS with Check Login. Could you please confirm a couple of things?

  • Do you have a proxy configured in Prism Central?
  • Which Linux distribution are you using?

 

Thanks for the reply!

  • Yes, we have a proxy enabled, and I am configuring the new VM with that proxy in the `cloud-init` step. Checking now, though, I see that they are not the same address. Could that be related?
  • I’m using CentOS 7 in this case, as something I’m familiar with to get off the ground. We plan to use Ubuntu as well.
Userlevel 4
Badge +5

Yes. What is happening is that PC (Prism Central) initiates the SSH connection to the VM, but the subnet of the VM it is trying to reach is not excluded in PC’s proxy configuration, so the connection goes to the proxy instead. Make sure that in PC you exclude (bypass) the subnets you want to connect to.
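The bypass decision described here can be sketched as a simple membership test: a connection skips the proxy only if the target address falls inside one of the whitelisted networks. This is an illustration under stated assumptions, not how Prism Central implements it. The function name `bypasses_proxy` and the address `10.139.225.17` are hypothetical.

```python
import ipaddress
from typing import Iterable


def bypasses_proxy(vm_ip: str, whitelist: Iterable[str]) -> bool:
    """Return True if vm_ip falls inside any whitelisted network or single IP."""
    addr = ipaddress.ip_address(vm_ip)
    # ip_network accepts a bare IP, a prefix length, or a dotted netmask.
    return any(
        addr in ipaddress.ip_network(entry, strict=False) for entry in whitelist
    )


# Hypothetical VM address on a DHCP subnet that is not whitelisted yet:
print(bypasses_proxy("10.139.225.17", ["10.96.0.0/12"]))
# After adding the whole subnet, any address leased from it is covered:
print(bypasses_proxy("10.139.225.17", ["10.96.0.0/12", "10.139.225.0/24"]))
```

Whitelisting the subnet rather than individual addresses is what makes this workable for ephemeral, DHCP-assigned VMs.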

Ok, i changed the new VM’s proxy to match the PC config, same result.

On the new VM, this is the config:

> sudo cat /etc/environment 
HTTP_PROXY=http://<matches PC proxy:port>
HTTPS_PROXY=http://<matches PC proxy:port>
NO_PROXY=localhost,127.0.0.1,10.96.0.0/12,192.168.59.0/24,192.168.49.0/24

does that look right?

One other thing: the `whitelist` config in PC seems to allow only single IPs. I can’t enter a CIDR range like the ones in the NO_PROXY line above.

Userlevel 4
Badge +5

The whitelist is for the proxy configured in PC, and it supports subnets. The one that does not support subnets is the one in the operating system. Most of the time you don’t have to configure exceptions there unless the VM’s operating system is the one starting the connection, which is not the case here: PC is starting the connection to the VM.

To configure whitelist in the PC proxy, please follow the documentation. https://portal.nutanix.com/page/documents/details?targetId=Prism-Central-Admin-Center-Guide-vpc_2023_4:mul-http-proxy-configure-pc-t.html

Thanks for that link! I do see that config panel; it is the one I was referring to as not allowing CIDR ranges. Because it only takes single, static IPs, I can’t think of a way to make it work for VMs that are provisioned with DHCP. I don’t know the IP of a new VM until it’s provisioned, the VMs are ephemeral anyway, and an address shouldn’t have to be entered manually into a proxy config every time. How can this process be automated?

I am sure I am not understanding something here! Thank you for your patience!

 

thanks!

Max

Userlevel 4
Badge +5

It takes CIDR; it’s in the documentation, isn’t it?

 

If I enter 10.139.225.0/24 and click Save, it shows “Please enter a valid target” in red and clears the field.

Userlevel 4
Badge +5

You have to pass 10.139.225.0/255.255.255.0, that is, the network address followed by the full netmask rather than a prefix length.
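For anyone scripting this, converting prefix-length notation into the network/netmask form shown above takes a few lines with Python’s standard `ipaddress` module. This is a small sketch, and the helper name `to_netmask_form` is hypothetical.

```python
import ipaddress


def to_netmask_form(cidr: str) -> str:
    """Convert e.g. '10.139.225.0/24' into '10.139.225.0/255.255.255.0'."""
    # strict=False tolerates host bits being set (e.g. '10.139.225.7/24').
    net = ipaddress.ip_network(cidr, strict=False)
    return f"{net.network_address}/{net.netmask}"


print(to_netmask_form("10.139.225.0/24"))  # 10.139.225.0/255.255.255.0
```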

Userlevel 4
Badge +5

Btw, we’ll make sure the documentation gets updated to show an example with the expected format.

Thank you, the range is accepted now. But i still get the same error.

Userlevel 4
Badge +5

It’s likely you’ll have to restart the Calm containers in Prism Central for them to pick up the changes. In PC you should run:

  • genesis stop nucalm epsilon
  • cluster start

Please be aware that this stops Calm, and while the containers are down it won’t be accessible to anybody.

That makes sense! I am in the process of getting the creds to the PC VM so it might take a bit to take that step.

@JoseNutanix, are these commands that we need to run in a shell on the actual VM that PC is running on, or do we run them from within the PC GUI?

Userlevel 4
Badge +5

Those commands must be run on the Prism Central virtual machine itself, over an SSH session.

running now!

It Worked! Thank you!

In addition, I actually added the wrong IP range at first, and then did the restart. I realized it was wrong, added the correct one, and re-tested without restarting, and it worked. So it seems that the restart is not needed, which is as I’d expect!

This is great news! Thank you @JoseNutanix

Userlevel 4
Badge +5

Glad to hear it, @mcascone 🍻

Don’t hesitate to come back to the Nutanix community again if you have further questions.