Question

Windows Server domain join not possible when on different hosts

  • 17 March 2022
  • 11 replies
  • 538 views

Hello,

 

I stumbled upon a very weird problem and I am not able to sort it out myself.

I have a pretty fresh Cluster running AHV which is for our Lab.

I installed a Server 2022 DC - all fine. I installed a second one - all good.

I created the Active Directory on the first one, edited the IP settings on both machines, including DNS, and tried to join the other server as a second domain controller. That is where I got stuck: it wasn’t able to join the domain. The error page said it was able to find the domain, but that either the A records are not correct or the server is not reachable.

Ping and nslookup work just fine, though. I spent a day testing everything that came to mind, and after a while I tried to eliminate as many factors as possible. So I migrated both servers to the same node to rule out networking issues, and this does the trick. When I try to domain join a server from another node it does not work; as soon as it is on the same node, it does.

I then tried to domain join from another virtual environment (not Nutanix), with the same outcome as when the servers are on different nodes. When joining, the only additional step for the traffic is to leave the node on the physical interface on the native VLAN, go to a switch, and come back to the other node. So there is no firewall, no packet inspection, nothing. Just plain raw networking, not even routing.

 

For those reasons, and because it also doesn’t work when coming from a source other than Nutanix, I believe that something happens to the packets entering the node. When I captured the traffic with Wireshark, the only difference was a delay of about 4 seconds and some retransmissions for the DNS queries when trying to domain join:

as opposed to a pretty clean DNS query when on the same node:

 

Do you have any ideas what is causing this? As a next step, I will make the server in the other, non-Nutanix environment a DC and try joining it from Nutanix, to see whether this problem affects only outgoing traffic or both directions.

 

Any advice will be highly appreciated!


11 replies

Install the new .17 Virt-io drivers on 2022.  I had a similar issue and it resolved it.

 

-R

Hello DavidN,

I tried pretty much all of your suggestions but this is a great collection of troubleshooting options!

The solution I found yesterday involves an issue I had never come across before. I checked the traffic that went over the firewall thoroughly and saw "Bad Checksum" errors coming from the VMs. I googled this and came across a similar issue with Broadcom NICs. After narrowing the search to the Intel X722 NICs, which are in the Lenovo hosts, I found this:

https://forums.lenovo.com/t5/ThinkSystem/SR630-amp-Intel-X722-issue-with-Windows-Server-2019/m-p/4435773?page=1#4436313

So apparently Windows introduced new NIC options that lead to these errors. None of the suggestions in that thread helped, though, so I installed Server 2016 DCs and it works now. So I know the source of the issue, but not the final solution yet. It is definitely a problem with Windows Server 2019 and above, which is why the other virtual environment had the same issues: Intel NICs.

I don't know if there are new drivers out yet that fix the problem, but as the problems come from the guest I don't even know if that would help. Apparently there are new drivers which have said options disabled by default, but I guess that will only help when Windows is installed on bare metal with direct access to the NIC, not on top of a hypervisor.
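
For anyone wondering what a "Bad Checksum" flag on a firewall or in a capture actually tests, here is a minimal Python sketch of the Internet checksum (RFC 1071) that IPv4, UDP and TCP are built on. It is only an illustration: the function names are made up for this example, and the sample bytes are a generic textbook IPv4 header, not traffic from this cluster. It shows how a receiver decides a header is bad; it says nothing about which component (guest driver, offload setting, NIC) produced the wrong value.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit big-endian words, inverted (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def header_is_valid(header: bytes) -> bool:
    """A header whose checksum field is correct sums to zero overall."""
    return internet_checksum(header) == 0


# Generic 20-byte IPv4 header with the checksum field (bytes 10-11) zeroed.
sample = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = internet_checksum(sample)
filled = sample[:10] + struct.pack("!H", checksum) + sample[12:]

print(hex(checksum))             # 0xb861
print(header_is_valid(filled))   # True
print(header_is_valid(sample))   # False: field left at zero
```

The last line is the interesting one here: a packet whose checksum field was never filled in correctly fails this check at the first device that verifies it, which would match packets being dropped only once they leave the node.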

Hi,

We are having the same issue with a Lenovo HX2320 cluster (AHV) with Intel X722 NICs. Only Win Server 2022 DCs are impacted (Win Server 2019 DCs are OK). Has a fix been found?

Romain

Hi,

We’ve actually done quite a bit of digging with the help of Nutanix support, and on our end the problem is that on VMs with the virtio 1.1.7 exe installed, installing NGT (Nutanix Guest Tools) reverts the drivers to 1.1.6.18.

As a workaround, uninstalling either virtio or NGT fixes the issue.

Pierre

On a side note: I’m also unable to join the cluster to the domain. Maybe it is for the same reason, because the CVM hosting Prism is on a different node than the DCs, but I have not verified that yet.

Same result when trying to join from Nutanix to the other environment - same switch, same VLAN:

 

 

Update: I installed a second server in the non-Nutanix environment and tried to join it to the test domain on the non-Nutanix side (both environments were set up fairly recently for testing purposes), and we see the same picture there. No domain join across different servers, but on the same server it works. I will keep this thread updated, if I’m allowed/supposed to, and report what the solution was in the end, even though I assume it will be nothing Nutanix-specific but networking-related.

Thoughts:

  • Double check that switches & VLANs are all configured (assuming you’ve got trunk ports to the hosts)
  • Try to isolate the hosts to use just one switch/one uplink port (if you’re using multiple uplinks/switches) - perhaps the issue isn’t at the host/switches but further upstream at a router ↔️ switch level
  • Double check that the IPv6 protocol is enabled on both servers (i.e. it’s not unchecked on the interface)
  • Can you run Wireshark on the existing DC while the VMs are on two separate hosts? What does the DNS server “see” in terms of packets? Does it receive them?
    (You might enable full DNS logging on the DNS server to confirm things are OK.)
    https://docs.microsoft.com/en-us/troubleshoot/windows-server/networking/dns-client-resolution-timeouts
  • Is there any way to mirror the active uplink port of an AHV host on the switch to another port on the same switch, and connect that to an external device that can capture ALL the traffic (promiscuous mode), to see whether it’s even making it off the host (and to see the replies)?

Perform various diagnostics as a test before the promotion?
dcdiag /test:dns /v /s:<DCName> /DnsBasic /f:dcdiagreport.txt

dcdiag /test:dns /DnsRecordRegistration

dcdiag /test:dns /v /s:<DCName> /DnsDynamicUpdate

 

Wonder what you get when you try this:

nltest /dsgetdc:<DNS domain name> /force
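
Alongside nltest, the DNS side of the DC lookup can be timed by hand from any machine with Python, which makes the delay and retransmissions described in the question easy to compare between "same node" and "different node". This is only a rough sketch: the function names, the 1-second timeout and the retry count are arbitrary choices for the example and do not reproduce the Windows DNS client's retry schedule. It sends a plain UDP SRV query and reports which attempt was answered and how long that took.

```python
import socket
import struct
import time


def build_query(name: str, qtype: int = 33, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS query (QTYPE 33 = SRV, QCLASS 1 = IN)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed with its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def time_query(server: str, name: str, timeout: float = 1.0, attempts: int = 5):
    """Return (attempt, seconds, answer_count) for the first reply, or None."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        for attempt in range(1, attempts + 1):
            sock.sendto(query, (server, 53))
            try:
                reply, _ = sock.recvfrom(4096)
            except socket.timeout:
                continue  # nothing came back in time: send the query again
            if len(reply) >= 12 and reply[:2] == query[:2]:
                answer_count = struct.unpack("!H", reply[6:8])[0]
                return attempt, time.monotonic() - start, answer_count
    return None
```

Usage would look like `time_query("192.0.2.10", "_ldap._tcp.dc._msdcs.lab.example")`, where the IP address and the domain are placeholders for the DC's address and your own DNS domain name. A result like `(1, 0.002, 1)` on the same node versus `(4, 3.1, 1)` or `None` across nodes would point at packets being lost between the hosts rather than at the DNS records themselves.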

Hello PASC,

Can you share the domain name in use? Please also share your final queries.

Rohan

Hello Rohan,

Can you tell me why the domain name is relevant for this issue?

And what do you mean by my final queries?

PASC,

I just wanted to check whether we can help in any way to move this forward.

Rohan
