This post was written by Derek Seaman, Customer Success Enterprise Architect, Nutanix
Early on in Nutanix’s history we focused on being the best and simplest HCI solution on the market. Market share, customer response and industry analyst feedback prove that we succeeded in that vision. However, Nutanix kept innovating and our vision has evolved into private cloud, hybrid cloud, and a multi-cloud strategy.
As the Nutanix product portfolio has expanded, our customers and the field have asked for a concise reference architecture that clearly shows how all the pieces fit together, guided by best practices. The result of this effort is the Nutanix Private Cloud Reference Architecture.
Reference Architecture Contents
One of the key sections in the reference architecture is the Design Objectives. All of the subsequent design decisions in the document are based on these Design Objectives. Some of your Design Objectives may differ from the ones we have defined, which in turn may alter the applicable design decision(s).
For example, our Design Objectives specified that all certificates be signed by a trusted certificate authority (CA). This is a best practice and required by many organizations, but I know many smaller companies that use self-signed certificates. The bottom line is that some design decisions that were made may not be applicable to a different set of requirements, so adjust decisions as needed to fit your organization’s requirements.
As you go through the reference architecture you will see 80+ design decisions. Each design decision follows the same format: a decision label made up of a category and a sequential number (e.g. NET-001), followed by a Title, Justification and Implication. This follows the NPX methodology of documenting design decisions in a clear and understandable manner.
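To make that format concrete, here is a minimal Python sketch of one design decision as a record. The class name, the field names and the three-digit label pattern are my own assumptions based on the NET-001 example, and the pairing of that label with the title and texts below is purely illustrative, not taken from the guide.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignDecision:
    """One design decision: label, title, justification, implication."""

    label: str          # category plus sequence number, e.g. "NET-001"
    title: str
    justification: str
    implication: str

    def __post_init__(self):
        # Assumed pattern: uppercase category, a dash, three digits.
        if not re.fullmatch(r"[A-Z]+-\d{3}", self.label):
            raise ValueError(f"unexpected label format: {self.label!r}")

    @property
    def category(self) -> str:
        """Return the category prefix of the label, e.g. "NET"."""
        return self.label.split("-")[0]


# Illustrative values only; the real guide's wording will differ.
example = DesignDecision(
    label="NET-001",
    title="Use standard 1,500 byte MTU and do not use jumbo frames",
    justification="Reduced complexity lowers the risk of misconfiguration.",
    implication="Placeholder text for illustration.",
)
print(example.category, example.title)
```

Keeping decisions in a structured form like this makes it easy to filter them by category or to flag the ones your own requirements would change.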
The reference architecture covers both Nutanix AHV and VMware vSphere hypervisors. Although Nutanix supports Hyper-V, we did not include it in this reference architecture. Nutanix AHV and VMware vSphere are the dominant hypervisors in our customer base, so we decided to provide guidance for both to cover most of our install base.
Top 10 Unexpected Design Decisions
I thought it would be fun to cover 10 design decisions that might surprise some people. This will also give you a taste of the types of decisions that are made throughout the reference architecture.
#10. Avoid switch stacking to ensure network availability during individual device failure
Some switch vendors offer ‘stacked’ switch configurations, where two or more switches are logically linked together. When they are linked together, operations such as firmware updates or maintenance can disrupt network connections to all stacked switches. This is bad for availability and is not a recommended configuration. Nutanix recommends a leaf-spine architecture where the failure of any single switch will not result in node downtime.
#9. Ensure that there are no more than three switches between any two Nutanix nodes in the same cluster
This is pretty clear cut, and in a properly designed leaf-spine architecture there should not be more than three switches between Nutanix nodes in the same cluster. More network devices, such as switches, routers and firewalls, can be between different Nutanix clusters.
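If you want to sanity-check this rule against your own topology, a breadth-first search over the cabling is enough. The sketch below is an assumption-laden toy: the device names, the link list and the function name are all hypothetical, and it only counts switches on the shortest path.

```python
from collections import deque


def switches_between(links, switches, src, dst):
    """Count switches on the shortest path between two nodes.

    links    -- iterable of (device_a, device_b) cable pairs
    switches -- set of device names that forward traffic
    Returns None if no path exists.
    """
    if src == dst:
        return 0
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    queue = deque([(src, 0)])
    seen = {src}
    while queue:
        device, crossed = queue.popleft()
        # Only the source and switches forward traffic; hosts do not.
        if device != src and device not in switches:
            continue
        for nxt in graph.get(device, ()):
            if nxt == dst:
                return crossed
            if nxt not in seen:
                seen.add(nxt)
                # Entering a switch adds one to the count.
                queue.append((nxt, crossed + (1 if nxt in switches else 0)))
    return None


# Hypothetical two-leaf, two-spine fabric.
SWITCHES = {"leaf1", "leaf2", "spine1", "spine2"}
LINKS = [
    ("node-a", "leaf1"), ("node-b", "leaf1"), ("node-c", "leaf2"),
    ("leaf1", "spine1"), ("leaf1", "spine2"),
    ("leaf2", "spine1"), ("leaf2", "spine2"),
]

same_leaf = switches_between(LINKS, SWITCHES, "node-a", "node-b")
cross_leaf = switches_between(LINKS, SWITCHES, "node-a", "node-c")
print(same_leaf, cross_leaf)
```

In this toy fabric, nodes on the same leaf are one switch apart and nodes on different leaves are three switches apart (leaf, spine, leaf), which is the worst case the design decision allows.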
#8. Reduce network over-subscription to achieve as close to a 1:1 ratio as possible
Years ago, when the most common datacenter network speed was 1 Gbps, high over-subscription ratios were not uncommon. Modern leaf-spine architectures, however, use 25 Gbps, 40 Gbps, or 100 Gbps links, and achieving a near 1:1 over-subscription ratio is now easy.
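The over-subscription ratio of a leaf switch is simply its total node-facing bandwidth divided by its total spine-facing bandwidth. Here is a small sketch of that arithmetic; the port counts and speeds are hypothetical and not taken from the reference architecture.

```python
def oversubscription_ratio(downlink_ports, downlink_gbps,
                           uplink_ports, uplink_gbps):
    """Return node-facing bandwidth divided by spine-facing bandwidth."""
    downlink_total = downlink_ports * downlink_gbps
    uplink_total = uplink_ports * uplink_gbps
    if uplink_total <= 0:
        raise ValueError("uplink bandwidth must be positive")
    return downlink_total / uplink_total


# Hypothetical leaf: 48 x 25 Gbps to nodes, 8 x 100 Gbps to spines.
# 1200 Gbps down / 800 Gbps up gives 1.5:1.
ratio = oversubscription_ratio(48, 25, 8, 100)
print(f"{ratio:.2f}:1")

# Populating only 32 of the node-facing ports brings the same leaf to 1:1.
balanced = oversubscription_ratio(32, 25, 8, 100)
print(f"{balanced:.2f}:1")
```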
#7. Use active-backup uplink load balancing for AHV
Active-Backup is the default load balancing network configuration for AHV. For a very high percentage of customers, this configuration is more than adequate. For unique situations where more bandwidth is needed, refer to the AHV Networking best practices guide.
#6. Use standard 1,500 byte MTU and do not use jumbo frames
For nearly all workloads a 1,500 byte MTU is more than adequate. While jumbo frames might theoretically show a minor performance boost in some situations, they significantly complicate the network configuration. Reduced complexity results in reduced risk of human error and misconfiguration. Unless you have a highly unique situation that needs jumbo frames, do not use them.
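To put a rough number on that theoretical boost, the sketch below compares how much of each frame on the wire is TCP payload at the two MTU sizes. It assumes plain Ethernet framing without a VLAN tag, IPv4 and TCP headers without options, and a 9,000 byte jumbo MTU; it looks only at framing overhead and ignores per-packet CPU cost.

```python
# Per-frame overhead on the wire, in bytes (standard Ethernet, no VLAN tag):
# 14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap.
ETHERNET_OVERHEAD = 14 + 4 + 8 + 12
# IPv4 header + TCP header, assuming no options on either.
IP_TCP_HEADERS = 20 + 20


def tcp_payload_efficiency(mtu: int) -> float:
    """Fraction of on-wire bytes that carry TCP payload for a full frame."""
    payload = mtu - IP_TCP_HEADERS
    on_wire = mtu + ETHERNET_OVERHEAD
    return payload / on_wire


standard = tcp_payload_efficiency(1500)   # 1460 / 1538
jumbo = tcp_payload_efficiency(9000)      # 8960 / 9038
print(f"1500 MTU: {standard:.1%}  9000 MTU: {jumbo:.1%}")
```

Under these assumptions the gap is roughly four percentage points of wire efficiency, which is the kind of gain you have to weigh against configuring a larger MTU consistently on every device in the path.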
#5. If running a non-NUMA-aware application on a VM, configure the VM’s memory and vCPU to fit within a NUMA node
When a VM crosses a NUMA boundary and the application is not NUMA aware, performance can suffer. Some applications are not NUMA aware, such as Microsoft Exchange. Others, such as Microsoft SQL, are NUMA aware. Know your applications, and how to right size their VMs for optimal performance.
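A quick way to check a VM size against this rule is to compare it with the resources of one NUMA node. The sketch below assumes one NUMA node per CPU socket with memory split evenly across sockets, and it ignores memory and cores reserved by the hypervisor and CVM; the host specification and function names are hypothetical.

```python
def numa_node_size(sockets, cores_per_socket, host_memory_gib):
    """Return (cores, memory GiB) per NUMA node.

    Assumes one NUMA node per socket and evenly distributed memory.
    """
    return cores_per_socket, host_memory_gib / sockets


def fits_in_numa_node(vm_vcpus, vm_memory_gib, node_cores, node_memory_gib):
    """True if both the vCPU count and memory fit inside a single NUMA node."""
    return vm_vcpus <= node_cores and vm_memory_gib <= node_memory_gib


# Hypothetical host: 2 sockets, 20 cores each, 768 GiB RAM in total.
node_cores, node_memory = numa_node_size(2, 20, 768)

small_vm_fits = fits_in_numa_node(16, 256, node_cores, node_memory)
wide_vm_fits = fits_in_numa_node(24, 256, node_cores, node_memory)
big_memory_vm_fits = fits_in_numa_node(16, 512, node_cores, node_memory)
print(small_vm_fits, wide_vm_fits, big_memory_vm_fits)
```

Note that a VM can cross the boundary on either dimension: too many vCPUs or too much memory both force it to span nodes.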
#4. Do not mix nodes that contain NVMe SSDs in the same cluster with hybrid SSD/HDD nodes
Pretty self-explanatory, but if you purchase Nutanix nodes with NVMe, they should only be in clusters that contain SSDs and NVMe. Do not mix them with hybrid clusters that have spinning HDDs. The performance gap between spinning HDDs and NVMe is so large that application performance may not be consistent when nodes are mixed in the same cluster.
#3. A minimum of three NTP servers for all infrastructure components should be provided
Time sync among the various infrastructure components is extremely important. Without accurate time you can see authentication failures, impeded forensics, or in severe cases cluster downtime. Nutanix recommends configuring an odd number of NTP servers, with a minimum of three and a preference for five or more. Why? Beyond the added redundancy, additional NTP servers allow the NTP client to detect a source going ‘rogue’, throw out that source, and rely on the remaining sources. If you are at a ‘dark site’ without access to public NTP servers, Nutanix recommends using a GPS-based NTP appliance, or in the absence of one, a network switch. Do not use Active Directory as an NTP server for non-Windows devices.
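The reason more sources help can be shown with a deliberately simplified sketch. This is not the selection algorithm a real NTP client uses; it just takes the median offset and rejects any source too far from it. The server names, offsets and tolerance are made up for illustration.

```python
from statistics import median


def split_sources(offsets, tolerance=0.1):
    """Split sources into (trusted, rejected) by distance from the median.

    offsets   -- dict of source name to measured clock offset in seconds
    tolerance -- maximum allowed distance from the median, in seconds
    """
    middle = median(offsets.values())
    trusted = {name: off for name, off in offsets.items()
               if abs(off - middle) <= tolerance}
    rejected = sorted(set(offsets) - set(trusted))
    return trusted, rejected


# Three sources, one of which has gone rogue: the other two outvote it.
three = {"ntp1": 0.002, "ntp2": -0.001, "ntp3": 4.7}
trusted3, rejected3 = split_sources(three)
print(sorted(trusted3), rejected3)

# Two sources that disagree: there is no way to tell which one is wrong.
two = {"ntp1": 0.002, "ntp2": 4.7}
trusted2, rejected2 = split_sources(two)
print(sorted(trusted2), rejected2)
```

With three sources the rogue one is rejected and the remaining two are kept. With only two disagreeing sources this sketch trusts neither, which is why two NTP servers are not much better than one.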
#2. Do not use Nutanix cluster lockdown
Nutanix cluster lockdown is a viable security tool that restricts CVM access to SSH keys only. The reason we chose not to implement it in this reference architecture goes back to the design objectives we laid out early in the guide. The majority of our customers are fine using usernames and passwords to access the CVM, so we didn’t implement stricter security objectives. However, in high-security environments, cluster lockdown can be very appropriate and should be implemented. It all boils down to requirements.
#1. Enable CVM and hypervisor AIDE
What is AIDE? It’s Advanced Intrusion Detection Environment. This feature will perform checksums of binaries and libraries to detect any malicious changes. Once enabled, it runs on a fixed weekly schedule. This is a very lightweight process, yet adds another layer of security to the environment.
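The general idea behind checksum-based integrity checking can be sketched in a few lines of Python. This is not how AIDE itself is implemented or configured; it only illustrates recording a baseline of file hashes and comparing against it later. The function names and the temporary demo file are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(paths):
    """Record the current checksum of every file in paths."""
    return {str(p): sha256_of(p) for p in paths}


def changed_files(baseline):
    """Return files that are missing or whose checksum no longer matches."""
    changed = []
    for path, expected in baseline.items():
        if not Path(path).exists() or sha256_of(path) != expected:
            changed.append(path)
    return sorted(changed)


# Demo on a throwaway file standing in for a binary or library.
with tempfile.TemporaryDirectory() as workdir:
    target = Path(workdir) / "example.bin"
    target.write_bytes(b"original contents")
    baseline = build_baseline([target])

    before = changed_files(baseline)      # nothing has changed yet
    target.write_bytes(b"tampered contents")
    after = changed_files(baseline)       # the modified file is reported

print(before, [Path(p).name for p in after])
```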
Summary
The Nutanix Private Cloud Reference Architecture will be delivered in a phased approach. Today, we are announcing and releasing phase one, the single datacenter. The three remaining phases will cover multi-datacenter, private cloud and hybrid cloud / multi-cloud.
This reference architecture is intended to demonstrate valuable methods and practices that organizations of all sizes can use to implement Nutanix solutions. There is no one-size-fits-all solution, and your organization’s individual requirements may change some of the decisions made in the document. Stay tuned for future releases of this guide that will cover disaster recovery, multi-sites, private cloud, and hybrid-cloud scenarios. You can download the Nutanix Private Cloud Reference Architecture Guide today!
© 2020 Nutanix, Inc. All rights reserved. Nutanix, the Nutanix logo and all Nutanix product, feature and service names mentioned herein are registered trademarks or trademarks of Nutanix, Inc. in the United States and other countries. All other brand names mentioned herein are for identification purposes only and may be the trademarks of their respective holder(s). This post may contain links to external websites that are not part of Nutanix.com. Nutanix does not control these sites and disclaims all responsibility for the content or accuracy of any external site. Our decision to link to an external site should not be considered an endorsement of any content on such a site. Certain information contained in this post may relate to or be based on studies, publications, surveys and other data obtained from third-party sources and our own internal estimates and research. While we believe these third-party studies, publications, surveys and other data are reliable as of the date of this post, they have not been independently verified, and we make no representation as to the adequacy, fairness, accuracy, or completeness of any information obtained from third-party sources.