Question

Disk layout for Vmware SQL 2016 Guest Cluster

  • 4 March 2019
  • 4 replies
  • 2055 views

Hello all

I'm trying to establish the optimal disk layout for a two-node VMware MS SQL 2016 guest failover cluster that will have a single SQL instance hosting multiple small to medium-sized databases.

Unfortunately we don't really have the funds for licensing to take the recommended approach of creating lots of small VMs to host these databases, so we need to consolidate where possible.

I've been reading through the Nutanix SQL 2016 best practice guide and it's raised a few questions.

I've decided to allocate multiple vdisks specifically for database files, so I'd have a few database files on each vdisk instead of one vdisk holding all of them.
I'm assuming this approach might be better because each vdisk gets a 6 GB OpLog allocation, so more vdisks would prevent a single OpLog from filling up too quickly?

Is there any real benefit here to having more vdisks to spread the databases across, or does this only really come into play when you split individual database files across disks?

I know splitting the log files into multiple files isn't worthwhile, but is there any benefit to allocating more drives for the database log files too?

Thanks

This topic has been closed for comments

4 replies

Userlevel 3
Badge +10
A couple of things here.
  1. How are you planning to cluster? Failover clustering is not supported on NFS per Microsoft, unless I'm misreading what you're describing. (A single VM on a two-node Nutanix cluster?)
  2. How many databases and how busy will they be?
Our DBA (Brad McGehee) is one of the best Microsoft MVPs out there, so we've gotten some great recommendations from him as well as from the Nutanix engineers.
He recommends formatting with a 64 KB allocation unit size and a separate disk for databases. For logs and tempdb there's little benefit to separation.
It's not so much size as storage reads/writes that should determine whether a database needs its own VMDK. In most cases where there are a lot of reads but a smaller DB, you can cache most of it in RAM and it's fine. But the more DBs you have and the busier they are, the better off you are splitting them out, so you're right on the money planning to do that; a quick way to measure this is shown below.
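If you want numbers to back that decision, here's a minimal sketch of measuring cumulative per-database I/O (it assumes the SqlServer PowerShell module and a default local instance, both assumptions on my part):

```
# Requires the SqlServer module: Install-Module SqlServer
Import-Module SqlServer

# Cumulative reads/writes per database since the last SQL Server restart.
# Databases at the top of the list are the first candidates for their own VMDK.
$query = @"
SELECT DB_NAME(vfs.database_id)                AS database_name,
       SUM(vfs.num_of_reads)                   AS reads,
       SUM(vfs.num_of_writes)                  AS writes,
       SUM(vfs.num_of_bytes_read) / 1048576    AS read_mb,
       SUM(vfs.num_of_bytes_written) / 1048576 AS written_mb
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
GROUP BY vfs.database_id
ORDER BY SUM(vfs.num_of_bytes_read + vfs.num_of_bytes_written) DESC;
"@

Invoke-Sqlcmd -ServerInstance 'localhost' -Query $query | Format-Table -AutoSize
```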

AHV would be optimal for this since you can now take advantage of AHV Turbo on Windows. Regardless, having multiple VMDKs for your databases means dedicated storage streams per VMDK. Don't be surprised, though, if you see increased CPU usage as a result of the extra storage I/O availability. Make sure you have RSS enabled on your NIC so you're not overwhelming a single CPU and the traffic balances evenly across your cores (a quick check is below). If you put all your DBs on a single VMDK they contend with one another and you'll see increased read latencies, especially if your cluster is a hybrid where most of your data will live on HDDs unless it's very busy or being read or written sequentially.
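Checking and enabling RSS from inside the guest is quick; a minimal sketch (the adapter name 'Ethernet0' is a placeholder):

```
# List adapters and their current RSS state
Get-NetAdapterRss | Format-Table Name, Enabled

# Enable RSS on a specific adapter ('Ethernet0' is a placeholder name)
Enable-NetAdapterRss -Name 'Ethernet0'

# Confirm the setting took
Get-NetAdapterRss -Name 'Ethernet0'
```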

For your more strenuous workloads, you can even look at pinning the VMDKs to SSD inside Prism Element. Take your SSD sizes into consideration, though, as you don't want that to contend with your OpLog storage.

As always you can engage support if you want to verify any of this and get assistance with a design.
Userlevel 3
Badge +5
Hi @AlPM

One of the main reasons to split the data across multiple vdisks is really to keep a single OpLog from filling up. For log files, spreading across multiple vdisks won't offer any benefit, since logs are written sequentially and that type of workload bypasses the OpLog.

Here are some references:
Remember that, unlike database files, SQL Server log files are accessed or written to in a sequential or “fill and spill” manner. Therefore, spreading TempDB log files across several VMDKs is not advantageous and just makes the solution more complex than necessary. Because contention on log disks is fairly rare, Nutanix recommends a single log file per database unless you have a compelling reason for more.

Source: https://portal.nutanix.com/#/page/solutions/details?targetId=BP-2015-Microsoft-SQL-Server:top_sql_server_log_files.html

The OpLog is similar to a filesystem journal and is built as a staging area to handle bursts of random writes, coalesce them, and then sequentially drain the data to the extent store. Upon a write, the OpLog is synchronously replicated to another n number of CVM’s OpLog before the write is acknowledged for data availability purposes. All CVM OpLogs partake in the replication and are dynamically chosen based upon load. The OpLog is stored on the SSD tier on the CVM to provide extremely fast write I/O performance, especially for random I/O workloads. All SSD devices participate and handle a portion of OpLog storage. For sequential workloads, the OpLog is bypassed and the writes go directly to the extent store. If data is currently sitting in the OpLog and has not been drained, all read requests will be directly fulfilled from the OpLog until they have been drained, where they would then be served by the extent store/unified cache. For containers where fingerprinting (aka Dedupe) has been enabled, all write I/Os will be fingerprinted using a hashing scheme allowing them to be deduplicated based upon fingerprint in the unified cache.

Source: https://nutanixbible.com/

Hiya and thanks!

We're basically using in-guest iSCSI (connecting from the VMs to the Nutanix cluster's iSCSI data services IP) and using Nutanix volume groups for the shared disks, which as far as I know is supported and the only way to do it. We're running ESXi.
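For anyone following along, the in-guest side is just the standard Windows iSCSI initiator pointed at the data services IP, roughly like this (the IP and target IQN are placeholders, not our real values):

```
# Make sure the Microsoft iSCSI initiator service is running and starts automatically
Set-Service -Name MSiSCSI -StartupType Automatic
Start-Service -Name MSiSCSI

# Register the Nutanix data services IP as a target portal (placeholder IP)
New-IscsiTargetPortal -TargetPortalAddress '10.0.0.50'

# Discover the volume group target and connect persistently (placeholder IQN)
Get-IscsiTarget
Connect-IscsiTarget -NodeAddress 'iqn.2010-06.com.nutanix:example-vg' -IsPersistent $true
```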

There are around 15 databases at the moment, all relatively busy I'd say. We'll be moving a lot of them from various existing physical servers that have local disks only.
Unfortunately we won't be going AHV, not yet anyway; we've only just installed this new environment with ESXi 😕.

I've covered off the usual basics, I think (a quick verification sketch follows this list):
  • Format drives at a 64K allocation unit size
  • Lock pages in memory
  • Pre-size DBs where possible and set autogrowth settings appropriately
  • etc.
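Here's a minimal sketch of how we're double-checking those basics from PowerShell (it assumes the SqlServer module and a default local instance, both assumptions on my part):

```
Import-Module SqlServer

# 1. Confirm the data/log volumes were formatted with a 64 KB allocation unit (65536)
Get-Volume | Where-Object DriveLetter |
    Select-Object DriveLetter, FileSystemLabel, AllocationUnitSize

# 2. Confirm Lock Pages in Memory is in effect (non-zero means locked pages are in use)
Invoke-Sqlcmd -ServerInstance 'localhost' -Query @"
SELECT locked_page_allocations_kb FROM sys.dm_os_process_memory;
"@

# 3. Review file sizes and autogrowth (percent growth is usually worth
#    changing to a fixed MB increment)
Invoke-Sqlcmd -ServerInstance 'localhost' -Query @"
SELECT DB_NAME(database_id) AS database_name, name,
       size * 8 / 1024 AS size_mb, growth, is_percent_growth
FROM sys.master_files;
"@
```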

I'd be interested to know if there are any iSCSI queue depth limits to be concerned about, as I know the preferred option over the normal VM SCSI controllers is the paravirtual (PVSCSI) controller due to the performance benefits. Are there any tweaks to be made here that you know about?
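The only PVSCSI tweak I've found documented so far is the ring/queue-depth registry setting from VMware KB 2053145, along these lines (untested on our side, and it only applies to disks behind a PVSCSI controller, not to the in-guest iSCSI LUNs):

```
# Increase the PVSCSI adapter's request ring and queue depth per VMware KB 2053145.
# A guest reboot is required for the change to take effect.
$path = 'HKLM:\SYSTEM\CurrentControlSet\Services\pvscsi\Parameters\Device'
New-Item -Path $path -Force | Out-Null
New-ItemProperty -Path $path -Name 'DriverParameter' `
    -Value 'RequestRingPages=32,MaxQueueDepth=254' -PropertyType String -Force
```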

Cheers

Thanks a lot.

It seems I'm on the right track.

Other than preventing the OpLog from filling up, are there any other benefits to providing more drives for the database files to go on?
I doubt we'll split any individual database files across disks as most of the databases are pretty small. I'm just looking to spread the databases evenly across the disks, so perhaps 2 or 3 DBs on each vdisk; a sketch of how we'd rebalance files is below.
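In case it helps anyone else, moving an existing database onto a different vdisk would just be the standard offline file move, roughly like this (the database name, logical file name, and paths are all hypothetical):

```
Import-Module SqlServer

# Repoint the catalog at the new location, then take the database offline
# (SalesDB, SalesDB_data, and the paths are placeholder names)
Invoke-Sqlcmd -ServerInstance 'localhost' -Query @"
ALTER DATABASE SalesDB MODIFY FILE (NAME = SalesDB_data, FILENAME = 'E:\SQLData\SalesDB.mdf');
ALTER DATABASE SalesDB SET OFFLINE WITH ROLLBACK IMMEDIATE;
"@

# Physically move the file while the database is offline
Move-Item 'D:\SQLData\SalesDB.mdf' 'E:\SQLData\SalesDB.mdf'

Invoke-Sqlcmd -ServerInstance 'localhost' -Query 'ALTER DATABASE SalesDB SET ONLINE;'
```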

Nutanix recommends a single log file per database unless you have a compelling reason for more
I understand splitting log files across disks has no performance benefit, so we won't do that, but we will have a log file for each database, with all the log files hosted on one disk only.

Cheers