Remote site replication tuning
What options are available within Nutanix to speed up replication to a remote site? The best I am able to get on a single stream is about 20 MB/s (160 Mbps), with an aggregate speed of about 60 MB/s (480 Mbps) spread out over several streams. However, we have a 10 Gbps link between sites, so I expected to get better throughput.
Best answer by penguindows
Yes, I see dlink7's reply. Thanks @dlink7 for the good info on expected performance under ideal conditions. That gives me a good baseline.
Our issue is definitely latency. Our remote site is across the continental US, and our per-VM size is 30 TB (10 TB used) on a single disk. As you can imagine, this creates a bottleneck for us. Replications with default settings are not meeting RPO.
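To put those numbers in perspective, here is a rough back-of-the-envelope estimate in Python. It is only a sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), that all 10 TB of used data has to cross the link, and that the rates from the original question (20 MB/s per stream, 60 MB/s aggregate) are sustained. It ignores compression, dedup, and incremental snapshots.

```python
# Rough estimate only. Sizes and rates come from this thread;
# decimal units (1 TB = 10**12 bytes) are an assumption.
SECONDS_PER_DAY = 86_400


def transfer_days(data_bytes: float, rate_bytes_per_s: float) -> float:
    """Days needed to move data_bytes at a sustained byte rate."""
    return data_bytes / rate_bytes_per_s / SECONDS_PER_DAY


def link_share(rate_bytes_per_s: float, link_bits_per_s: float) -> float:
    """Fraction of the link consumed by a given byte rate."""
    return rate_bytes_per_s * 8 / link_bits_per_s


USED_BYTES = 10e12      # 10 TB used on the single disk
SINGLE_STREAM = 20e6    # 20 MB/s observed on one stream
AGGREGATE = 60e6        # 60 MB/s observed across several streams
LINK = 10e9             # 10 Gbps link between sites

if __name__ == "__main__":
    print(f"single stream: {transfer_days(USED_BYTES, SINGLE_STREAM):.1f} days")
    print(f"aggregate:     {transfer_days(USED_BYTES, AGGREGATE):.1f} days")
    print(f"link share:    {link_share(SINGLE_STREAM, LINK):.1%}")
```

Under those assumptions a single stream needs roughly 5.8 days for the used data and occupies about 1.6% of the link, which is why bandwidth alone does not explain the slow transfers.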
Our AOS version is 5.5.2.
I have found (with help from support) these tunable settings within the cluster:
code:
nutanix@NTNX:~$ python /home/nutanix/serviceability/bin/edit-aos-gflags | grep stargate_
2018-06-08 10:00:09 INFO zookeeper_session.py:110 edit-aos-gflags is attempting to connect to Zookeeper
stargate_cerebro_replication_max_rpc_vblocks = 16 #default 4
stargate_cerebro_replication_max_rpc_data = 4194304 #default 1048576
stargate_cerebro_max_outstanding_vdisk_replication_rpcs = 16 #default 4
stargate_cerebro_replication_param_multiplier = 32 #default 16
stargate_vdisk_read_extents_max_outstanding_egroup_reads = 6 #default 3
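For a quick view of how far each value sits from its default, the listing can be parsed with a few lines of Python. This is a small sketch that only reads the `name = value #default N` lines exactly as they are pasted in this post; the helper names `parse_flags` and `multipliers` are my own and are not part of any Nutanix tooling.

```python
import re

# Lines copied from the listing in this post.
LISTING = """\
stargate_cerebro_replication_max_rpc_vblocks = 16 #default 4
stargate_cerebro_replication_max_rpc_data = 4194304 #default 1048576
stargate_cerebro_max_outstanding_vdisk_replication_rpcs = 16 #default 4
stargate_cerebro_replication_param_multiplier = 32 #default 16
stargate_vdisk_read_extents_max_outstanding_egroup_reads = 6 #default 3
"""

# Matches "flag_name = <tuned> #default <default>".
LINE_RE = re.compile(r"^(\w+)\s*=\s*(\d+)\s*#default\s*(\d+)$")


def parse_flags(text):
    """Return {flag: (tuned, default)} for every matching line."""
    flags = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            flags[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return flags


def multipliers(flags):
    """Return {flag: tuned / default}."""
    return {name: tuned / default for name, (tuned, default) in flags.items()}


if __name__ == "__main__":
    for name, factor in multipliers(parse_flags(LISTING)).items():
        print(f"{name}: {factor:g}x default")
```

The three RPC-related flags come out at 4x their defaults and the other two at 2x.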
I am putting together a howto on tuning these parameters, but the gist of it is editing with...
code:
nutanix@NTNX:~$ python /home/nutanix/serviceability/bin/edit-aos-gflags --service=stargate
...then restarting Stargate on each CVM.
The restarts went perfectly with a sleep 60 in there. No running replications failed, and our throughput went up ~4x as expected. Replications that had been running for 12 days began to finish.
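The one-CVM-at-a-time pattern with a pause in between can be sketched generically in Python. This is an illustration rather than the exact procedure used here: the CVM addresses are hypothetical, the restart command is left as a placeholder because the post does not spell it out, and `run` is an injectable callable so the loop can be tried as a dry run before anything touches a cluster.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    hosts: Iterable[str],
    restart_cmd: str,
    run: Callable[[str, str], None],
    settle_seconds: float = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Run restart_cmd on one host at a time, pausing after each host."""
    done = []
    for host in hosts:
        run(host, restart_cmd)   # restart the service on this host only
        done.append(host)
        sleep(settle_seconds)    # let the service settle before moving on
    return done


def dry_run(host: str, cmd: str) -> None:
    """Stand-in runner that only prints what would be executed."""
    print(f"[dry-run] {host}: {cmd}")


if __name__ == "__main__":
    cvms = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]  # hypothetical CVM IPs
    rolling_restart(cvms, "<stargate restart command>", dry_run, settle_seconds=0)
```

Swapping `dry_run` for a real SSH runner is left out on purpose; check the actual restart command with support before running it against production CVMs.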
My ultimate solution is twofold. First, the above settings to accelerate each stream. Second, we are working to break out our 30 TB disk into 10x 3 TB disks.
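Why splitting the disk should help can be estimated with a small Python sketch. It assumes the 10 TB of used data spreads evenly over the disks, that each disk replicates in its own stream at the 20 MB/s per-stream rate from the original question, and decimal units. Whether streams really scale linearly is an open assumption, so the function takes an optional aggregate cap (the 60 MB/s aggregate seen with default settings is used as an example).

```python
SECONDS_PER_HOUR = 3600


def parallel_hours(used_bytes, n_disks, per_stream_rate, aggregate_cap=None):
    """Hours to replicate used_bytes split evenly across n_disks streams.

    The combined rate is n_disks * per_stream_rate, optionally limited
    by aggregate_cap (bytes per second).
    """
    rate = n_disks * per_stream_rate
    if aggregate_cap is not None:
        rate = min(rate, aggregate_cap)
    return used_bytes / rate / SECONDS_PER_HOUR


USED_BYTES = 10e12   # 10 TB used (decimal units assumed)
PER_STREAM = 20e6    # 20 MB/s per stream, from the original question

if __name__ == "__main__":
    print(f"1 disk:            {parallel_hours(USED_BYTES, 1, PER_STREAM):.1f} h")
    print(f"10 disks, no cap:  {parallel_hours(USED_BYTES, 10, PER_STREAM):.1f} h")
    print(f"10 disks, 60 MB/s: {parallel_hours(USED_BYTES, 10, PER_STREAM, 60e6):.1f} h")
```

With these inputs the estimate drops from about 138.9 hours on one disk to 13.9 hours on ten uncapped streams, or 46.3 hours if the aggregate stays capped at 60 MB/s.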
RE: going off stock: I think that as Nutanix grows into more environments (and it will; the technology is amazing and I foresee continued adoption), there are going to be many more flavors of environment that sticking to stock options just won't solve. I expect that in the next few years Nutanix will pivot to a more open position. I'd love to see some of these settings within reach of the Prism GUI, along with better man/info pages on what each individual setting and switch does.
That being said, your warning about going off stock is noted and appreciated. I understand that Nutanix attempts to tune AOS as well as possible out of the box, and that tuning these settings can have an impact (sometimes bad, sometimes good). Therefore, I will only change the stock behavior when I need it to achieve a specific result.
