Best answer by penguindows
Our issue is definitely latency. Our remote site is across the continental US, and our per VM size is 30TB (10TB used) on a single disk. as you can imagine, this creates a bottleneck for us. Replications with default settings are not meeting RPO.
Our AOS version is 5.5.2.
I have found (with help from support) these tunable settings within the cluster:
nutanix@NTNX:~$ python /home/nutanix/serviceability/bin/edit-aos-gflags | grep stargate_
2018-06-08 10:00:09 INFO zookeeper_session.py:110 edit-aos-gflags is attempting to connect to Zookeeper
stargate_cerebro_replication_max_rpc_vblocks = 16 #default 4
stargate_cerebro_replication_max_rpc_data = 4194304 #default 1048576
stargate_cerebro_max_outstanding_vdisk_replication_rpcs = 16 #default 4
stargate_cerebro_replication_param_multiplier = 32 #default 16
stargate_vdisk_read_extents_max_outstanding_egroup_reads = 6 #default 3
I am putting together a helpful howto on how to tune these parameters, but the gist of it is editing with...
nutanix@NTNX:~$ python /home/nutanix/serviceability/bin/edit-aos-gflags --service=stargate
...then restarting stargate on each cvm.
The restarts went perfectly with a sleep 60 in there, we had no failure on running replications, and our throughput went up ~4x as expected. our 12 day running replications began to finish.
my ultimate solution is two fold. first, the above settings to accelerate each stream. second, we are working to break out our 30TB disk in to 10x 3TB disks.
RE: going off stock; I think that as nutanix grows in to more environments (and they will, the technology is amazing and i foresee continued adoption) there are going to be many more flavors of environments that sticking to stock options just wont solve. I expect in the next few years that nutanix will pivot in to a more open position. I'd love to see some of these settings within reach of the prism GUI along with some better man, info pages on what each individual setting and switch does.
That being said, your warning about going off stock is noted and appreciated. i understand that nutanix attempts to tune AOS as well as possible out of the box, and that tuning these settings can have an impact (sometimes bad, sometimes good). Therefore, I will only seek a behavior change in the technology to achieve some desired result.