PostgreSQL Performance


Badge +3
Hi

Does anyone having experience or recommendation for tuning Postgresql Server Workload on Nutanix?

We followed the Nutanix SQL Server Best Practices and Micheal Webster's Recommandations for vDisk and LVM layout at: http://longwhiteclouds.com/2014/11/27/performance-testing-mysql-and-postgresql-with-hammerdb-and-pgbench/

Unfortunately, the Performance does not meet our expactation compared to our old PostgreSQL server running on a NetApp Filer.

Out Setup:
3 Nodes Nutanix Cluster NX-6035-G4 with 1200 GB SSD,
NOS 4.6.11
vSphere 6

12 replies

Userlevel 7
Badge +30
First thing, for any performance issues, we always ask to make a support ticket, and we can dig into the performance issues with you.

It looks like you do have a case open as of today, which is good, and some recommendations have been provided.

Additionally, if those recommendations still dont get you to the levels you need, we can run some performance tracing scripts and see exactly where the bottleneck is.
Badge +3
Hi Jon

Thanks for your reply. Yes, I opened a case regarding this matter already. I was just wondering if someone else has any experience or suggestion.
Userlevel 7
Badge +30
Of course, no worries there at all. That's what these forums are for, so I'll defer to others in the community if/when they want to post up here.

Also, paging 
Badge +3
Just want to share our experience so far with the community:

Nutanix's Suggestions indicated as * and our Setup as >:

* DB, Logs and wal on different vDisks>Currently, we just separate the OS+Postgresql Bineries and DB into separate vDisks.OS+PgSQL on a Thin Provisioned vDisk on a Deduped ContainerPostgreSQL DB on 4vDisks on a line-compressed Container, the 4 vDisks are aggregated into a single LVM Volume and consumed by PostgreSQL.* RAM 2-3X DB Size>We are a small company, we just cannot affort to allocate that much of Memory dedicated just a Database Server and our workload is not that high. Totally we have 1.13 TB of RAM in our Nutanix Cluster, the current Database Size is 80 GB. We allocated 24 GB RAM to this PostgreSQL Server.* The wal_buffers can be bumped up to 16MB>Our wal_buffer is already set to 16MB* If possible, disable synchronous_commit - Disabling this will disable the guarantee that the data was committed to disks, but will not corrupt the db.>How could we ensure the data integrity if we disable this settings?According to my knowledge, Nutanix does not use a non persistent write cache, when a I/O is issued in a Nutanix environment, if it is Random, it will be sent to the “OpLog” which is a persistent write buffer stored on SSD. Therefore, there is no need for battery backed caching HBAs due to Nutanix write acknowledgement not being given until written to persistent media on two or more nodes.

For Testing Purposes, we changed the default ext4 mount options to data=writeback,noatime,barrier=0* Set shared_buffers to 25-40% of system memory. Usually require a corresponding increase in checkpoint_segments, in order to spread out the process of writing large quantities of new or changed data over a longer period of time> currently set to 6 GB* If applicable increase work_mem - If you do a lot of hash joins, hash-based aggregations, and hash-based processing of IN subqueries you will want to allocate more work_mem (default = 1MB)>currently set to 15728kb


>We tuned other PostgreSQl Settings as follow according to http://pgtune.leopard.in.ua/
max_connections = 200shared_buffers = 6GBeffective_cache_size = 18GBwork_mem = 15728kBmaintenance_work_mem = 2GBcheckpoint_segments = 128checkpoint_completion_target = 0.9wal_buffers = 16MBdefault_statistics_target = 500

We did some benchmark testing with PGBench with a scale factor of 1500 and conduct the following Tests:

  • Overall read operation benchmark with Profile TPC-B (pgbench -n -s 1500 -T 300 -c 50 -j 5 pgbench)
  • Overall write operation benchmark by using PGBench Database initialization wich includes, copy, update, vacuum, create index etc. (pgbench -i -q -s 1500 pgbench)
With the above Settings especially disabling synchronous_commit, we do gain some performance.

However, the results of the benchmark shows better performance on other Systems so far.
Userlevel 7
Badge +30
ok, thanks for the information. Make sure you provide all of that information to the support team, and let's work together to quanitify the baseline, and the changes/improvements that these make.

On top of that, we can run collect_perf tracing scripts while you are doing your benchmark(s), so we can see whats going on under the hood.

Also, while we're at it, can you post here the "other" configuration you're testing, that's benchmarking better? That would good to help compare at a high level.
Badge +3
The Other Setup we compared:

Test System 1 Desktop Computer:
1*Intel Core i5-4570-T CPU @ 3.20GHz, 12GB RAM, Intel SSD 256 GB.

Test System 2 current production virtualized PostgreSQL Server running on ESX Host with NetApp NFS Storage:
ESX Host IBM X3650 M4 with 2*Intel Xeon E5-2690v2 @ 3.00 GHz, 2*10GiE
NetApp FAS2240-2 with 22*550GB SATA Disks + 4*200GB SSD Disks as Flash Pool, 2*10GiE
PostgrESQL Server: 4vCPU, 16 GB RAM, PostgreSQL DB mounted directly from the NetApp Filer via NFS with mount options rw,hard,ac,nointr,rsize=32768,wsize=32768,bg,nfsvers=3,tcp,sec=sys.
Badge +5
Hi JC,
we are using dedicated N/VMware (8k series) cluster for hosting PG's (~80 VM's/PG instances)
No compression/dedupe on datastore

Probably our requirements for perf are not similar as yours (nevertheless some db's are doing ~1k tps /1M tuples - cache is in help)

Our setup shortly is following
machine conf itself is "standard" Centos6/7 machines 1-4vCPU and 4-32GB RAM per VM and pg conf is calculated based on vm resources
PG data is on separate controller (paravirt) all with LVM and XFS for easier manipulation
-1:1 data disk (if we hit disk queue hard, we move to separate controller and 2+ disks)
-1:2 wal disk
-1:3 xlog disk
and
-stats on tmpfs

increased disk read ahead, swappiness 0 and some more (tuned profile with some tweaks)
And some cases using pgbouncer on connection heavy application nodes.

So far so good 🙂
Badge +7
Hi,

you did not specify which processor are you using on the Nutanix 6035-G4. Usually all the modernn DBMS perform a lot of Async I/O trying to feed data to the CPUs as fast as possible. Hence the CPU speed of each core become a relevant factor. The E5-2690 v2 is a very fast CPU clocked at 3.00 Ghz and likely the 6035-G4 has a v3 CPU with a similar number of cores but with a slower clock. This "could" impact the overall benchmark results. You should have a look at the total CPU utilization during the benchmark to have a hint about it.

Regards,

Stefano
Badge +3
Hi Kirkkak
thanks for sharing your experience.
We are a small company therefore, we are using our 6k Series Cluster for mix work load of server virtualizations. Our PostgreSQL is service Data for a Data Warehouse like Web Application.

Together with Nutanix's Support Team we reached relative good performance outcome. I would like to share our setup here:

our 3 Nodes 6k Cluster Setup, each Node has:
2 x Intel Xeon Processor 2.40GHz 8-core Haswell E5-2630 v3 20M Cache
12 x 32GB DDR4 Memory Module
5x 4TB 3.5" HDD
1 x 1.2TB 3.5" SSD
1 x 10GbE Dual SFP+ Network Adapter

Virtual Machine:
Ubuntu 16.04 64 Bit machine
8vCPU, 24GB RAM (allthought Nutnaix recommended 2-3xDB Size, our DB Size is 80GB, which leads to a optimal size of 160+ GB RAM. Since this cluster is not just dedicated for PG Workload we can not affort this huge amount of RAM)
OS on separate controller (paravirt)
OS Disk 0:0
PG data is on separate controller (paravirt) with multiple Disks
1:0-3 data disk
-2:0-3 wal disk

Linux Tuning
  • vDisks are aggregated with LVM as one logical Disk
  • IO Scheduler > noop
  • ext4 as File System. According to multiple Benchmarks concluded that XFS shows similar performance to ext4. We decided to stick with ext4 due to production experience which we already have.
  • Optimal ext4 mount options to (nodiratime,noatime,data=writeback,barrier=0,discard)
Kernel Tuning
  • disabled Kernel Transparent huge Page
  • transparent_defrag=never
  • kernel.sched_autogroup_enabled=0
  • nr_hugepages=3170 (default value recommended by PostgreSQL Documentation, higher value however does not show performance increase)
PostgreSQL 9.4 Tuning
  • pg conf is calculated based on vm resources
  • fsync = off
  • synchronous_commit = off
  • full_page_writes = off
  • According to Nutniax's Performance Team, we can disabled all data consistency relevant Settings due to the fact that Nutanix will handle the Data integrity. Data in memory will be flushed as per your dirty page ratio. This flushing is done to oplog and from this point you will have two copies of the data.
Following Settings does not show any performance increase:
  • vm.swappiness=10
  • vm.overcommit_memory=2
  • vm.overcommit_ratio=100
  • vm.dirty_ratio=2
  • vm.dirty_background_ratio=1
  • vm.nr_hugepages = echo $(grep ^VmPeak /proc/$(cat /var/run/postgresql/9.4-main.pid)/status | sed -e 's/VmPeak://' | sed -e 's/kB//' | sed -e 's/[ ]//g')/$(grep Hugepagesize /proc/meminfo | sed -e 's/Hugepagesize://' | sed -e 's/kB//' | sed -e 's/[ ]//g') | bc -l | awk '{print int($1+1)}'


Result with above Setup:
pgbench -n -s 1500 -T 300 -c 50 -j 5 pgbench
transaction type: TPC-B (sort of)
scaling factor: 1500
query mode: simple
number of clients: 50
number of threads: 5
duration: 300 s
number of transactions actually processed: 2110817
latency average: 7.106 ms
tps = 7035.719304 (including connections establishing)
tps = 7036.387916 (excluding connections establishing)

Result with 2xDB Size of RAM, 160 GB RAM:
pgbench -n -s 1500 -T 300 -c 50 -j 5 pgbench
transaction type: TPC-B (sort of)
scaling factor: 1500
query mode: simple
number of clients: 50
number of threads: 5
duration: 300 s
number of transactions actually processed: 3104781
latency average: 4.831 ms
tps = 10348.628006 (including connections establishing)
tps = 10350.399677 (excluding connections establishing)
Badge +3
Hi Stefano

The slower Clock of the Intel Xeon Processor 2.40GHz 8-core Haswell E5-2630 v3 does have a overall impact to our benchmark and is not the optimal choice to run PG Workload. However, our cluster is hosting a mixed-workload environment, we favoured the numbers of core rather than the clock.

Cheers
Jacky
Badge +5
Good to hear that you get it worked out and thank you for thorough description of working conf 🙂

Just one point, move stats from disk to tmpfs - you'll lower your disk io requirements significantly

bregs
Kirkka
Badge +7
Jacky ,

giving the size of your DB (80 GB) the I/O in your benchmark is all on the SSD layer. The I/O latencies reported from the DB side are higher than what is expected from SSD with Nutanix. This could be due to Async I/O where many I/Os are issued and then the DB wait for the completion. OR this could be due to a saturation of a component in the I/O chain. It could be useful to have a look at the IO stats from Prism during the benchmark and to the CPU utilization of the CVM .

You did not specify what TPS rate did you reach with your current traditional system are they higher or lower than Nutanix now?

Reply