My team and I was helping a customer troubleshoot a performance problem with their Oracle 11g R2 (18.104.22.168) database today and we noticed that it was taking between 90 seconds and 120 seconds to complete a checkpoint. Anyone who knows anything about Oracle knows that this is an eternity for a database to wait while switching online redo log files and continuing to process transactions. This was causing a lot of performance dissatisfaction for the customer, which is understandable. But it didn’t take long to figure out what the problem was and fix it. Literally changing a single parameter in the database and restarting the instance turbo charged performance by over 10X.
First lets cover off some of the symptoms of this problem and give you some background before I give you the parameter we used to address the performance issue. Also note that this had nothing to do with the underlying infrastructure, it was purely a database level setting. This problem wouldn’t happen in Oracle 12c.
The first sign of trouble was that data loads were taking a very long time and that storage latency appeared to be very high. Then there was an entry in the AWR reports saying that 76% of database wait time was caused by log file switch (checkpoint incomplete). The checkpoints themselves were taking over 90 seconds to complete. During which time the database effectively paused and didn’t process any transactions, as it was out of online redo log file space until the checkpoint was completed. During the data load and performance tests we noticed that there was only one or two outstanding IO’s going to the database disks.
So what caused this problem? Even though the right libraries were installed, the database was never configured to use async or direct IO. I suspected this based on the very low number of outstanding IO’s and low amount of IO concurrency. To confirm this suspicion we ran SHOW PARAMETER FILESYSTEMIO_OPTIONS from sqlplus. The output was as follows:
NAME TYPE VALUE------------------------------------ ----------- ------------------------------filesystemio_options string noneFortunately the answer to this is fairly straight forward. Use the command ALTER SYSTEM SET FILESYSTEMIO_OPTIONS=SETALL SCOPE=SPFILE; followed by shutting down and restarting the instance (you’d plan this for a maintenance period for production databases).
After making this change the IO latency significantly decreased, the IOPS increased, the IO concurrency increased and the performance was about 10X better than before. The customer was very happy and at the same time we got to show how excellent our support is. This was a team effort to address this problem and once the customer got in touch with us it didn't take us very long at all to fix it.
Final WordAlthough this simple change is required for Oracle 11g it is not required for 12c. Oracle Database 12c will use async or direct IO if it’s available in your operating system automatically.
This was originally posted to longwhiteclouds.com and is reproduced here with permission. The original article can be found at http://longwhiteclouds.com/2014/11/21/turbo-charge-oracle-database-io-performance/.