PostgreSQL Tuning: Parameter

Home Posts filed under Parameter

Showing posts with label Parameter. Show all posts

Saturday, June 24, 2017

wal_sync_method and wal_buffers Parameter WAL settings

June 24, 2017

In this article We will discuss about wal_sync_method and wal_buffers for PostgreSQL Tuning Paramater. OK, let's take down...

One purpose of wal_sync_method is to tune such caching behavior.

The default behavior here is somewhat different from most of the options. When the server source code is compiled, a series of possible ways to write are considered. The one believed most efficient then becomes the compiled-in default. This value is not written to the postgresql.conf file at initdb time though, making it different from other auto-detected, platform-specific values such as shared_buffers.

Before adjusting anything, you should check what your platform detected as the fastest safe method using SHOW; the following is a Linux example:

postgres=# show wal_sync_method;
 wal_sync_method
-----------------
 fdatasync

On both Windows and the Mac OS X platforms, there is a special setting to make sure the OS clears any write-back caches. The safe value to use on these platforms that turns on this behavior is as follows:

wal_sync_method=fsync_writethrough

If you have this setting available to you, you really want to use it! It does exactly the right thing to make database writes safe, while not slowing down other applications the way disabling an entire hard drive write cache will do.

This setting will not work on all platforms however. Note that you will see a performance drop going from the default to this value, as is always the case when going from unsafe to reliable caching behavior.

On other platforms, tuning wal_sync_method can be much more complicated. It's theoretically possible to improve write throughput on any UNIX-like system by switching from any write method that uses a write/fsync or write/fdatasync pair to using a true synchronous write. On platforms that support safe DSYNC write behavior, you may already see this as your default when checking it with SHOW:

wal_sync_method=open_datasync

Even though you won't see it explicitly listed in the configuration file as such. If this is the case on your platform, there's little optimization beyond that you can likely perform. open_datasync is generally the optimal approach, and when available it can even use direct I/O as well to bypass the operating system cache.

The Linux situation is perhaps the most complicated. As shown in the last code, this platform will default to fdatasync as the method used. It is possible to switch this to use synchronous writes with:

wal_sync_method=open_sync

Also, in many cases you can discover this is faster—sometimes much faster—than the default behavior. However, whether this is safe or not depends on your filesystem. The default filesystem on most Linux systems, ext3, does not handle O_SYNC writes safely in many cases, which can result in corruption. See "PANIC caused by open_sync on Linux" at http://archives.postgresql.org/pgsqlhackers/2007-10/msg01310.php for an example of how dangerous this setting can be on that platform. There is evidence that this particular area has fi nally been cleaned up on recent (2.6.32) kernels when using the ext4 filesystem instead, but this has not been tested extensively at the database level yet.

In any case, your own tests of wal_sync_method should include the "pull the cord" test, where you power the server off unexpectedly, to make sure you don't lose any data with the method you've used. Testing at a very high load for a long period of time is also advisable, to find intermittent bugs that might cause a crash.

wal_buffers Parameter WAL settings

While the documentation on wal_buffers suggests that the default of 64 KB is sufficient as long as no single transaction exceeds that value, in practice write-heavy benchmarks see optimal performance at higher values than you might expect from that, at least 1 MB or more. With the only downside being the increased use of shared memory, and as there's no case where more than a single WAL segment could need to be buffered, given modern server memory sizes the normal thing to do nowadays is to just set:

wal_buffers=16MB

Then forget about it as a potential bottleneck or item to tune further. Only if you're tight on memory should you consider a smaller setting.

Thursday, June 22, 2017

Parameter Server Configuration

effective_cache_size and per-client settings Parameter

June 22, 2017

PostgreSQL is expected to have both its own dedicated memory (shared_buffers) as well as utilize the filesystem cache. In some cases, when making decisions like whether it is efficient to use an index or not, the database compares sizes it computes against the effective sum of all these caches; that's what it expects to find in effective_cache_size.

The same rough rule of thumb that would put shared_buffers at 25 percent of system memory would set effective_cache_size to between 50 and 75 percent of RAM. To get a more accurate estimate, first observe the size of the filesystem cache:

UNIX-like systems: Add the free and cached numbers shown by the free or top commands to estimate the filesystem cache size
Windows: Use the Windows Task Manager's Performance tab and look at the System Cache size

Assuming you have already started the database, you need to then add the shared_buffers figure to this value to arrive at a figure for effective_cache_size. If the database hasn't been started yet, usually the OS cache will be an accurate enough estimate, when it's not running. Once it is started, most of the database's dedicated memory will usually be allocated to its buffer cache anyway.

effective_cache_size does not allocate any memory. It's strictly used as input on how queries are executed, and a rough estimate is sufficient for most purposes. However, if you set this value much too high, actually executing the resulting queries may result in both the database and OS cache being disrupted by reading in the large number of blocks required to satisfy the query believed to fit easily in RAM.

It's rare you'll ever see this parameter tuned on a per-client basis, even though it is possible.

Per-client settings Parameter

While all of the settings in this section can be adjusted per client, you'll still want good starting settings for these parameters in the main configuration file. Individual clients that need values outside the standard can always do so using the SET command within their session.

Wednesday, June 21, 2017

Parameter Server Configuration

synchronous_commit Parameter

June 21, 2017

The overhead of waiting for physical disk commits was stressed as a likely bottleneck for committing transactions. If you don't have a battery-backed write cache to accelerate that, but you need better commit speed, what can you do? The standard approach is to disable synchronous_commit, which is sometimes alternately referred to as enabling asynchronous commits. This groups commits into chunks at a frequency determined by the related wal_writer_delay parameter. The default settings guarantee a real commit to disk at most 600 milliseconds after the client commit. During that window, which you can reduce in size with a corresponding decrease in speed-up, that data will not be recovered afterwards if your server crashes.

Note that it's possible to turn this parameter off for a single client during its session rather than making it a server-wide choice:

SET LOCAL synchronous_commit TO OFF;

This provides you with the option of having different physical commit guarantees for different types of data you put into the database. A routine activity monitoring table, one that was frequently inserted into and where a fraction of a second of loss is acceptable, would be a good candidate for asynchronous commit. An infrequently written table holding real-world monetary transactions should prefer the standard synchronous commit.

Monday, June 19, 2017

Parameter Server Configuration

random_page_cost and constraint_exclusion Parameter

June 19, 2017

random_page_cost Parameter

This parameter is common to tune, but explaining what it does requires a lot of background about how queries are planned. Particularly in earlier PostgreSQL versions, lowering this value from its default—for example, a reduction from 4.0 to 2.0—was a common technique. It was used for making it more likely that the planner would use indexed queries instead of the alternative of a sequential scan. With the smarter planner in current versions, this is certainly not where you want to start tuning at. You should prefer getting better statistics and setting the memory parameters as primary ways to influence the query planner.

constraint_exclusion Parameter

If you are using PostgreSQL 8.3 or earlier versions, and you are using the database's table inheritance feature to partition your data, you'll need to turn this parameter on.

Starting in 8.4, constraint_exclusion defaults to a new smarter setting named partition that will do the right thing in most situations without it ever needing to be adjusted.

Saturday, June 17, 2017

Parameter Server Configuration

Configuration parameter Tunables that to avoid

June 17, 2017

There are a few parameters in the postgesql.conf that have gathered up poor guidance in other guides you might come across, and they might already be set badly in a server whose configuration you're now responsible for. Others have names suggesting a use for the parameter that actually doesn't exist. This section warns you about the most common of those to avoid adjusting.

Some Parameter Tunable to avoid are:

fsync

full_page_writes

commit_delay and commit_siblings

max_prepared_transactions

Query enable parameters

Friday, June 16, 2017

Parameter Server Configuration

fsync and max_prepared_transactions Parameter

June 16, 2017

If you just want to ignore crash recovery altogether, you can do that by turning off the fsync parameter. This makes the value for wal_sync_method irrelevant, because the server won't be doing any WAL sync calls anymore.

It is important to recognize that if you have any sort of server crash when fsync is disabled, it is likely your database will be corrupted and no longer start afterwards. Despite this being a terrible situation to be running a database under, the performance speedup of turning crash recovery off is so large that you might come across suggestions you disable fsync anyway. You should be equally hesitant to trust any other advice you receive from sources suggesting this, as it is an unambiguously dangerous setting to disable.

One reason this idea gained traction is that in earlier PostgreSQL versions, there was no way to reduce the number of fsync calls to a lower number—to trade-off some amount of reliability for performance. Starting 8.3, in most cases where people used to disable fsync it's a better idea to turn off synchronous_commit instead.

There is one case where fsync=off may still make sense: initial bulk loading. If you're inserting a very large amount of data into the database, and do not have hardware with a battery-backed write cache, you might discover this takes far too long to ever be practical. In this case, turning the parameter off during the load— where all data can easily be recreated if there is a crash causing corruption—may be the only way to get loading time below your target. Once your server is back up again, you should turn it right back on again.

Some systems will also turn off fsync on servers with redundant copies of the database— for example, slaves used for reporting purposes. These can always resynchronize against the master if their data gets corrupted.

max_prepared_transactions Parameter

Many people see this name and assume that as they use prepared statements, a common technique to avoid SQL injection, that they need to increase this value. This is not the case; the two are not related. A prepared transaction is one that uses PREPARE TRANSACTION for two-phase commit (2PC).

If you're not specifically using that command and 2PC, you can leave this value at its default. If you are using those features, only then will you likely need to increase it to match the number of connections

Wednesday, June 14, 2017

Parameter Server Configuration

commit_delay and commit_siblings Parameter

June 14, 2017

Before synchronous_commit was implemented, there was an earlier attempt to add that sort of feature enabled by the commit_delay and commit_siblings parameters. These are not effective parameters to tune in most cases. It is extremely difficult to show any speedup by adjusting them, and quite easy to slow every transaction down by tweaking them.

The only case where they have shown some value is for extremely high I/O rate systems. Increasing the delay to a very small amount can make writes happen in bigger blocks, which sometimes turn out better aligned when combined with larger RAID stripe sizes in particular.

Tuesday, June 13, 2017

Parameter Server Configuration Tuning Query Tuning Script

Query enable parameters and Logging Parameter

June 13, 2017

It's possible to disable many of the query planner's techniques, in hopes of avoiding a known bad type of query. This is sometimes used as a work-around for the fact that PostgreSQL doesn't support direct optimizer hints for how to execute a query.

You might see the following code snippet, suggested as a way to force use of indexes instead of sequential scans for example:

enable_seqscan = off

Generally this is a bad idea, and you should improve the information the query optimizer is working with so it makes the right decisions instead.

Logging Parameter

General logging setup is important but it is somewhat outside the scope of this article. You may need to set parameters such as log_destination, log_directory, and log_filename to save your log files in a way compatible with the system administrations requirements of your environment. These will all be set to reasonable defaults to get started with on most systems.

On UNIX-like systems, it's common for some of the database logging to be set in the script that starts and stops the server, rather than directly in the postgresql.conf file. If you instead use the pg_ctlcommand to manually start the server, you may discover that logging ends up on your screen instead. You'll need to look at the script that starts the server normally (commonly /etc/init.d/postgresql) to determine what it does, if you want to duplicate that behavior. In most cases, you just need to add –l logfilename to the pg_ctl command line to redirect its output to the standard location.

Featured

Saturday, June 24, 2017

wal_buffers Parameter WAL settings

Thursday, June 22, 2017

Per-client settings Parameter

Wednesday, June 21, 2017

Monday, June 19, 2017

random_page_cost Parameter

constraint_exclusion Parameter

Saturday, June 17, 2017

Friday, June 16, 2017

Wednesday, June 14, 2017

Tuesday, June 13, 2017