PostgreSQL Tuning: Server Configuration

Saturday, July 8, 2017

Server Configuration

log_statement and log_min_duration_statement Parameters

log_statement Parameter

The options for this setting are as follows:

none: Do not log any statement-level information.
ddl: Log only Data Definition Language (DDL) statements such as CREATE and DROP. This can normally be left on even in production, and is handy to catch major changes introduced accidentally or intentionally by administrators.
mod: Log any statement that modifies a value, which is essentially everything except for simple SELECT statements. If your workload is mostly SELECT based with relatively few data changes, this may be practical to leave enabled all the time.
all: Log every statement. This is generally impractical to leave on in production due to the overhead of the logging. However, if your server is powerful enough relative to its workload, it may be practical to keep it on all the time.

Statement logging is a powerful technique for finding performance issues. Analyzing the information saved by log_statement and related sources for statement-level detail can reveal the true source for many types of performance issues. You will need to combine this with appropriate analysis tools.

log_min_duration_statement Parameter

Once you have some idea of how long a typical query statement should take to execute, this setting allows you to log only the ones that exceed some threshold you set. The value is in milliseconds, so you might set:

log_min_duration_statement=1000

And then you'll only see statements that take longer than one second to run. This can be extremely handy for finding out the source of "outlier" statements that take much longer than most to execute.
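Once such a threshold is in place, the resulting log entries can be summarized with a small script. The following is a minimal Python sketch, assuming the log lines contain the usual "duration: ... ms  statement: ..." text; the sample lines and the slowest_statements helper are hypothetical, and a real log_line_prefix may put additional fields in front of each entry.

```python
import re

# Matches the "duration: <ms> ms  statement: <sql>" portion of a log line.
DURATION_RE = re.compile(r"duration: ([0-9.]+) ms\s+statement: (.*)")


def slowest_statements(lines, limit=5):
    """Return (milliseconds, statement) pairs, slowest first."""
    found = []
    for line in lines:
        match = DURATION_RE.search(line)
        if match:
            found.append((float(match.group(1)), match.group(2).strip()))
    found.sort(key=lambda pair: pair[0], reverse=True)
    return found[:limit]


# Hypothetical sample log lines, for illustration only.
sample = [
    "LOG:  duration: 1520.331 ms  statement: SELECT * FROM orders",
    "LOG:  checkpoint starting: time",
    "LOG:  duration: 4210.870 ms  statement: UPDATE accounts SET balance = 0",
]

for ms, statement in slowest_statements(sample):
    print(f"{ms:10.3f} ms  {statement}")
```

Lines that carry no duration information, such as the checkpoint message in the sample, are simply skipped.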

If you are running 8.4 or later, you might prefer to use the auto_explain module (http://www.postgresql.org/docs/8.4/static/auto-explain.html) instead of this feature. This will allow you to actually see why the queries that are running slowly are doing so by viewing their associated EXPLAIN plans.

Monday, July 3, 2017

Autovacuum, Vacuuming, and Statistics Parameters

Vacuuming and analyzing, the two maintenance tasks described below, are both critical to database performance over the long term. Starting in PostgreSQL 8.1 there is an autovacuum daemon available that will run in the background to handle these tasks for you. Its action is triggered by the number of changes to the database exceeding a threshold it calculates based on the existing table size.
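As a rough illustration of that threshold, the sketch below assumes the formula given in the PostgreSQL documentation (a base threshold plus a scale factor multiplied by the number of rows in the table) and the defaults of 50 and 0.2 that autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor have in 8.3 and later. The function name is hypothetical.

```python
def vacuum_trigger_point(row_count, base_threshold=50, scale_factor=0.2):
    """Estimated number of changed rows after which autovacuum vacuums a table.

    The defaults are assumed to match autovacuum_vacuum_threshold and
    autovacuum_vacuum_scale_factor in PostgreSQL 8.3 and later.
    """
    return base_threshold + scale_factor * row_count


for rows in (1_000, 1_000_000):
    print(f"{rows} rows: vacuum after about {vacuum_trigger_point(rows):.0f} changes")
```

Because the trigger point scales with table size, large tables accumulate many more dead rows than small ones before the daemon acts on them.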

Autovacuum is turned on by default as of PostgreSQL 8.3, and the default settings are generally aggressive enough to work out of the box for smaller databases with little manual tuning. Generally you just need to be careful that the amount of data in the free space map doesn't exceed max_fsm_pages, and even that requirement is automated away from being a concern as of 8.4.

Vacuuming and statistics Parameter

PostgreSQL databases require two primary forms of regular maintenance as data is added, updated, and deleted.

VACUUM cleans up after old transactions, including removing information that is no longer visible and returning freed space to where it can be re-used. The more often you UPDATE and DELETE information from the database, the more likely you'll need a regular vacuum cleaning regime. However, even static tables with data that never changes once inserted still need occasional care here.

ANALYZE looks at tables in the database and collects statistics about them, such as estimates of how many rows they have and how many distinct values they contain. Many aspects of query planning depend on this statistics data being accurate.

Enabling autovacuum on older versions Parameter

If you have autovacuum available but it's not turned on by default, which will be the case with PostgreSQL 8.1 and 8.2, there are a few related parameters that must also be enabled for it to work, as covered in http://www.postgresql.org/docs/8.1/interactive/maintenance.html or http://www.postgresql.org/docs/8.2/interactive/routine-vacuuming.html.

The normal trio to enable in the postgresql.conf file in these versions are:

stats_start_collector=true
stats_row_level=true
autovacuum=on

Note that as warned in the documentation, it's also wise to consider adjusting superuser_reserved_connections to allow for the autovacuum processes in these earlier versions.

The autovacuum you'll get in 8.1 and 8.2 is not going to be as efficient as what comes in 8.3 and later. You can expect it to take some fine tuning to get the right balance of enough maintenance without too much overhead, and because there's only a single worker it's easier for it to fall behind on a busy server. This topic isn't covered at length here. It's generally a better idea to put time into planning an upgrade to a PostgreSQL version with a newer autovacuum than to try to tweak an old one extensively, particularly as there are many other performance issues in the older versions that cannot be resolved easily either.

Thursday, June 29, 2017

work_mem and maintenance_work_mem Parameters

work_mem Parameter

When a query is running that needs to sort data, the database estimates how much data is involved and then compares it to the work_mem parameter. If it's larger (and the default is only 1 MB), rather than sorting in memory it will write all the data out and use a disk-based sort instead. This is much, much slower than a memory-based one. Accordingly, if you regularly sort data, and have memory to spare, a large increase in work_mem can be one of the most effective ways to speed up your server. A data warehouse on a giant server might run with a gigabyte of work_mem for its larger reports.

The catch is that you can't necessarily predict the number of sorts any one client will be doing, and work_mem is a per-sort parameter rather than a per-client one. This means that memory use via work_mem is theoretically unbounded, should enough clients happen to be sorting large enough data sets concurrently.

In practice, there aren't that many sorts going on in a typical query, usually only one or two. And not every client that's active will be sorting at the same time. The normal guidance for work_mem is to consider how much free RAM is around after shared_buffers is allocated (the same OS caching size figure needed to compute effective_cache_size), divide by max_connections, and then take a fraction of that figure; a half of that would be an aggressive work_mem value. In that case, only if every client had two sorts active all at the same time would the server be likely to run out of memory, which is an unlikely scenario.
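The guideline above can be turned into simple arithmetic. The following Python sketch is only an illustration: it assumes that everything left over after shared_buffers is available for caching and sorting, ignoring other memory users, and the server figures and function name are hypothetical.

```python
def suggested_work_mem_mb(total_ram_mb, shared_buffers_mb, max_connections,
                          fraction=0.5):
    """Aggressive work_mem estimate in MB, following the guideline above.

    Assumes all RAM not given to shared_buffers is free for the OS cache
    and sorting, which overstates what is available on a shared machine.
    """
    remaining_mb = total_ram_mb - shared_buffers_mb
    per_connection_mb = remaining_mb / max_connections
    return per_connection_mb * fraction


# Hypothetical server: 16 GB of RAM, 4 GB of shared_buffers, 100 connections.
estimate = suggested_work_mem_mb(16 * 1024, 4 * 1024, 100)
print(f"work_mem of roughly {estimate:.0f} MB")
```

Passing a smaller fraction gives a more conservative figure for servers where queries commonly run several sorts at once.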

The work_mem computation is increasingly used in later PostgreSQL versions for estimating whether hash structures can be built in memory. Its use as a client memory size threshold is not limited just to sorts. That's simply the easiest way to talk about the type of memory allocation decision it helps to guide.

Like synchronous_commit, work_mem can also be set per-client. This allows an approach where you keep the default at a moderate value, and only increase sort memory for the clients that you know are running large reports.

maintenance_work_mem Parameter

A few operations in the database server need working memory for larger operations than just regular sorting. VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY all can allocate up to maintenance_work_mem worth of memory instead. As it's unlikely that many sessions will be doing one of these operations at once, it's possible to set this value much higher than the standard per-client work_mem setting. Note that at least autovacuum_max_workers (defaulting to 3 starting in version 8.3) will allocate this much memory, so consider those sessions (perhaps along with a session or two doing a CREATE INDEX) when setting this value.

Assuming you haven't increased the number of autovacuum workers, a typical high setting for this value on a modern server would be five percent of the total RAM, so that even five such processes wouldn't exceed a quarter of available memory. This works out to approximately 50 MB of maintenance_work_mem per GB of server RAM.
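The five percent rule is easy to check with a few lines of Python. This is a minimal sketch of the arithmetic only; the function names and the 8 GB example server are hypothetical.

```python
def suggested_maintenance_work_mem_mb(total_ram_mb, percent_of_ram=5):
    """maintenance_work_mem in MB as a percentage of total RAM."""
    return total_ram_mb * percent_of_ram / 100


def worst_case_percent(processes, percent_of_ram=5):
    """Percentage of RAM used if every process allocates the full amount."""
    return processes * percent_of_ram


print(suggested_maintenance_work_mem_mb(1024))      # per GB of RAM
print(suggested_maintenance_work_mem_mb(8 * 1024))  # hypothetical 8 GB server
print(worst_case_percent(5))                        # five processes at once
```

The first figure, about 51 MB, is where the rounded value of 50 MB per GB comes from.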

Wednesday, June 28, 2017

default_statistics_target Parameter

PostgreSQL makes its decisions about how queries execute based on statistics collected about each table in your database. This information is collected by analyzing the tables, either with the ANALYZE statement or via autovacuum doing that step. In either case, the amount of information collected during the analyze step is set by default_statistics_target. Increasing this value makes analysis take longer, and as analysis by autovacuum happens regularly this turns into increased background overhead for database maintenance. But if there aren't enough statistics about a table, you can get bad plans for queries against it.

The default value for this setting used to be very low (that is, 10), but was increased to 100 in PostgreSQL 8.4. Using that larger value was popular in earlier versions, too, for general improved query behavior. Indexes using the LIKE operator tended to work much better with values of 100 or more than with values below it, due to a hard-coded change in behavior at that threshold.

Note that increasing this value does result in a net slowdown on your system if you're not ever running queries where the additional statistics result in a change to a better query plan. This is one reason why some simple benchmarks show PostgreSQL 8.4 as slightly slower than 8.3 at default parameters for each, and in some cases you might return an 8.4 install to a smaller setting. Extremely large settings for default_statistics_target are discouraged due to the large overhead they incur.

If there is just a particular column in a table that you know needs better statistics, you can use ALTER TABLE SET STATISTICS on that column to adjust this setting just for it. This works better than increasing the system-wide default and making every table pay for that requirement. Typically, the columns that really require a lot more statistics to work properly will require a setting near the maximum of 1000 (increased to 10,000 in later versions) to get a serious behavior change, which is far higher than you'd want to collect data for on every table in the database.

Tuesday, June 27, 2017

Checkpoint Parameters

PostgreSQL has several checkpoint parameters that are worth tuning:

checkpoint_segments Parameter

Each WAL segment takes up 16 MB. As described at http://www.postgresql.org/docs/current/interactive/wal-configuration.html the maximum number of segments you can expect to be in use at any time is:

(2 + checkpoint_completion_target) * checkpoint_segments + 1

Note that in PostgreSQL versions before 8.3 that do not have spread checkpoints, you can still use this formula, just substitute the following code snippet for the value you'll be missing:

checkpoint_completion_target=0

The easiest way to think about the result is in terms of the total size of all the WAL segments that you can expect to see on disk, which has both a disk cost and serves as something that can be used to estimate the time for recovery after a database crash. The expected peak pg_xlog size grows as shown in the following table:

checkpoint_segments	checkpoint_completion_target=0	target=0.5	target=0.9
3	112MB	144MB	160MB
10	336MB	416MB	480MB
32	1040MB	1296MB	1504MB
64	2064MB	2576MB	2992MB
128	4112MB	5136MB	5968MB
256	8208MB	10256MB	11904MB
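The figures in the table can be reproduced from the formula with a short script. The following Python sketch assumes that a fractional segment count is rounded up to a whole segment, which matches the values shown; the function name is hypothetical, and the completion target is passed as a string so that it is handled as an exact decimal.

```python
from fractions import Fraction
from math import ceil

WAL_SEGMENT_MB = 16  # each WAL segment takes up 16 MB


def peak_wal_mb(checkpoint_segments, completion_target="0"):
    """Expected peak pg_xlog size in MB for the given settings."""
    target = Fraction(completion_target)  # exact value, no float rounding
    # (2 + checkpoint_completion_target) * checkpoint_segments + 1,
    # rounded up to a whole number of segments (an assumption).
    segments = ceil((2 + target) * checkpoint_segments + 1)
    return segments * WAL_SEGMENT_MB


for segments in (3, 10, 32, 64, 128, 256):
    row = [peak_wal_mb(segments, target) for target in ("0", "0.5", "0.9")]
    print(segments, *(f"{mb}MB" for mb in row), sep="\t")
```

Changing the tuple of segment counts or targets makes it easy to estimate disk usage for settings that the table does not list.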

The general rule of thumb you can extract here is that for every 32 checkpoint segments, expect at least 1 GB of WAL files to accumulate. As database crash recovery can take quite a while to process even that much data, 32 is as high as you want to make this setting for anything but a serious database server. The default of 3 is very low for most systems though; even a small install should consider an increase to at least 10.

Normally, you'll only want a value greater than 32 on a smaller server when doing bulk-loading, where it can help performance significantly and crash recovery isn't important. Databases that routinely do bulk loads may need a higher setting.

checkpoint_timeout Parameter

The default for this setting of 5 minutes is fine for most installations. If your system isn't able to keep up with writes and you've already increased checkpoint_segments to where the timeout is the main thing driving when checkpoints happen, it's reasonable to consider an increase to this value.

Aiming for 10 minutes or more between checkpoints isn't dangerous; again it just increases how long database recovery after a crash will take. As this is one component to database server downtime after a crash, that's something you need a healthy respect for.

checkpoint_completion_target Parameter

If you have increased checkpoint_segments to at least 10, it's reasonable at that point to also increase checkpoint_completion_target to its practical maximum of 0.9. This gives maximum checkpoint spreading, which theoretically means the smoothest I/O, too. In some cases keeping the default of 0.5 will still be better, however, as it makes it less likely that one checkpoint's writes will spill into the next one.

It's unlikely that a value below 0.5 will be very effective at spreading checkpoints at all. Moreover, unless you have an extremely large value for the number of segments, the practical difference between small changes in its value is unlikely to matter. One approach for the really thorough is to try both 0.5 and 0.9 with your application and see which one gives the smoother disk I/O curve over time, as judged by OS-level monitoring.

Saturday, June 24, 2017

wal_sync_method and wal_buffers: WAL Settings

This article discusses the wal_sync_method and wal_buffers parameters for PostgreSQL tuning.

One purpose of wal_sync_method is to tune write caching behavior, that is, how the server forces WAL writes out to disk.

The default behavior here is somewhat different from most of the options. When the server source code is compiled, a series of possible ways to write are considered. The one believed most efficient then becomes the compiled-in default. This value is not written to the postgresql.conf file at initdb time though, making it different from other auto-detected, platform-specific values such as shared_buffers.

Before adjusting anything, you should check what your platform detected as the fastest safe method using SHOW; the following is a Linux example:

postgres=# show wal_sync_method;
 wal_sync_method
-----------------
 fdatasync

On both Windows and the Mac OS X platforms, there is a special setting to make sure the OS clears any write-back caches. The safe value to use on these platforms that turns on this behavior is as follows:

wal_sync_method=fsync_writethrough

If you have this setting available to you, you really want to use it! It does exactly the right thing to make database writes safe, while not slowing down other applications the way disabling an entire hard drive write cache will do.

This setting will not work on all platforms however. Note that you will see a performance drop going from the default to this value, as is always the case when going from unsafe to reliable caching behavior.

On other platforms, tuning wal_sync_method can be much more complicated. It's theoretically possible to improve write throughput on any UNIX-like system by switching from any write method that uses a write/fsync or write/fdatasync pair to using a true synchronous write. On platforms that support safe DSYNC write behavior, you may already see this as your default when checking it with SHOW:

wal_sync_method=open_datasync

This is the case even though you won't see it explicitly listed in the configuration file as such. If this is the default on your platform, there's little optimization beyond that you can likely perform. open_datasync is generally the optimal approach, and when available it can even use direct I/O as well to bypass the operating system cache.

The Linux situation is perhaps the most complicated. As shown in the earlier SHOW output, this platform will default to fdatasync as the method used. It is possible to switch this to use synchronous writes with:

wal_sync_method=open_sync

In many cases you will discover that this is faster, sometimes much faster, than the default behavior. However, whether this is safe or not depends on your filesystem. The default filesystem on most Linux systems, ext3, does not handle O_SYNC writes safely in many cases, which can result in corruption. See "PANIC caused by open_sync on Linux" at http://archives.postgresql.org/pgsqlhackers/2007-10/msg01310.php for an example of how dangerous this setting can be on that platform. There is evidence that this particular area has finally been cleaned up on recent (2.6.32) kernels when using the ext4 filesystem instead, but this has not been tested extensively at the database level yet.

In any case, your own tests of wal_sync_method should include the "pull the cord" test, where you power the server off unexpectedly, to make sure you don't lose any data with the method you've used. Testing at a very high load for a long period of time is also advisable, to find intermittent bugs that might cause a crash.

wal_buffers Parameter

While the documentation on wal_buffers suggests that the default of 64 KB is sufficient as long as no single transaction exceeds that value, in practice write-heavy benchmarks see optimal performance at higher values than you might expect from that, at least 1 MB or more. With the only downside being the increased use of shared memory, and as there's no case where more than a single WAL segment could need to be buffered, given modern server memory sizes the normal thing to do nowadays is to just set:

wal_buffers=16MB

Then forget about it as a potential bottleneck or item to tune further. Only if you're tight on memory should you consider a smaller setting.

Thursday, June 22, 2017

effective_cache_size and per-client settings Parameter

PostgreSQL is expected to have both its own dedicated memory (shared_buffers) as well as utilize the filesystem cache. In some cases, when making decisions like whether it is efficient to use an index or not, the database compares sizes it computes against the effective sum of all these caches; that's what it expects to find in effective_cache_size.

The same rough rule of thumb that would put shared_buffers at 25 percent of system memory would set effective_cache_size to between 50 and 75 percent of RAM. To get a more accurate estimate, first observe the size of the filesystem cache:

UNIX-like systems: Add the free and cached numbers shown by the free or top commands to estimate the filesystem cache size
Windows: Use the Windows Task Manager's Performance tab and look at the System Cache size

Assuming you have already started the database, you then need to add the shared_buffers figure to this value to arrive at a figure for effective_cache_size. If the database hasn't been started yet, the OS cache size alone will usually be an accurate enough estimate. Once it is started, most of the database's dedicated memory will usually be allocated to its buffer cache anyway.
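The estimate described above is a simple sum. The following Python sketch shows the arithmetic for a running database on a UNIX-like system; the function name and the sample readings are hypothetical and would be replaced with the figures observed on your own server.

```python
def estimate_effective_cache_size_mb(free_mb, cached_mb, shared_buffers_mb):
    """Sum of the filesystem cache estimate and the database's own cache."""
    filesystem_cache_mb = free_mb + cached_mb  # figures from free or top
    return filesystem_cache_mb + shared_buffers_mb


# Hypothetical readings taken while the database is running.
estimate = estimate_effective_cache_size_mb(
    free_mb=512, cached_mb=9216, shared_buffers_mb=4096)
print(f"effective_cache_size of roughly {estimate} MB")
```

As a rough estimate is sufficient for this parameter, rounding the result down to a convenient figure is reasonable.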

effective_cache_size does not allocate any memory. It's strictly used as input on how queries are executed, and a rough estimate is sufficient for most purposes. However, if you set this value much too high, actually executing the resulting queries may result in both the database and OS cache being disrupted by reading in the large number of blocks required to satisfy the query believed to fit easily in RAM.

It's rare you'll ever see this parameter tuned on a per-client basis, even though it is possible.

Per-client settings Parameter

While all of the settings in this section can be adjusted per client, you'll still want good starting settings for these parameters in the main configuration file. Individual clients that need values outside the standard can always set them using the SET command within their session.
