
Saturday, July 8, 2017

log_statement and log_min_duration_statement Parameters


The options for the log_statement parameter in PostgreSQL are as follows:
  • none: Do not log any statement-level information.
  • ddl: Log only Data Definition Language (DDL) statements such as CREATE and DROP. This can normally be left on even in production, and is handy to catch major changes introduced accidentally or intentionally by administrators.
  • mod: Log any statement that modifies a value, which is essentially everything except for simple SELECT statements. If your workload is mostly SELECT based with relatively few data changes, this may be practical to leave enabled all the time.
  • all: Log every statement. This is generally impractical to leave on in production due to the overhead of the logging. However, if your server is powerful enough relative to its workload, it may be practical to keep it on all the time.
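
For example, to keep only DDL logging on a production server, a minimal postgresql.conf excerpt might look like this (value illustrative; reload the server afterwards):

log_statement = 'ddl'

You can confirm the active value from any session with SHOW log_statement;.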

Statement logging is a powerful technique for finding performance issues. Analyzing the information saved by log_statement and related sources for statement-level detail can reveal the true source for many types of performance issues. You will need to combine this with appropriate analysis tools.



log_min_duration_statement Parameter


Once you have some idea of how long a typical query statement should take to execute, this setting allows you to log only the ones that exceed some threshold you set. The value is in milliseconds, so you might set:

log_min_duration_statement=1000

And then you'll only see statements that take longer than one second to run. This can be extremely handy for finding out the source of "outlier" statements that take much longer than most to execute.
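
If only one database is the problem, the threshold can also be scoped there instead of server-wide; for example, with a hypothetical database named sales:

ALTER DATABASE sales SET log_min_duration_statement = 1000;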

If you are running 8.4 or later, you might prefer the auto_explain module (http://www.postgresql.org/docs/8.4/static/auto-explain.html) to this feature. It allows you to see why slow queries are slow, by logging their associated EXPLAIN plans.
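
A minimal way to try auto_explain in a single session, assuming the module is installed, is the following (threshold illustrative):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = '1s';

Any query in that session running longer than one second will then have its EXPLAIN plan written to the log.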



Monday, July 3, 2017

Autovacuum, Vacuuming, and Statistics Parameters

As both these tasks are critical to database performance over the long-term, starting in PostgreSQL 8.1 there is an autovacuum daemon available that will run in the background to handle these tasks for you. Its action is triggered by the number of changes to the database exceeding a threshold it calculates based on the existing table size.



Autovacuum is turned on by default in PostgreSQL 8.3, and the default settings are generally aggressive enough to work out of the box for smaller databases with little manual tuning. Generally you just need to be careful that the amount of data in the free space map doesn't exceed max_fsm_pages, and even that requirement is automated away from being a concern as of 8.4.



Vacuuming and statistics Parameter

PostgreSQL databases require two primary forms of regular maintenance as data is added, updated, and deleted.

VACUUM cleans up after old transactions, including removing information that is no longer visible and returning freed space to where it can be re-used. The more often you UPDATE and DELETE information from the database, the more likely you'll need a regular vacuum cleaning regime. However, even static tables with data that never changes once inserted still need occasional care here.


ANALYZE looks at tables in the database and collects statistics about them— information like estimates of how many rows they have and how many distinct values are in there. Many aspects of query planning depend on this statistics data being accurate.
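
Both maintenance steps can also be run by hand; for example, against a hypothetical table named orders:

VACUUM VERBOSE orders;   -- reclaim space from dead rows and report what was done
ANALYZE orders;          -- refresh the planner statistics for this table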

Enabling autovacuum on older versions

If you have autovacuum available but it's not turned on by default, which will be the case with PostgreSQL 8.1 and 8.2, there are a few related parameters that must also be enabled for it to work, as covered in http://www.postgresql.org/docs/8.1/interactive/maintenance.html or http://www.postgresql.org/docs/8.2/interactive/routine-vacuuming.html.


The normal trio to enable in the postgresql.conf file in these versions is:

stats_start_collector=true
stats_row_level=true
autovacuum=on

Note that as warned in the documentation, it's also wise to consider adjusting superuser_reserved_connections to allow for the autovacuum processes in these earlier versions.


The autovacuum you'll get in 8.1 and 8.2 is not going to be as efficient as what comes in 8.3 and later. You can expect it to take some fine tuning to get the right balance of enough maintenance without too much overhead, and because there's only a single worker it's easier for it to fall behind on a busy server. This topic isn't covered at length here. It's generally a better idea to put time into planning an upgrade to a PostgreSQL version with a newer autovacuum than to tweak an old one extensively, particularly since many other performance issues in the older versions can't be resolved easily either.

Thursday, June 29, 2017

work_mem and maintenance_work_mem Parameters

work_mem Parameter


When a query is running that needs to sort data, the database estimates how much data is involved and then compares it to the work_mem parameter. If it's larger (and the default is only 1 MB), rather than sorting in memory it will write all the data out and use a disk-based sort instead. This is much, much slower than a memory-based one. Accordingly, if you regularly sort data and have memory to spare, a large increase in work_mem can be one of the most effective ways to speed up your server. On a giant server, a data warehousing system might run with a gigabyte of work_mem for its larger reports.



The catch is that you can't necessarily predict the number of sorts any one client will be doing, and work_mem is a per-sort parameter rather than a per-client one. This means that memory use via work_mem is theoretically unbounded, should enough clients run large sorts concurrently.

In practice, there aren't that many sorts going on in a typical query, usually only one or two. And not every client that's active will be sorting at the same time. The normal guidance for work_mem is to consider how much free RAM is around after shared_buffers is allocated (the same OS caching size figure needed to compute effective_cache_size), divide by max_connections, and then take a fraction of that figure; a half of that would be an aggressive work_mem value. In that case, only if every client had two sorts active all at the same time would the server be likely to run out of memory, which is an unlikely scenario.

The work_mem computation is increasingly used in later PostgreSQL versions for estimating whether hash structures can be built in memory, so its use as a per-client memory size threshold is not limited just to sorts. Sorting is simply the easiest way to talk about the type of memory allocation decision it helps to guide.


Like synchronous_commit, work_mem can also be set per-client. This allows an approach where you keep the default at a moderate value, and only increase sort memory for the clients that you know are running large reports.
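
For example, a reporting session could raise it just for itself (value illustrative):

SET work_mem = '256MB';   -- affects only this session's sorts and hash operations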



maintenance_work_mem Parameter


A few operations in the database server need working memory for larger operations than just regular sorting. VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY can all allocate up to maintenance_work_mem worth of memory instead. As it's unlikely that many sessions will be doing one of these operations at once, it's possible to set this value much higher than the standard per-client work_mem setting. Note that up to autovacuum_max_workers processes (defaulting to 3 starting in version 8.3) can each allocate this much memory, so consider those sessions (perhaps along with a session or two doing a CREATE INDEX) when setting this value.


Assuming you haven't increased the number of autovacuum workers, a typical high setting for this value on a modern server would be five percent of the total RAM, so that even five such processes wouldn't exceed a quarter of available memory. This works out to approximately 50 MB of maintenance_work_mem per GB of server RAM.
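
As a worked example, on a server with 16 GB of RAM that guideline gives roughly 800 MB (an illustrative figure, not a rule):

maintenance_work_mem = 800MB   # ~5% of 16 GB; assumes the default 3 autovacuum workers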

Wednesday, June 28, 2017

default_statistics_target Parameter

PostgreSQL makes its decisions about how queries execute based on statistics collected about each table in your database. This information is collected by analyzing the tables, either with the ANALYZE statement or via autovacuum doing that step. In either case, the amount of information collected during the analyze step is set by default_statistics_target. Increasing this value makes analysis take longer, and as autovacuum analysis happens regularly, this turns into increased background overhead for database maintenance. But if there aren't enough statistics about a table, you can get bad plans for queries against it.



The default value for this setting used to be very low (10), but was increased to 100 in PostgreSQL 8.4. Using that larger value was popular in earlier versions, too, for generally improved query behavior. Indexes using the LIKE operator tended to work much better with values greater than 100 rather than below it, due to a hard-coded change at that threshold.

Note that increasing this value does result in a net slowdown on your system if you're not ever running queries where the additional statistics result in a change to a better query plan. This is one reason why some simple benchmarks show PostgreSQL 8.4 as slightly slower than 8.3 at default parameters for each, and in some cases you might return an 8.4 install to a smaller setting. Extremely large settings for default_statistics_target are discouraged due to the large overhead they incur.


If there is just a particular column in a table you know needs better statistics, you can use ALTER TABLE SET STATISTICS on that column to adjust this setting just for it. This works better than increasing the system-wide default and making every table pay for that requirement. Typically, the columns that really require a lot more statistics to work properly will need a setting near the maximum of 1000 (increased to 10,000 in later versions) to get a serious behavior change, which is far higher than you'd want to collect data for on every table in the database.
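
For example, with a hypothetical table and column:

ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;
ANALYZE orders;   -- the new target only takes effect once the table is re-analyzed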

Tuesday, June 27, 2017

Checkpoint Parameters

There are several checkpoint parameters relevant to PostgreSQL tuning, such as:


checkpoint_segments Parameter


Each WAL segment takes up 16 MB. As described at http://www.postgresql.org/docs/current/interactive/wal-configuration.html, the maximum number of segments you can expect to be in use at any time is:

(2 + checkpoint_completion_target) * checkpoint_segments + 1

Note that in PostgreSQL versions before 8.3, which do not have spread checkpoints, you can still use this formula; just substitute the following value for the one you'll be missing:

checkpoint_completion_target=0

The easiest way to think about the result is in terms of the total size of all the WAL segments that you can expect to see on disk, which has both a disk cost and serves as something that can be used to estimate the time for recovery after a database crash. The expected peak pg_xlog size grows as shown in the following table:

checkpoint_segments   target=0.0   target=0.5   target=0.9
3                     112 MB       144 MB       160 MB
10                    336 MB       416 MB       480 MB
32                    1040 MB      1296 MB      1504 MB
64                    2064 MB      2576 MB      2992 MB
128                   4112 MB      5136 MB      5968 MB
256                   8208 MB      10256 MB     11904 MB
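
As a quick check against the formula above: with checkpoint_segments=32 and checkpoint_completion_target=0.5, (2 + 0.5) * 32 + 1 = 81 segments, and 81 * 16 MB = 1296 MB, matching the table.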

The general rule of thumb you can extract here is that for every 32 checkpoint segments, expect at least 1 GB of WAL files to accumulate. As database crash recovery can take quite a while to process even that much data, 32 is as high as you want to make this setting for anything but a serious database server. The default of 3 is very low for most systems though; even a small install should consider an increase to at least 10.

Normally, you'll only want a value greater than 32 on a smaller server when doing bulk-loading, where it can help performance significantly and crash recovery isn't important. Databases that routinely do bulk loads may need a higher setting.



checkpoint_timeout Parameter


The default for checkpoint_timeout, 5 minutes, is fine for most installations. If your system isn't able to keep up with writes, and you've already increased checkpoint_segments to where the timeout is the main thing driving when checkpoints happen, it's reasonable to consider an increase to this value.

Aiming for 10 minutes or more between checkpoints isn't dangerous; again it just increases how long database recovery after a crash will take. As this is one component to database server downtime after a crash, that's something you need a healthy respect for.


checkpoint_completion_target Parameter

If you have increased checkpoint_segments to at least 10, it's reasonable at that point to also increase checkpoint_completion_target to its practical maximum of 0.9. This gives maximum checkpoint spreading, which theoretically means the smoothest I/O, too. In some cases keeping the default of 0.5 will still be better however, as it makes it less likely that one checkpoint's writes will spill into the next one.

It's unlikely that a value below 0.5 will be very effective at spreading checkpoints at all. Moreover, unless you have an extremely large value for the number of segments, small changes in its value are unlikely to make a practical difference. One approach for the really thorough is to try both 0.5 and 0.9 with your application and see which one gives the smoother disk I/O curve over time, as judged by OS-level monitoring.
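
Putting the three checkpoint settings together, a write-heavy server might end up with something like this (illustrative values only):

checkpoint_segments = 32
checkpoint_timeout = 10min
checkpoint_completion_target = 0.9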

Saturday, June 24, 2017

wal_sync_method and wal_buffers Parameters (WAL settings)

In this article we will discuss the wal_sync_method and wal_buffers parameters for PostgreSQL tuning. Let's dig in.

One purpose of wal_sync_method is to control how the database forces its WAL writes out to disk, and with that, how those writes interact with any write caching in the path.

The default behavior here is somewhat different from most of the options. When the server source code is compiled, a series of possible ways to write are considered. The one believed most efficient then becomes the compiled-in default. This value is not written to the postgresql.conf file at initdb time though, making it different from other auto-detected, platform-specific values such as shared_buffers.

Before adjusting anything, you should check what your platform detected as the fastest safe method using SHOW; the following is a Linux example:

postgres=# show wal_sync_method;
wal_sync_method
-----------------
fdatasync

On both Windows and the Mac OS X platforms, there is a special setting to make sure the OS clears any write-back caches. The safe value to use on these platforms that turns on this behavior is as follows:

wal_sync_method=fsync_writethrough

If you have this setting available to you, you really want to use it! It does exactly the right thing to make database writes safe, while not slowing down other applications the way disabling an entire hard drive write cache will do.

This setting will not work on all platforms however. Note that you will see a performance drop going from the default to this value, as is always the case when going from unsafe to reliable caching behavior.

On other platforms, tuning wal_sync_method can be much more complicated. It's theoretically possible to improve write throughput on any UNIX-like system by switching from any write method that uses a write/fsync or write/fdatasync pair to using a true synchronous write. On platforms that support safe DSYNC write behavior, you may already see this as your default when checking it with SHOW:

wal_sync_method=open_datasync

You won't see it explicitly listed in the configuration file as such, though. If this is the case on your platform, there's little optimization beyond that you can likely perform. open_datasync is generally the optimal approach, and when available it can even use direct I/O as well to bypass the operating system cache.

The Linux situation is perhaps the most complicated. As shown in the previous code snippet, this platform will default to fdatasync as the method used. It is possible to switch this to use synchronous writes with:

wal_sync_method=open_sync

In many cases you may discover this is faster—sometimes much faster—than the default behavior. However, whether this is safe or not depends on your filesystem. The default filesystem on most Linux systems, ext3, does not handle O_SYNC writes safely in many cases, which can result in corruption. See "PANIC caused by open_sync on Linux" at http://archives.postgresql.org/pgsqlhackers/2007-10/msg01310.php for an example of how dangerous this setting can be on that platform. There is evidence that this particular area has finally been cleaned up in recent (2.6.32) kernels when using the ext4 filesystem instead, but this has not been tested extensively at the database level yet.


In any case, your own tests of wal_sync_method should include the "pull the cord" test, where you power the server off unexpectedly, to make sure you don't lose any data with the method you've used. Testing at a very high load for a long period of time is also advisable, to find intermittent bugs that might cause a crash.

wal_buffers Parameter

While the documentation on wal_buffers suggests that the default of 64 KB is sufficient as long as no single transaction exceeds that value, in practice write-heavy benchmarks see optimal performance at higher values than you might expect from that, at least 1 MB or more. The only downside is increased use of shared memory, and since there's no case where more than a single WAL segment could need to be buffered, given modern server memory sizes the normal thing to do nowadays is to just set:


wal_buffers=16MB


Then forget about it as a potential bottleneck or item to tune further. Only if you're tight on memory should you consider a smaller setting.

Thursday, June 22, 2017

effective_cache_size and per-client settings Parameter

PostgreSQL is expected to have both its own dedicated memory (shared_buffers) as well as utilize the filesystem cache. In some cases, when making decisions like whether it is efficient to use an index or not, the database compares sizes it computes against the effective sum of all these caches; that's what it expects to find in effective_cache_size.


The same rough rule of thumb that would put shared_buffers at 25 percent of system memory would set effective_cache_size to between 50 and 75 percent of RAM. To get a more accurate estimate, first observe the size of the filesystem cache:
  • UNIX-like systems: Add the free and cached numbers shown by the free or top commands to estimate the filesystem cache size
  • Windows: Use the Windows Task Manager's Performance tab and look at the System Cache size
Assuming you have already started the database, you then need to add the shared_buffers figure to this value to arrive at a figure for effective_cache_size. If the database hasn't been started yet, the OS cache figure alone is usually an accurate enough estimate; once the database is started, most of its dedicated memory will usually be allocated to its buffer cache anyway.
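
As a worked example with illustrative numbers: if free shows 1.5 GB free plus 5 GB cached, and shared_buffers is 2 GB, the total is about 8.5 GB, so a round figure works:

effective_cache_size = 8GB   # ~1.5 GB free + 5 GB cached + 2 GB shared_buffers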

effective_cache_size does not allocate any memory. It's strictly used as input on how queries are executed, and a rough estimate is sufficient for most purposes. However, if you set this value much too high, executing the resulting queries may disrupt both the database and OS caches, by reading in the large number of blocks required to satisfy a query that was believed to fit easily in RAM.


It's rare you'll ever see this parameter tuned on a per-client basis, even though it is possible.

Per-client settings Parameter


While all of the settings in this section can be adjusted per client, you'll still want good starting values for these parameters in the main configuration file. Individual clients that need values outside the standard can always adjust them using the SET command within their session.

Wednesday, June 21, 2017

synchronous_commit Parameter

The overhead of waiting for physical disk commits was stressed as a likely bottleneck for committing transactions. If you don't have a battery-backed write cache to accelerate that, but you need better commit speed, what can you do? The standard approach is to disable synchronous_commit, which is sometimes alternately referred to as enabling asynchronous commits. This groups commits into chunks at a frequency determined by the related wal_writer_delay parameter. The default settings guarantee a real commit to disk at most 600 milliseconds after the client commit. During that window (which you can reduce in size, with a corresponding decrease in speed-up), the data is not yet on disk and will be lost if your server crashes.

Note that it's possible to turn this parameter off for a single client during its session rather than making it a server-wide choice:


SET LOCAL synchronous_commit TO OFF;


This provides you with the option of having different physical commit guarantees for different types of data you put into the database. A routine activity monitoring table, one that was frequently inserted into and where a fraction of a second of loss is acceptable, would be a good candidate for asynchronous commit. An infrequently written table holding real-world monetary transactions should prefer the standard synchronous commit.

Monday, June 19, 2017

random_page_cost and constraint_exclusion Parameter

random_page_cost Parameter


This parameter is common to tune, but explaining what it does requires a lot of background about how queries are planned. Particularly in earlier PostgreSQL versions, lowering this value from its default—for example, a reduction from 4.0 to 2.0—was a common technique for making it more likely that the planner would use indexed queries instead of the alternative of a sequential scan. With the smarter planner in current versions, this is certainly not where you want to start tuning. You should prefer getting better statistics and setting the memory parameters as the primary ways to influence the query planner.


constraint_exclusion Parameter


If you are using PostgreSQL 8.3 or earlier versions, and you are using the database's table inheritance feature to partition your data, you'll need to turn this parameter on.



Starting in 8.4, constraint_exclusion defaults to a new smarter setting named partition that will do the right thing in most situations without it ever needing to be adjusted.
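
A minimal sketch of the inheritance-based partitioning that constraint exclusion applies to, with hypothetical table names:

CREATE TABLE events (event_time timestamp, payload text);
CREATE TABLE events_2017_06 (
    CHECK (event_time >= '2017-06-01' AND event_time < '2017-07-01')
) INHERITS (events);

With constraint_exclusion set to on (or partition in 8.4+), a query filtering on event_time can skip any child table whose CHECK constraint rules it out.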

Saturday, June 17, 2017

Configuration parameter tunables to avoid

There are a few parameters in postgresql.conf that have gathered poor guidance in other guides you might come across, and they might already be set badly in a server whose configuration you're now responsible for. Others have names suggesting a use for the parameter that doesn't actually exist. This section warns you about the most common of those to avoid adjusting.



Some of those tunables to avoid are covered in the posts that follow:

Friday, June 16, 2017

fsync and max_prepared_transactions Parameter

If you just want to ignore crash recovery altogether, you can do that by turning off the fsync parameter. This makes the value for wal_sync_method irrelevant, because the server won't be doing any WAL sync calls anymore.


It is important to recognize that if you have any sort of server crash when fsync is disabled, it is likely your database will be corrupted and no longer start afterwards. Despite this being a terrible situation to be running a database under, the performance speedup of turning crash recovery off is so large that you might come across suggestions you disable fsync anyway. You should be equally hesitant to trust any other advice you receive from sources suggesting this, as it is an unambiguously dangerous setting to disable.

One reason this idea gained traction is that in earlier PostgreSQL versions, there was no way to reduce the number of fsync calls to a lower number—to trade off some amount of reliability for performance. Starting in 8.3, in most cases where people used to disable fsync it's a better idea to turn off synchronous_commit instead.

There is one case where fsync=off may still make sense: initial bulk loading. If you're inserting a very large amount of data into the database, and do not have hardware with a battery-backed write cache, you might discover this takes far too long to ever be practical. In this case, turning the parameter off during the load—where all data can easily be recreated if there is a crash causing corruption—may be the only way to get loading time below your target. Once the load is finished, you should turn it right back on again.
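
A hedged sketch of that bulk-load pattern, in postgresql.conf terms:

fsync = off   # bulk load only; all data must be recreatable from source files

# ... run the load, then restore the safe setting and reload the server:
fsync = on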


Some systems will also turn off fsync on servers with redundant copies of the database— for example, slaves used for reporting purposes. These can always resynchronize against the master if their data gets corrupted.


max_prepared_transactions Parameter

Many people see this name and assume that, because they use prepared statements (a common technique to avoid SQL injection), they need to increase this value. This is not the case; the two are not related. A prepared transaction is one that uses PREPARE TRANSACTION for two-phase commit (2PC).


If you're not specifically using that command and 2PC, you can leave this value at its default. If you are using those features, only then will you likely need to increase it to match the number of connections.
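
For reference, a prepared transaction looks like this (hypothetical transaction identifier), and it requires max_prepared_transactions to be greater than zero:

BEGIN;
-- ... do some work ...
PREPARE TRANSACTION 'txn_42';   -- persists the transaction for a later decision
COMMIT PREPARED 'txn_42';       -- or ROLLBACK PREPARED 'txn_42', from any session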

Wednesday, June 14, 2017

commit_delay and commit_siblings Parameter

Before synchronous_commit was implemented, there was an earlier attempt to add that sort of feature enabled by the commit_delay and commit_siblings parameters. These are not effective parameters to tune in most cases. It is extremely difficult to show any speedup by adjusting them, and quite easy to slow every transaction down by tweaking them.

The only case where they have shown some value is for extremely high I/O rate systems. Increasing the delay to a very small amount can make writes happen in bigger blocks, which sometimes turn out better aligned when combined with larger RAID stripe sizes in particular.

Tuesday, June 13, 2017

Query enable parameters and Logging Parameter

It's possible to disable many of the query planner's techniques, in hopes of avoiding a known bad type of query. This is sometimes used as a work-around for the fact that PostgreSQL doesn't support direct optimizer hints for how to execute a query.


You might see the following code snippet, suggested as a way to force use of indexes instead of sequential scans for example:

enable_seqscan = off


Generally this is a bad idea, and you should improve the information the query optimizer is working with so it makes the right decisions instead.
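
If you do experiment with these, the safer pattern is per-session, while comparing plans, rather than a server-wide change (query and table names hypothetical):

SET enable_seqscan = off;   -- this session only
EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'open';
RESET enable_seqscan;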

Logging Parameter

General logging setup is important but it is somewhat outside the scope of this article. You may need to set parameters such as log_destination, log_directory, and log_filename to save your log files in a way compatible with the system administration requirements of your environment. These will all be set to reasonable defaults to get started with on most systems.


On UNIX-like systems, it's common for some of the database logging to be set in the script that starts and stops the server, rather than directly in the postgresql.conf file. If you instead use the pg_ctl command to manually start the server, you may discover that logging ends up on your screen instead. You'll need to look at the script that starts the server normally (commonly /etc/init.d/postgresql) to determine what it does, if you want to duplicate that behavior. In most cases, you just need to add -l logfilename to the pg_ctl command line to redirect its output to the standard location.

Tuesday, June 6, 2017

log_line_prefix and full_page_writes Parameters


log_line_prefix

The default log_line_prefix is empty, which is not what you want. A good starting value here is the following:
log_line_prefix='%t:%r:%u@%d:[%p]: '
This will put the following into every log line:
  • %t: Timestamp
  • %r: Remote host the connection is from
  • %u: Database user name
  • %d: Database the connection is to
  • %p: Process ID of the connection
It may not be obvious what you'd want all of these values for initially, particularly the process ID. Once you've tried to chase down a few performance issues, the need for saving these values will be more obvious, and you'll be glad to already have this data logged.
Another approach worth considering is setting log_line_prefix such that the resulting logs will be compatible with the pgFouine program. That is a reasonable, general purpose logging prefix, and many sites end up needing to do some sort of query analysis eventually.


full_page_writes Parameter

Much like fsync, turning this parameter off increases the odds of database corruption in return for an increase in performance.

You should only consider adjusting this parameter if you're doing extensive research into your filesystem and hardware, in order to assure that partial page writes do not happen.

Monday, June 5, 2017

Basic Configuration Parameter for Tuning Your PostgreSQL Server

When you install PostgreSQL, it ships with a basic configuration tuned for wide compatibility rather than server performance. Let's review the basic configuration parameters.



  • listen_addresses

By default, PostgreSQL only responds to connections from the local host. If you want your server to be accessible from other systems via standard TCP/IP networking, you need to change listen_addresses from its default. The usual approach is to set it to listen to all addresses like this:

listen_addresses = '*'

And then control who can and cannot connect via the pg_hba.conf file.
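
For example, a matching pg_hba.conf entry might allow password logins from one application subnet (addresses illustrative):

# TYPE  DATABASE  USER  ADDRESS         METHOD
host    all       all   192.168.1.0/24  md5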



  • max_connections

max_connections sets exactly that: the maximum number of client connections allowed. This is very important to some of the below parameters (particularly work_mem) because there are some memory resources that are or can be allocated on a per-client basis, so the maximum number of clients suggests the maximum possible memory use. Generally, PostgreSQL on good hardware can support a few hundred connections. If you want to have thousands instead, you should consider using connection pooling software to reduce the connection overhead.


  • shared_buffers

The shared_buffers configuration parameter determines how much memory is dedicated to PostgreSQL to use for caching data. One reason the defaults are low is because on some platforms (like older Solaris versions and SGI), having large values requires invasive action like recompiling the kernel. Even on a modern Linux system, the stock kernel will likely not allow setting shared_buffers to over 32MB without adjusting kernel settings first. (PostgreSQL 9.4 and later use a different shared memory mechanism, so kernel settings will usually not have to be adjusted there.)

If you have a system with 1GB or more of RAM, a reasonable starting value for shared_buffers is 1/4 of the memory in your system. If you have less RAM you'll have to account more carefully for how much RAM the OS is taking up; closer to 15% is more typical there. There are some workloads where even larger settings for shared_buffers are effective, but given the way PostgreSQL also relies on the operating system cache, it's unlikely you'll find using more than 40% of RAM to work better than a smaller amount.

Be aware that if your system or PostgreSQL build is 32-bit, it might not be practical to set shared_buffers above 2 to 2.5 GB.

Note that on Windows, large values for shared_buffers aren't as effective, and you may find better results keeping it relatively low and using the OS cache more instead. On Windows the useful range is 64MB to 512MB.

If you are running PostgreSQL 9.2 or earlier, it's likely you will have to increase the amount of memory your operating system allows you to allocate at once to set the value for shared_buffers this high. On UNIX-like systems, if you set it above what's supported, you'll get a message like this:

IpcMemoryCreate: shmget(key=5432001, size=415776768, 03600) failed: Invalid argument 
This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter. You can either reduce the request size or reconfigure the kernel with larger SHMMAX. To reduce the request size (currently 415776768 bytes), reduce PostgreSQL's shared_buffers parameter (currently 50000) and/or its max_connections parameter (currently 12).
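On Linux, the kernel limit named in that message can be raised like this (example value of 1 GB; size it to your actual request):

sysctl -w kernel.shmmax=1073741824    # as root, effective immediately
# persist across reboots by adding to /etc/sysctl.conf:
kernel.shmmax = 1073741824
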
Changing this setting requires restarting the database. Also, this is a hard allocation of memory; the whole thing gets allocated out of virtual memory when the database starts.


  • effective_cache_size

effective_cache_size should be set to an estimate of how much memory is available for disk caching by the operating system and within the database itself, after taking into account what's used by the OS itself and other applications. This is a guideline for how much memory you expect to be available in the OS and PostgreSQL buffer caches, not an allocation! This value is used only by the PostgreSQL query planner to figure out whether plans it's considering would be expected to fit in RAM or not. If it's set too low, indexes may not be used for executing queries the way you'd expect. The setting for shared_buffers is not taken into account here--only the effective_cache_size value is, so it should include memory dedicated to the database too.

Setting effective_cache_size to 1/2 of total memory would be a normal conservative setting, and 3/4 of memory is a more aggressive but still reasonable amount. You might find a better estimate by looking at your operating system's statistics. On UNIX-like systems, add the free+cached numbers from free or top to get an estimate. On Windows see the "System Cache" size in the Windows Task Manager's Performance tab. Changing this setting does not require restarting the database (HUP is enough).


  • checkpoint_segments / checkpoint_completion_target

PostgreSQL writes new transactions to the database in files called WAL segments that are 16MB in size. Every time checkpoint_segments worth of these files have been written, by default 3, a checkpoint occurs. Checkpoints can be resource intensive, and on a modern system doing one every 48MB will be a serious performance bottleneck. Setting checkpoint_segments to a much larger value improves that. Unless you're running on a very small configuration, you'll almost certainly be better setting this to at least 10, which also allows usefully increasing the completion target.

For more write-heavy systems, values from 32 (checkpoint every 512MB) to 256 (every 4GB) are popular nowadays. Very large settings use a lot more disk and will cause your database to take longer to recover, so make sure you're comfortable with both those things before large increases. Normally the large settings (>64/1GB) are only used for bulk loading. Note that whatever you choose for the segments, you'll still get a checkpoint at least every 5 minutes unless you also increase checkpoint_timeout (which isn't necessary on most systems).

Checkpoint writes are spread out a bit while the system starts working toward the next checkpoint. You can spread those writes out further, lowering the average write overhead, by increasing the checkpoint_completion_target parameter to its useful maximum of 0.9 (aim to finish by the time 90% of the next checkpoint is here) rather than the default of 0.5 (aim to finish when the next one is 50% done). A setting of 0 gives something similar to the behavior of obsolete versions. The main reason the default isn't just 0.9 is that you need a larger checkpoint_segments value than the default for broader spreading to work well. For lots more information on checkpoint tuning, see Checkpoints and the Background Writer (where you'll also learn why tuning the background writer parameters is challenging to do usefully).



  • autovacuum

The autovacuum process takes care of several maintenance chores inside your database that you really need. Generally, if you think you need to turn regular vacuuming off because it's taking too much time or resources, that means you're doing it wrong. The answer to almost all vacuuming problems is to vacuum more often, not less, so that each individual vacuum operation has less to clean up.

However, it's acceptable to disable autovacuum for short periods of time, for instance when bulk loading large amounts of data.


  • logging

There are many things you can log that may or may not be important to you. You should investigate the documentation on all of the options, but here are some tips & tricks to get you started:

   - pgFouine is a tool used to analyze postgresql logs for performance tuning. If you plan to use this tool, it has specific logging requirements. Please check http://pgfouine.projects.postgresql.org/

    - pgFouine has been obsoleted by PgBadger

   - PgCluu is a handy tool from the author of PgBadger; it is a PostgreSQL performance monitoring and auditing tool.

   - log_destination & log_directory (& log_filename): What you set these options to is not as important as knowing they can give you hints to determine where your database server is logging to. Best practice would be to try and make this as similar as possible across your servers. Note that in some cases, the init script starting your database may be customizing the log destination in the command line used to start the database, overriding what's in the configuration files (and making it so you'll get different behavior if you run pg_ctl manually instead of using the init script).

   - log_min_error_statement: You should probably make sure this is at least on error, so that you will see any SQL commands which cause an error. This should be the default on recent versions.

    - log_min_duration_statement: Not necessary for everyday use, but this can generate logs of "slow queries" on your system.

   - log_line_prefix: Appends information to the start of each line. A good generic recommendation is '%t:%r:%u@%d:[%p]: ' : %t=timestamp, %u=db user name, %r=host connecting from, %d=database connecting to, %p=PID of connection. It may not be obvious what the PID is useful at first, but it can be vital for trying to troubleshoot problems in the future so better to put in the logs from the start.

   - log_statement: Choices of none, ddl, mod, all. Using all in production leads to severe performance penalties. DDL can sometimes be helpful to discover rogue changes made outside of your recommended processes, by "cowboy DBAs" for example.



  • default_statistics_target

The database software collects statistics about each of the tables in your database to decide how to execute queries against it. If you're not getting good query execution plans, particularly on larger (or more varied) tables, you should increase default_statistics_target and then ANALYZE the database again (or wait for autovacuum to do it for you).

PostgreSQL 8.4 and later

The starting default_statistics_target value was raised from 10 to 100 in PostgreSQL 8.4. Increases beyond 100 may still be useful, but this increase makes for greatly improved statistics estimation in the default configuration. The maximum value for the parameter was also increased from 1000 to 10,000 in 8.4.


  • work_mem

If you do a lot of complex sorts, and have a lot of memory, then increasing the work_mem parameter allows PostgreSQL to do larger in-memory sorts which, unsurprisingly, will be faster than disk-based equivalents.

This size is applied to each and every sort done by each user, and complex queries can use multiple working memory sort buffers. Set it to 50MB, and have 30 users submitting queries, and you are soon using 1.5GB of real memory. Furthermore, if a query involves doing merge sorts of 8 tables, that requires 8 times work_mem. You need to consider what you set max_connections to in order to size this parameter correctly. This is a setting where data warehouse systems, where users are submitting very large queries, can readily make use of many gigabytes of memory.

log_temp_files can be used to log sorts, hashes, and temp files which can be useful in figuring out if sorts are spilling to disk instead of fitting in memory. You can see sorts spilling to disk using EXPLAIN ANALYZE plans as well. For example, if you see a line like Sort Method: external merge Disk: 7526kB in the output of EXPLAIN ANALYZE, a work_mem of at least 8MB would keep the intermediate data in memory and likely improve the query response time.
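
To spot those disk spills in the logs, log_temp_files can be turned on (0 logs every temporary file; a positive value is a minimum size threshold in kilobytes):

log_temp_files = 0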


  • maintenance_work_mem

Specifies the maximum amount of memory to be used by maintenance operations, such as VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY. It defaults to 16 megabytes (16MB). Since only one of these operations can be executed at a time by a database session, and an installation normally doesn't have many of them running concurrently, it's safe to set this value significantly larger than work_mem. Larger settings might improve performance for vacuuming and for restoring database dumps.


  • wal_sync_method / wal_buffers

After every transaction, PostgreSQL forces a commit to disk out to its write-ahead log. This can be done a couple of ways, and on some platforms the other options are considerably faster than the conservative default. open_sync is the most common non-default setting switched to, on platforms that support it but default to one of the fsync methods. See Tuning PostgreSQL WAL Synchronization for a lot of background on this topic. Note that open_sync writing is buggy on some platforms (such as Linux), and you should (as always) do plenty of tests under a heavy write load to make sure that you haven't made your system less stable with this change. Reliable Writes contains more information on this topic.

Linux kernels starting with version 2.6.33 will cause earlier versions of PostgreSQL to default to wal_sync_method=open_datasync; before that kernel release the default picked was always fdatasync. This can cause a significant performance decrease when combined with small writes and/or small values for wal_buffers.

Increasing wal_buffers from its tiny default of a small number of kilobytes is helpful for write-heavy systems. Benchmarking generally suggests that just increasing to 1MB is enough for some large systems, and given the amount of RAM in modern servers allocating a full WAL segment (16MB, the useful upper-limit here) is reasonable. Changing wal_buffers requires a database restart.

PostgreSQL 9.1 and later

Starting with PostgreSQL 9.1 wal_buffers defaults to being 1/32 of the size of shared_buffers, with an upper limit of 16MB (reached when shared_buffers=512MB).

PostgreSQL 9.1 also changes the logic for selecting the default wal_sync_method such that on newer Linux kernels, it will still select fdatasync as its method--the same as on older Linux versions.


  • constraint_exclusion

constraint_exclusion now defaults to a new choice: partition. This will only enable constraint exclusion for partitioned tables which is the right thing to do in nearly all cases.


  • max_prepared_transactions

This setting is used for managing 2 phase commit. If you do not use two phase commit (and if you don't know what it is, you don't use it), then you can set this value to 0. That will save a little bit of shared memory. For database systems with a large number (at least hundreds) of concurrent connections, be aware that this setting also affects the number of available lock-slots in pg_locks, so you may want to leave it at the default setting. There is a formula for how much memory gets allocated in the docs and in the default postgresql.conf.

Changing max_prepared_transactions requires a server restart.


  • synchronous_commit

PostgreSQL can only safely use a write cache if it has a battery backup. See WAL reliability for an essential introduction to this topic. No, really; go read that right now, it's vital to understand that if you want your database to work right.

You may be limited to approximately 100 transaction commits per second per client in situations where you don't have such a durable write cache (and perhaps only 500/second even with lots of clients).

For situations where a small amount of data loss is acceptable in return for a large boost in how many updates you can do to the database per second, consider switching synchronous commit off. This is particularly useful in the situation where you do not have a battery-backed write cache on your disk controller, because you could potentially get thousands of commits per second instead of just a few hundred.

For obsolete versions of PostgreSQL, you may find people recommending that you set fsync=off to speed up writes on busy systems. This is dangerous--a power loss could result in your database getting corrupted and unable to start again. Turning off synchronous commit doesn't introduce that risk of corruption, which is the really bad outcome; it carries only some risk of data loss.


  • random_page_cost

This setting suggests to the optimizer how long it will take your disks to seek to a random disk page, as a multiple of how long a sequential read (with a cost of 1.0) takes. If you have particularly fast disks, as commonly found with RAID arrays of SCSI disks, it may be appropriate to lower random_page_cost, which will encourage the query optimizer to use random access index scans. Some feel that 4.0 is always too large on current hardware; it's not unusual for administrators to standardize on always setting this between 2.0 and 3.0 instead. In some cases that behavior is a holdover from earlier PostgreSQL versions where having random_page_cost too high was more likely to screw up plan optimization than it is now (and setting at or below 2.0 was regularly necessary). Since these cost estimates are just that--estimates--it shouldn't hurt to try lower values.

But this is not where you should start to search for plan problems. Note that random_page_cost is pretty far down this list (at the end, in fact). If you are getting bad plans, this shouldn't be the first thing you look at, even though lowering this value may be effective. Instead, you should start by making sure autovacuum is working properly, that you are collecting enough statistics, and that you have correctly sized the memory parameters for your server--all the things gone over above. After you've done all those much more important things, if you're still getting bad plans then you should see if lowering random_page_cost is still useful.

How to Tune Your PostgreSQL Server

In this article, we will explain how to tune your PostgreSQL server. Tuning a server means customizing the list of parameters and configuration settings that the server ships with.



PostgreSQL ships with a basic configuration tuned for wide compatibility rather than performance. Odds are good the default parameters are very undersized for your system. Rather than get dragged into the details of everything you should eventually know (which you can find if you want it at the GUC Three Hour Tour), here we're going to sprint through a simplified view of the basics, with a look at the most common things people new to PostgreSQL aren't aware of. You should click on the name of the parameter in each section to jump to the relevant documentation in the PostgreSQL manual for more details after reading the quick intro here. There is also additional information available about many of these parameters, as well as a list of parameters you shouldn't adjust, at Server Configuration Tuning.


PostgreSQL settings can be manipulated a number of different ways, but generally you will want them changed in your configuration files, either directly or, starting with PostgreSQL 9.4, through ALTER SYSTEM. The specific options available change from release to release; the definitive list is in the source code at src/backend/utils/misc/guc.c for your version of PostgreSQL (but the pg_settings view works well enough for most purposes).

The types of settings

There are several different types of configuration settings, divided up based on the possible inputs they take:


  • Boolean: true, false, on, off
  • Integer: Whole numbers (2112)
  • Float: Decimal values (21.12)
  • Memory / Disk: Integers (2112) or "computer units" (512MB, 2112GB). Avoid integers--you need to know the underlying unit to figure out what they mean.
  • Time: "Time units" aka d,m,s (30s). Sometimes the unit is left out; don't do that.
  • Strings: Single quoted text ('pg_log')
  • ENUMs: Strings, but from a specific list ('WARNING', 'ERROR')
  • Lists: A comma separated list of strings ('"$user",public,tsearch2')


And when do they take effect?

The configuration settings you tune on your PostgreSQL server have different levels of flexibility for when they can be changed, usually related to internal code restrictions. The complete list of levels is:

  • Postmaster: requires restart of server
  • Sighup: requires a HUP of the server, either by kill -HUP (usually -1), pg_ctl reload, or SELECT pg_reload_conf();
  • User: can be set within individual sessions, take effect only within that session
  • Internal: set at compile time, can't be changed, mainly for reference
  • Backend: settings which must be set before session start
  • Superuser: can be set at runtime for the server by superusers 


Most of the time you'll only use the first of these, but the second can be useful if you have a server you don't want to take down, while the user session settings can be helpful for some special situations. You can tell which type of parameter a setting is by looking at the "context" field in the pg_settings view.

Important notes about configuration files

  • Command line options override postgresql.auto.conf settings override postgresql.conf settings.
  • If the same setting is listed multiple times, the last one wins.
  • You can figure out the postgresql.conf location with SHOW config_file. It will generally be $PGDATA/postgresql.conf (SHOW data_directory), but watch out for symbolic links, postmaster.opts, and other trickiness.
  • Lines with # are comments and have no effect. For a new database, this will mean the setting is using the default, but on running systems this may not hold true! Changes to the configuration files do not take effect without a reload/restart, so it's possible for the system to be running something different from what is in the file. 


Viewing the current settings

  • Look at the configuration files. This is generally not definitive!
  • SHOW ALL, SHOW <setting> will show you the current value of the setting. Watch out for session specific changes
  • SELECT * FROM pg_settings will label session specific changes as locally modified
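
For example, to see one setting's current value along with its context (when it can change) and where the value came from:

SELECT name, setting, unit, context, source
  FROM pg_settings
 WHERE name = 'work_mem';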