Thursday, April 19, 2018

How to get the last day of this month

In this PostgreSQL syntax tip, we will show how to get the last day of the current month.



The NOW() function returns the current date and time.

select now()
"2018-04-20 13:51:09.20666+07"

Often when working with dates, we either need to get the first day of the month or the last day of the month.

Getting the first day is easy and can be done with date_trunc.

SELECT date_trunc('MONTH', now())::DATE;
"2018-04-01"

Then, to get the last day of the current month, use this syntax:

SELECT (date_trunc('MONTH', now()) + INTERVAL '1 MONTH - 1 day')::DATE;
"2018-04-30"

See, easy right?



How to INSERT using the ON CONFLICT clause in a PostgreSQL database - UPSERT syntax

Hi, sometimes we need to perform multiple inserts into a table in a PostgreSQL database. This article shows how to INSERT using the ON CONFLICT clause in PostgreSQL. OK, let's get to the point.




Since version 9.5, PostgreSQL has had UPSERT support through the ON CONFLICT clause, with the following syntax (similar to MySQL's INSERT ... ON DUPLICATE KEY UPDATE).


INSERT INTO the_table (id, column_1, column_2) 
VALUES (1, 'A', 'X'), (2, 'B', 'Y'), (3, 'C', 'Z')
ON CONFLICT (id) DO UPDATE 
  SET column_1 = excluded.column_1, 
      column_2 = excluded.column_2;


ON CONFLICT Clause

The optional ON CONFLICT clause specifies an alternative action to raising a unique violation or exclusion constraint violation error. For each individual row proposed for insertion, either the insertion proceeds, or, if an arbiter constraint or index specified by conflict_target is violated, the alternative conflict_action is taken. ON CONFLICT DO NOTHING simply avoids inserting a row as its alternative action. ON CONFLICT DO UPDATE updates the existing row that conflicts with the row proposed for insertion as its alternative action.
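
As a minimal sketch of the DO NOTHING form, reusing the hypothetical the_table from above, rows whose id already exists are simply skipped:

INSERT INTO the_table (id, column_1, column_2)
VALUES (1, 'A', 'X')
ON CONFLICT (id) DO NOTHING;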

conflict_target can perform unique index inference. When performing inference, it consists of one or more index_column_name columns and/or index_expression expressions, and an optional index_predicate. All table_name unique indexes that, without regard to order, contain exactly the conflict_target-specified columns/expressions are inferred (chosen) as arbiter indexes. If an index_predicate is specified, it must, as a further requirement for inference, satisfy arbiter indexes. Note that this means a non-partial unique index (a unique index without a predicate) will be inferred (and thus used by ON CONFLICT) if such an index satisfying every other criteria is available. If an attempt at inference is unsuccessful, an error is raised.

ON CONFLICT DO UPDATE guarantees an atomic INSERT or UPDATE outcome; provided there is no independent error, one of those two outcomes is guaranteed, even under high concurrency. This is also known as UPSERT — “UPDATE or INSERT”.

conflict_target
Specifies which conflicts ON CONFLICT takes the alternative action on by choosing arbiter indexes. Either performs unique index inference, or names a constraint explicitly. For ON CONFLICT DO NOTHING, it is optional to specify a conflict_target; when omitted, conflicts with all usable constraints (and unique indexes) are handled. For ON CONFLICT DO UPDATE, a conflict_target must be provided.

conflict_action
conflict_action specifies an alternative ON CONFLICT action. It can be either DO NOTHING, or a DO UPDATE clause specifying the exact details of the UPDATE action to be performed in case of a conflict. The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the existing row using the table's name (or an alias), and to rows proposed for insertion using the special excluded table. SELECT privilege is required on any column in the target table where corresponding excluded columns are read.

Note that the effects of all per-row BEFORE INSERT triggers are reflected in excluded values, since those effects may have contributed to the row being excluded from insertion.

index_column_name
The name of a table_name column. Used to infer arbiter indexes. Follows CREATE INDEX format. SELECT privilege on index_column_name is required.

index_expression
Similar to index_column_name, but used to infer expressions on table_name columns appearing within index definitions (not simple columns). Follows CREATE INDEX format. SELECT privilege on any column appearing within index_expression is required.

collation
When specified, mandates that corresponding index_column_name or index_expression use a particular collation in order to be matched during inference. Typically this is omitted, as collations usually do not affect whether or not a constraint violation occurs. Follows CREATE INDEX format.

opclass
When specified, mandates that corresponding index_column_name or index_expression use particular operator class in order to be matched during inference. Typically this is omitted, as the equality semantics are often equivalent across a type's operator classes anyway, or because it's sufficient to trust that the defined unique indexes have the pertinent definition of equality. Follows CREATE INDEX format.

index_predicate
Used to allow inference of partial unique indexes. Any indexes that satisfy the predicate (which need not actually be partial indexes) can be inferred. Follows CREATE INDEX format. SELECT privilege on any column appearing within index_predicate is required.

constraint_name
Explicitly specifies an arbiter constraint by name, rather than inferring a constraint or index.

condition
An expression that returns a value of type boolean. Only rows for which this expression returns true will be updated, although all rows will be locked when the ON CONFLICT DO UPDATE action is taken. Note that condition is evaluated last, after a conflict has been identified as a candidate to update.
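
For example, a sketch using the hypothetical the_table again, where the update only happens when the incoming values actually differ from the stored ones:

INSERT INTO the_table (id, column_1, column_2)
VALUES (1, 'A', 'X')
ON CONFLICT (id) DO UPDATE
  SET column_1 = excluded.column_1,
      column_2 = excluded.column_2
  WHERE the_table.column_1 IS DISTINCT FROM excluded.column_1
     OR the_table.column_2 IS DISTINCT FROM excluded.column_2;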

Note that exclusion constraints are not supported as arbiters with ON CONFLICT DO UPDATE. In all cases, only NOT DEFERRABLE constraints and unique indexes are supported as arbiters.

INSERT with an ON CONFLICT DO UPDATE clause is a “deterministic” statement. This means that the command will not be allowed to affect any single existing row more than once; a cardinality violation error will be raised when this situation arises. Rows proposed for insertion should not duplicate each other in terms of attributes constrained by an arbiter index or constraint.


Thursday, September 14, 2017

A Guide to Query Tuning in PostgreSQL

This article provides guidance on query tuning in PostgreSQL. Now that you know which statements are performing poorly and are able to see their execution plans, it's time to start tweaking the queries to get better performance. This is where you make changes to the queries and/or add indexes to try and find a better execution plan. Start with the bottlenecks and see if there are changes you can make that reduce costs and/or execution times.


A note about the data cache and comparing apples to apples

As you make changes and evaluate the resulting execution plans to see whether they are better, it's important to know that subsequent executions may rely on data caching that gives the perception of better results. If you run a query once, make a tweak, and run it a second time, it will likely run much faster even if the execution plan is not more favorable. This is because PostgreSQL may have cached data used in the first run and can use it in the second run. Therefore, you should run queries at least three times and average the results to compare apples to apples.

OK, here are some query tuning guidelines for PostgreSQL that will help you get better execution plans:

1. Indexes
  • Eliminate sequential scans (Seq Scan) by adding indexes (unless table size is small)
  • If using a multicolumn index, make sure you pay attention to the order in which you define the included columns
  • Try to use indexes that are highly selective on commonly-used data. This will make their use more efficient.

2. WHERE clause
  • Avoid LIKE
  • Avoid function calls in the WHERE clause
  • Avoid large IN() lists

3. JOINs

  • When joining tables, try to use a simple equality condition in the ON clause (i.e. a.id = b.person_id). Doing so allows more efficient join techniques to be used (i.e. a Hash Join rather than a Nested Loop Join)
  • Convert subqueries to JOIN statements when possible, as this usually lets the optimizer understand the intent and possibly choose a better plan
  • Use JOINs properly: are you using GROUP BY or DISTINCT just because you're getting duplicate results? This usually indicates improper JOIN usage and may result in higher costs
  • If the execution plan is using a Hash Join, it can be very slow if the table size estimates are wrong. Therefore, make sure your table statistics are accurate by reviewing your vacuuming strategy
  • Avoid correlated subqueries where possible; they can considerably increase query cost
  • Use EXISTS when checking for the existence of rows based on a criterion, because it "short-circuits" (stops processing once it finds at least one match)

4. General guidelines
  • Do more with less; CPU is faster than I/O
  • Utilize Common Table Expressions and temporary tables when you need to run certain queries
  • Avoid LOOP statements and prefer SET operations
  • Avoid COUNT(*) as PostgreSQL does table scans for this (versions <= 9.1 only)
  • Avoid ORDER BY, DISTINCT, GROUP BY, and UNION when possible, because these cause high startup costs
  • Look for a large variance between estimated rows and actual rows in the EXPLAIN output. If the counts are very different, the table statistics are probably outdated and PostgreSQL is estimating cost using inaccurate statistics. For example: Limit (cost=282.37..302.01 rows=93 width=22) (actual time=34.35..49.59 rows=2203 loops=1). The estimated row count was 93 and the actual was 2,203, so it's likely making a bad plan decision. You should review your vacuuming strategy and ensure ANALYZE is being run often enough (see the sketch below).
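
As a sketch of that last point (the table and column are made up for illustration), EXPLAIN ANALYZE shows estimated versus actual row counts, and ANALYZE refreshes the statistics the planner relies on:

EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;

-- if estimated and actual rows diverge badly, refresh the statistics
ANALYZE orders;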

Sunday, August 13, 2017

Porting SQL Syntax, Function from Oracle to PostgreSQL



When we use PostgreSQL as a substitute for Oracle, or migrate from Oracle, we will find that there are syntax differences. Oracle and PostgreSQL both conform to standard SQL. However, they contain several extensions and implementation details that differentiate one from the other. The most important differences are listed in the table below.





Oracle:
  select sysdate from dual
PostgreSQL:
  select 'now'::datetime
Explanation: There is no "dual" table. Unlike other RDBMS, PostgreSQL allows a "select" without the "from" clause. This does not affect portability because the syntax to get the current time is already DBMS specific.

Oracle:
  CREATE SEQUENCE seqname [ INCREMENT BY integer ] [ MINVALUE integer ] [ MAXVALUE integer ] [ START WITH integer ] [ CACHE integer ] [ CYCLE | NOCYCLE ]
  To return the current value and increment the counter: sequence_name.nextval
  Possible usage in a select statement: select sequence_name.nextval from dual;
PostgreSQL:
  CREATE SEQUENCE seqname [ INCREMENT increment ] [ MINVALUE minvalue ] [ MAXVALUE maxvalue ] [ START start ] [ CACHE cache ] [ CYCLE ]
  To return the current value and increment the counter: nextval('sequence_name')
  Possible usage in a select statement: select nextval('sequence_name');
Explanation: If you don't specify MAXVALUE, the maximum value is 2147483647 for ascending sequences. Note that unlike other RDBMS, PostgreSQL allows a select without the 'from' clause. This does not affect portability because the sequence syntax is already DBMS specific.

Oracle:
  SELECT product_id, DECODE (warehouse_id, 1, 'Southlake', 2, 'San Francisco', 3, 'New Jersey', 4, 'Seattle', 'Non-domestic') quantity_on_hand FROM inventories
PostgreSQL:
  SELECT a, CASE WHEN a=1 THEN 'one' WHEN a=2 THEN 'two' ELSE 'other' END FROM test
Explanation: Convert DECODE to CASE WHEN.

Oracle:
  select employeeid, NVL(hire_date, sysdate) from employee where employeeid = 10;
PostgreSQL:
  select employeeid, coalesce(hire_date, 'now'::datetime) from employee where employeeid = 10;
Explanation: Oracle also has a "coalesce" function, which is a generalization of the commonly used NVL function.
More syntax differences will be covered in a future post...

Thursday, August 10, 2017

4 Tools Database Conversion from Oracle to PostgreSQL

There are several external tools that can help migrate an Oracle database to PostgreSQL, paid or free. Here are 4 external tools that you can try.

Database conversion between Oracle (and 30+ other database engines) and PostgreSQL. With everything on localhost, typical throughput is over 100k records per second.


The software is able to convert 1 million records in 4-5 minutes. A trigger-based database sync method and simultaneous bi-directional synchronization help you manage your data easily and efficiently.


A toolkit that migrates an Oracle database to PostgreSQL through a wizard. It connects to the Oracle and PostgreSQL databases directly, and migrates table structures, data, indexes, primary keys, foreign keys, comments, and so on.

Ora2Pg is a Perl module to export an Oracle database schema to a PostgreSQL compatible schema. It connects to your Oracle database, extracts its structure, and generates an SQL script that you can load into your PostgreSQL database.

Wednesday, August 9, 2017

How to Understand SQL Index and Where should I use an index

An index is used to speed up searching in the database. "An index makes the query fast" is the most basic explanation of an index I have ever seen. Although it describes the most important aspect of an index very well, it is, unfortunately, not sufficient for this article.


OK, Let's Understand the SQL Index. 

Searching in a database index is like searching in a printed telephone directory. The key concept is that all entries are arranged in a well-defined order. Finding data in an ordered data set is fast and easy because the sort order determines each entry's position.

A database index is, however, more complex than a printed directory because it undergoes constant change. Updating a printed directory for every change is impossible for the simple reason that there is no space between existing entries to add new ones. A printed directory bypasses this problem by only handling the accumulated updates with the next printing. An SQL database cannot wait that long. It must process insert, delete and update statements immediately, keeping the index order without moving large amounts of data.




The database combines two data structures to meet the challenge: a doubly linked list and a search tree. These two structures explain most of the database's performance characteristics

And Where should I use an Index ?

An index can be used to efficiently find all rows matching some column in your query and then walk through only that subset of the table to find exact matches. If you don't have indexes on any column in the WHERE clause, the SQL server has to walk through the whole table and check every row to see if it matches, which may be a slow operation on big tables.


The index can also be a UNIQUE index, which means that you cannot have duplicate values in that column, or a PRIMARY KEY which in some storage engines defines where in the database file the value is stored.

In PostgreSQL you can use EXPLAIN in front of your SELECT statement to see if your query will make use of any index. This is a good start for troubleshooting performance problems.
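
As a small sketch (the table and column names are made up for illustration), you could compare plans before and after adding an index:

-- without an index, EXPLAIN will typically show a Seq Scan on big tables
EXPLAIN SELECT * FROM customers WHERE email = 'someone@example.com';

-- add an index on the column used in the WHERE clause
CREATE INDEX idx_customers_email ON customers (email);

-- re-run EXPLAIN; the plan should now show an Index Scan using idx_customers_email
EXPLAIN SELECT * FROM customers WHERE email = 'someone@example.com';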


Saturday, July 8, 2017

log_statement and log_min_duration_statement Parameter


The options for the PostgreSQL log_statement parameter are as follows:
  • none: Do not log any statement-level information.
  • ddl: Log only Data Definition Language (DDL) statements such as CREATE and DROP. This can normally be left on even in production, and is handy to catch major changes introduced accidentally or intentionally by administrators.
  • mod: Log any statement that modifies a value, which is essentially everything except for simple SELECT statements. If your workload is mostly SELECT based with relatively few data changes, this may be practical to leave enabled all the time.
  • all: Log every statement. This is generally impractical to leave on in production due to the overhead of the logging. However, if your server is powerful enough relative to its workload, it may be practical to keep it on all the time.

Statement logging is a powerful technique for finding performance issues. Analyzing the information saved by log_statement and related sources for statement-level detail can reveal the true source for many types of performance issues. You will need to combine this with appropriate analysis tools.
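
As a possible starting point in postgresql.conf (an assumption for illustration; pick the level that fits your workload from the list above):

# log only DDL; usually safe to leave enabled even in production
log_statement = 'ddl'

After reloading the configuration, you can confirm the active value with SHOW log_statement; from psql.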



log_min_duration_statement Parameter


Once you have some idea of how long a typical query statement postgreSQL Tuning  should take to execute, this setting allows you to log only the ones that exceed some threshold you set. The value is in milliseconds, so you might set:

log_min_duration_statement=1000

And then you'll only see statements that take longer than one second to run. This can be extremely handy for finding out the source of "outlier" statements that take much longer than most to execute.

If you are running 8.4 or later, you might prefer to use the auto_explain module (http://www.postgresql.org/docs/8.4/static/auto-explain.html) instead of this feature. This will allow you to actually see why the queries that are running slowly are doing so, by viewing their associated EXPLAIN plans.



Monday, July 3, 2017

Autovacuum and Vacuuming and statistics Parameter

As both of these tasks (VACUUM and ANALYZE, described below) are critical to database performance over the long term, starting in PostgreSQL 8.1 there is an autovacuum daemon available that will run in the background to handle them for you. Its action is triggered by the number of changes to the database exceeding a threshold it calculates based on the existing table size.



Autovacuum is turned on by default in PostgreSQL 8.3, and the default settings are generally aggressive enough to work out of the box for smaller databases with little manual tuning. Generally you just need to be careful that the amount of data in the free space map doesn't exceed max_fsm_pages, and even that requirement is automated away from being a concern as of 8.4.



Vacuuming and statistics Parameter

PostgreSQL databases require two primary forms of regular maintenance as data is added, updated, and deleted.

VACUUM cleans up after old transactions, including removing information that is no longer visible and returning freed space to where it can be re-used. The more often you UPDATE and DELETE information from the database, the more likely you'll need a regular vacuum cleaning regime. However, even static tables with data that never changes once inserted still need occasional care here.


ANALYZE looks at tables in the database and collects statistics about them— information like estimates of how many rows they have and how many distinct values are in there. Many aspects of query planning depend on this statistics data being accurate.
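
Both operations can also be run manually when needed; a quick sketch against a hypothetical orders table:

VACUUM orders;          -- clean up dead rows and make the space reusable
ANALYZE orders;         -- refresh the planner statistics for this table
VACUUM ANALYZE orders;  -- or do both in a single pass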

Enabling autovacuum on older versions

If you have autovacuum available but it's not turned on by default, which will be the case with PostgreSQL 8.1 and 8.2, there are a few related parameters that must also be enabled for it to work, as covered in http://www.postgresql.org/docs/8.1/interactive/maintenance.html or http://www.postgresql.org/docs/8.2/interactive/routine-vacuuming.html.


The normal trio to enable in the postgresql.conf file in these versions are:

stats_start_collector=true
stats_row_level=true
autovacuum=on

Note that as warned in the documentation, it's also wise to consider adjusting superuser_reserved_connections to allow for the autovacuum processes in these earlier versions.


The autovacuum you'll get in 8.1 and 8.2 is not going to be as efficient as what comes in 8.3 and later. You can expect it to take some fine tuning to get the right balance of enough maintenance without too much overhead, and because there's only a single worker it's easier for it to fall behind on a busy server. This topic isn't covered at length here. It's generally a better idea to put time into planning an upgrade to a PostgreSQL version with a newer autovacuum than to try and tweak an old one extensively, particularly if there are so many other performance issues that cannot be resolved easily in the older versions, too. 

Thursday, June 29, 2017

work_mem and maintenance_work_mem Parameter

work_mem Parameter


When a query that needs to sort data is running, the database estimates how much data is involved and then compares it to the work_mem parameter. If it's larger (and the default is only 1 MB), rather than sorting in memory it will write all the data out and use a disk-based sort instead. This is much, much slower than a memory-based one. Accordingly, if you regularly sort data and have memory to spare, a large increase in work_mem can be one of the most effective ways to speed up your server. A data warehousing report on a giant server might run with a gigabyte of work_mem for its larger reports.



The catch is that you can't necessarily predict the number of sorts any one client will be doing, and work_mem is a per-sort parameter rather than a per-client one. This means that memory use via work_mem is theoretically unbounded, should enough clients sorting large enough data sets happen to run concurrently.

In practice, there aren't that many sorts going on in a typical query, usually only one or two. And not every client that's active will be sorting at the same time. The normal guidance for work_mem is to consider how much free RAM is around after shared_buffers is allocated (the same OS caching size figure needed to compute effective_cache_size), divide by max_connections, and then take a fraction of that figure; a half of that would be an aggressive work_mem value. In that case, only if every client had two sorts active all at the same time would the server be likely to run out of memory, which is an unlikely scenario.
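
As a rough worked example with made-up numbers: on an 8 GB server with shared_buffers = 2GB and max_connections = 100, about 6 GB of RAM remains; 6 GB / 100 connections is roughly 60 MB per client, and taking half of that as the aggressive figure gives:

# hypothetical 8 GB server, 2 GB shared_buffers, max_connections = 100
work_mem = 30MB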

The work_mem computation is increasingly used in later PostgreSQL versions for estimating whether hash structures can be built in memory. Its use as a per-client memory size threshold is not limited just to sorts; that's simply the easiest way to talk about the type of memory allocation decision it helps to guide.


Like synchronous_commit, work_mem can also be set per client. This allows an approach where you keep the default at a moderate value and only increase sort memory for the clients that you know are running large reports.
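
For instance, a reporting session could raise it just for itself (the value here is illustrative):

SET work_mem = '256MB';        -- applies only to this session
-- or, inside a transaction block, limit it to that transaction:
SET LOCAL work_mem = '256MB';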



maintenance_work_mem Parameter


A few operations in the database server need working memory for larger operations than just regular sorting. VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY can all allocate up to maintenance_work_mem worth of memory instead. As it's unlikely that many sessions will be doing one of these operations at once, it's possible to set this value much higher than the standard per-client work_mem setting. Note that up to autovacuum_max_workers processes (defaulting to 3 starting in version 8.3) can each allocate this much memory, so consider those sessions (perhaps along with a session or two doing a CREATE INDEX) when setting this value.


Assuming you haven't increased the number of autovacuum workers, a typical high setting for this value on a modern server would be five percent of the total RAM, so that even five such processes wouldn't exceed a quarter of available memory. This works out to approximately 50 MB of maintenance_work_mem per GB of server RAM.
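
Following that rule of thumb, a hypothetical 16 GB server that still has the default three autovacuum workers would land at roughly:

# about 5% of 16 GB of RAM
maintenance_work_mem = 800MB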

Wednesday, June 28, 2017

default_statistics_target Parameter

PostgreSQL makes its decisions about how queries execute based on statistics collected about each table in your database. This information is collected by analyzing the tables, either with the ANALYZE statement or via autovacuum doing that step. In either case, the amount of information collected during the analyze step is set by default_statistics_target. Increasing this value makes analysis take longer, and as analysis of autovacuum happens regularly this turns into increased background overhead for database maintenance. But if there aren't enough statistics about a table, you can get bad plans for queries against it.



The default value for this setting used to be very low (10), but it was increased to 100 in PostgreSQL 8.4. Using that larger value was popular in earlier versions, too, for generally improved query behavior. Indexes using the LIKE operator tended to work much better with values greater than 100 rather than below it, due to a hard-coded change at that threshold.

Note that increasing this value does result in a net slowdown on your system if you're never running queries where the additional statistics lead to a better query plan. This is one reason why some simple benchmarks show PostgreSQL 8.4 as slightly slower than 8.3 at the default parameters for each, and in some cases you might return an 8.4 install to a smaller setting. Extremely large settings for default_statistics_target are discouraged due to the large overhead they incur.


If there is just a particular column in a table that you know needs better statistics, you can use ALTER TABLE SET STATISTICS on that column to adjust this setting just for it. This works better than increasing the system-wide default and making every table pay for that requirement. Typically, the columns that really require a lot more statistics to work properly will need a setting near the maximum of 1000 (increased to 10,000 in later versions) to produce a serious change in behavior, which is far higher than you'd want to collect data for on every table in the database.
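
A minimal sketch with a made-up table and column; run ANALYZE afterwards so the new target actually takes effect:

ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
ANALYZE orders;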

Tuesday, June 27, 2017

CHECKPOINT Parameters

There are several checkpoint parameters for PostgreSQL tuning, such as:


checkpoint_segments Parameter


Each WAL segment takes up 16 MB. As described at http://www.postgresql.org/docs/current/interactive/wal-configuration.html the maximum number of segments you can expect to be in use at any time is:

(2 + checkpoint_completion_target) * checkpoint_segments + 1
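
For example, with checkpoint_segments = 32 and checkpoint_completion_target = 0.9:

(2 + 0.9) * 32 + 1 = 93.8 segments
93.8 segments * 16 MB per segment ≈ 1500 MB of WAL

which is in line with the 1504 MB entry for that combination in the table below.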

Note that in PostgreSQL versions before 8.3 that do not have spread checkpoints, you can still use this formula; just substitute the following code snippet for the value you'll be missing:

checkpoint_completion_target=0

The easiest way to think about the result is in terms of the total size of all the WAL segments that you can expect to see on disk, which has both a disk cost and serves as something that can be used to estimate the time for recovery after a database crash. The expected peak pg_xlog size grows as shown in the following table:

checkpoint_segments  completion_target=0  target=0.5  target=0.9
3                    112 MB               144 MB      160 MB
10                   336 MB               416 MB      480 MB
32                   1040 MB              1296 MB     1504 MB
64                   2064 MB              2576 MB     2992 MB
128                  4112 MB              5136 MB     5968 MB
256                  8208 MB              10256 MB    11904 MB

The general rule of thumb you can extract here is that for every 32 checkpoint segments, expect at least 1 GB of WAL files to accumulate. As database crash recovery can take quite a while to process even that much data, 32 is as high as you want to make this setting for anything but a serious database server. The default of 3 is very low for most systems though; even a small install should consider an increase to at least 10.

Normally, you'll only want a value greater than 32 on a smaller server when doing bulk-loading, where it can help performance significantly and crash recovery isn't important. Databases that routinely do bulk loads may need a higher setting.



checkpoint_timeout Parameter


checkpoint_timeout is another checkpoint parameter. The default for this setting of 5 minutes is fine for most installations. If your system isn't able to keep up with writes and you've already increased checkpoint_segments to the point where the timeout is the main thing driving when checkpoints happen, it's reasonable to consider an increase to this value.

Aiming for 10 minutes or more between checkpoints isn't dangerous; again it just increases how long database recovery after a crash will take. As this is one component to database server downtime after a crash, that's something you need a healthy respect for.


checkpoint_completion_target is another checkpoint parameter.

If you have increased checkpoint_segments to at least 10, it's reasonable at that point to also increase checkpoint_completion_target to its practical maximum of 0.9. This gives maximum checkpoint spreading, which theoretically means the smoothest I/O, too. In some cases keeping the default of 0.5 will still be better however, as it makes it less likely that one checkpoint's writes will spill into the next one.

It's unlikely that a value below 0.5 will be very effective at spreading checkpoints at all. Moreover, unless you have an extremely large value for the number of segments, the practical difference between small changes in its value is unlikely to matter. One approach for the really thorough is to try both 0.5 and 0.9 with your application and see which one gives the smoother disk I/O curve over time, as judged by OS-level monitoring.

Saturday, June 24, 2017

wal_sync_method and wal_buffers Parameter WAL settings

In this article we will discuss wal_sync_method and wal_buffers for PostgreSQL tuning. OK, let's get into it.

One purpose of wal_sync_method is to tune how the database flushes WAL writes through any disk write caches involved.

The default behavior here is somewhat different from most of the options. When the server source code is compiled, a series of possible ways to write are considered. The one believed most efficient then becomes the compiled-in default. This value is not written to the postgresql.conf file at initdb time though, making it different from other auto-detected, platform-specific values such as shared_buffers.

Before adjusting anything, you should check what your platform detected as the fastest safe method using SHOW; the following is a Linux example:

postgres=# show wal_sync_method;
wal_sync_method
-----------------
fdatasync

On both Windows and the Mac OS X platforms, there is a special setting to make sure the OS clears any write-back caches. The safe value to use on these platforms that turns on this behavior is as follows:

wal_sync_method=fsync_writethrough

If you have this setting available to you, you really want to use it! It does exactly the right thing to make database writes safe, while not slowing down other applications the way disabling an entire hard drive write cache will do.

This setting will not work on all platforms however. Note that you will see a performance drop going from the default to this value, as is always the case when going from unsafe to reliable caching behavior.

On other platforms, tuning wal_sync_method can be much more complicated. It's theoretically possible to improve write throughput on any UNIX-like system by switching from any write method that uses a write/fsync or write/fdatasync pair to using a true synchronous write. On platforms that support safe DSYNC write behavior, you may already see this as your default when checking it with SHOW:

wal_sync_method=open_datasync

Even though you won't see it explicitly listed in the configuration file as such, if this is the case on your platform, there's little optimization beyond that you can likely perform. open_datasync is generally the optimal approach, and when available it can even use direct I/O as well to bypass the operating system cache.

The Linux situation is perhaps the most complicated. As shown in the code above, this platform will default to fdatasync as the method used. It is possible to switch this to use synchronous writes with:

wal_sync_method=open_sync

Also, in many cases you can discover this is faster—sometimes much faster—than the default behavior. However, whether this is safe or not depends on your filesystem. The default filesystem on most Linux systems, ext3, does not handle O_SYNC writes safely in many cases, which can result in corruption. See "PANIC caused by open_sync on Linux" at http://archives.postgresql.org/pgsqlhackers/2007-10/msg01310.php for an example of how dangerous this setting can be on that platform. There is evidence that this particular area has finally been cleaned up on recent (2.6.32) kernels when using the ext4 filesystem instead, but this has not been tested extensively at the database level yet.


In any case, your own tests of wal_sync_method should include the "pull the cord" test, where you power the server off unexpectedly, to make sure you don't lose any data with the method you've used. Testing at a very high load for a long period of time is also advisable, to find intermittent bugs that might cause a crash.

wal_buffers Parameter WAL settings

While the documentation on wal_buffers suggests that the default of 64 KB is sufficient as long as no single transaction exceeds that value, in practice write-heavy benchmarks see optimal performance at higher values than you might expect from that, at least 1 MB or more. With the only downside being the increased use of shared memory, and as there's no case where more than a single WAL segment could need to be buffered, given modern server memory sizes the normal thing to do nowadays is to just set:


wal_buffers=16MB


Then forget about it as a potential bottleneck or item to tune further. Only if you're tight on memory should you consider a smaller setting.

Thursday, June 22, 2017

effective_cache_size and per-client settings Parameter

PostgreSQL is expected to have both its own dedicated memory (shared_buffers) as well as utilize the filesystem cache. In some cases, when making decisions like whether it is efficient to use an index or not, the database compares sizes it computes against the effective sum of all these caches; that's what it expects to find in effective_cache_size.


The same rough rule of thumb that would put shared_buffers at 25 percent of system memory would set effective_cache_size to between 50 and 75 percent of RAM. To get a more accurate estimate, first observe the size of the filesystem cache:
  • UNIX-like systems: Add the free and cached numbers shown by the free or top commands to estimate the filesystem cache size
  • Windows: Use the Windows Task Manager's Performance tab and look at the System Cache size
Assuming you have already started the database, you then need to add the shared_buffers figure to this value to arrive at a figure for effective_cache_size. If the database hasn't been started yet, the OS cache figure alone is usually an accurate enough estimate; once the database is started, most of its dedicated memory will usually be allocated to its buffer cache anyway.
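
As a hypothetical example: if free plus cached memory adds up to about 5 GB and shared_buffers is set to 2GB, you would end up with:

# hypothetical: ~5 GB of OS cache (free + cached) plus 2 GB of shared_buffers
effective_cache_size = 7GB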

effective_cache_size does not allocate any memory. It's strictly used as input on how queries are executed, and a rough estimate is sufficient for most purposes. However, if you set this value much too high, actually executing the resulting queries may result in both the database and OS cache being disrupted by reading in the large number of blocks required to satisfy the query believed to fit easily in RAM.


It's rare you'll ever see this parameter tuned on a per-client basis, even though it is possible.

Per-client settings Parameter


While all of the settings in this section can be adjusted per client, you'll still want good starting settings for these parameters in the main configuration file. Individual clients that need values outside the standard can always do so using the SET command within their session.

Wednesday, June 21, 2017

synchronous_commit Parameter

The overhead of waiting for physical disk commits was stressed as a likely bottleneck for committing transactions. If you don't have a battery-backed write cache to accelerate that, but you need better commit speed, what can you do? The standard approach is to disable synchronous_commit, which is sometimes alternately referred to as enabling asynchronous commits. This groups commits into chunks at a frequency determined by the related wal_writer_delay parameter. The default settings guarantee a real commit to disk at most 600 milliseconds after the client commit. During that window, which you can reduce in size with a corresponding decrease in speed-up, that data will not be recovered afterwards if your server crashes.

Note that it's possible to turn this parameter off for a single client during its session rather than making it a server-wide choice:


SET LOCAL synchronous_commit TO OFF;


This provides you with the option of having different physical commit guarantees for different types of data you put into the database. A routine activity monitoring table, one that was frequently inserted into and where a fraction of a second of loss is acceptable, would be a good candidate for asynchronous commit. An infrequently written table holding real-world monetary transactions should prefer the standard synchronous commit.

Monday, June 19, 2017

random_page_cost and constraint_exclusion Parameter

random_page_cost Parameter


This parameter is common to tune, but explaining what it does requires a lot of background about how queries are planned. Particularly in earlier PostgreSQL versions, lowering this value from its default—for example, a reduction from 4.0 to 2.0—was a common technique. It was used for making it more likely that the planner would use indexed queries instead of the alternative of a sequential scan. With the smarter planner in current versions, this is certainly not where you want to start tuning. You should prefer getting better statistics and setting the memory parameters as the primary ways to influence the query planner.


constraint_exclusion Parameter


If you are using PostgreSQL 8.3 or earlier versions, and you are using the database's table inheritance feature to partition your data, you'll need to turn this parameter on.



Starting in 8.4, constraint_exclusion defaults to a new smarter setting named partition that will do the right thing in most situations without it ever needing to be adjusted.

Saturday, June 17, 2017

Configuration parameter tunables to avoid

There are a few parameters in postgresql.conf for which poor guidance has accumulated in other guides you might come across, and they might already be set badly in a server whose configuration you're now responsible for. Others have names suggesting a use for the parameter that doesn't actually exist. This section warns you about the most common of those to avoid adjusting.



Some of the parameter tunables to avoid are covered in the posts that follow:

Friday, June 16, 2017

fsync and max_prepared_transactions Parameter

If you just want to ignore crash recovery altogether, you can do that by turning off the fsync parameter. This makes the value for wal_sync_method irrelevant, because the server won't be doing any WAL sync calls anymore.


It is important to recognize that if you have any sort of server crash when fsync is disabled, it is likely your database will be corrupted and no longer start afterwards. Despite this being a terrible situation to be running a database under, the performance speedup of turning crash recovery off is so large that you might come across suggestions you disable fsync anyway. You should be equally hesitant to trust any other advice you receive from sources suggesting this, as it is an unambiguously dangerous setting to disable.

One reason this idea gained traction is that in earlier PostgreSQL versions, there was no way to reduce the number of fsync calls to a lower number—to trade off some amount of reliability for performance. Starting in 8.3, in most cases where people used to disable fsync, it's a better idea to turn off synchronous_commit instead.

There is one case where fsync=off may still make sense: initial bulk loading. If you're inserting a very large amount of data into the database, and do not have hardware with a battery-backed write cache, you might discover this takes far too long to ever be practical. In this case, turning the parameter off during the load—where all data can easily be recreated if there is a crash causing corruption—may be the only way to get loading time below your target. Once your server is back up, you should turn fsync right back on again.


Some systems will also turn off fsync on servers with redundant copies of the database— for example, slaves used for reporting purposes. These can always resynchronize against the master if their data gets corrupted.


max_prepared_transactions Parameter

Many people see this name and assume that as they use prepared statements, a common technique to avoid SQL injection, that they need to increase this value. This is not the case; the two are not related. A prepared transaction is one that uses PREPARE TRANSACTION for two-phase commit (2PC).
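
For reference, a minimal sketch of an actual prepared transaction (the table and transaction identifier are made up, and this only works when max_prepared_transactions is greater than zero):

BEGIN;
INSERT INTO some_table (id) VALUES (1);
PREPARE TRANSACTION 'example_txn_1';  -- phase one: the transaction is persisted but not yet committed
COMMIT PREPARED 'example_txn_1';      -- phase two: commit it (this can even be issued from another session)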


If you're not specifically using that command and 2PC, you can leave this value at its default. If you are using those features, only then will you likely need to increase it to match the number of connections.

Wednesday, June 14, 2017

commit_delay and commit_siblings Parameter

Before synchronous_commit was implemented, there was an earlier attempt to add that sort of feature enabled by the commit_delay and commit_siblings parameters. These are not effective parameters to tune in most cases. It is extremely difficult to show any speedup by adjusting them, and quite easy to slow every transaction down by tweaking them.

The only case where they have shown some value is for extremely high I/O rate systems. Increasing the delay to a very small amount can make writes happen in bigger blocks, which sometimes turn out better aligned when combined with larger RAID stripe sizes in particular.

Tuesday, June 13, 2017

Query enable parameters and Logging Parameter

It's possible to disable many of the query planner's techniques, in hopes of avoiding a known bad type of query. This is sometimes used as a work-around for the fact that PostgreSQL doesn't support direct optimizer hints for how to execute a query.


You might see the following code snippet, suggested as a way to force use of indexes instead of sequential scans for example:

enable_seqscan = off


Generally this is a bad idea, and you should improve the information the query optimizer is working with so it makes the right decisions instead.
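
If you do experiment with it, keep it temporary and session-local, for diagnosis only; a sketch with a hypothetical query:

SET enable_seqscan = off;
EXPLAIN SELECT * FROM customers WHERE email = 'someone@example.com';
RESET enable_seqscan;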

Logging Parameter

General logging setup is important but it is somewhat outside the scope of this article. You may need to set parameters such as log_destination, log_directory, and log_filename to save your log files in a way compatible with the system administration requirements of your environment. These will all be set to reasonable defaults to get started with on most systems.


On UNIX-like systems, it's common for some of the database logging to be set in the script that starts and stops the server, rather than directly in the postgresql.conf file. If you instead use the pg_ctl command to manually start the server, you may discover that logging ends up on your screen instead. You'll need to look at the script that starts the server normally (commonly /etc/init.d/postgresql) to determine what it does, if you want to duplicate that behavior. In most cases, you just need to add –l logfilename to the pg_ctl command line to redirect its output to the standard location.

Tuesday, June 6, 2017

log_line_prefix, full_page_writes, Logging Parameter

Logging

General logging setup is important but it is somewhat outside the scope of this article. You may need to set parameters such as log_destination, log_directory, and log_filename to save your log files in a way compatible with the system administrations requirements of your environment. These will all be set to reasonable defaults to get started with on most systems.
On UNIX-like systems, it's common for some of the database logging to be set in the script that starts and stops the server, rather than directly in the postgresql.conf file. If you instead use the pg_ctl command to manually start the server, you may discover that logging ends up on your screen instead. You'll need to look at the script that starts the server normally (commonly /etc/init.d/postgresql) to determine what it does, if you want to duplicate that behavior. In most cases, you just need to add –l logfilename to the pg_ctl command line to redirect its output to the standard location.

log_line_prefix

The default log_line_prefix is empty, which is not what you want. A good starting value here is the following:
log_line_prefix='%t:%r:%u@%d:[%p]: '
This will put the following into every log line:
  • %t: Timestamp
  • %u: Database user name
  • %r: Remote host connection is from
  • %d: Database connection is to
  • %p: Process ID of connection
It may not be obvious what you'd want all of these values for initially, particularly the process ID. Once you've tried to chase down a few performance issues, the need for saving these values will be more obvious, and you'll be glad to already have this data logged.
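
With that prefix, a log line would look roughly like the following (all values here are made up for illustration):

2017-06-06 10:15:42 WIB:192.168.1.25(51433):appuser@salesdb:[20417]: LOG:  duration: 1350.272 ms  statement: SELECT ...
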
Another approach worth considering is setting log_line_prefix such that the resulting logs will be compatible with the pgFouine program. That is a reasonable, general purpose logging prefix, and many sites end up needing to do some sort of query analysis eventually.


full_page_writes Parameter

Much like fsync, turning this parameter off increases the odds of database corruption in return for an increase in performance.

You should only consider adjusting this parameter if you're doing extensive research into your filesystem and hardware, in order to ensure that partial page writes cannot happen.