Recently at work I was tasked with improving our legacy application. It has been neglected for a while, and takes its revenge by causing frequent firefighting and overall crappy performance. The application is tightly coupled with a PostgreSQL database, and many things that are normally not the job of a database (such as keeping version history) are delegated to this single PostgreSQL instance. The result is a feedback loop where the database is under immense load for even the simplest things, causing frequent deadlocks and extremely long queries, which leads to decreased performance and long request times, which leads to even more load. To put an end to this spiral of endless firefighting, and improve my knowledge of Postgres, I decided to spend some time with the legacy application. The first step was analyzing the database performance, to find out whether there is anything that would give us the biggest advantage with comparably small effort.
Generally speaking, the following are the factors that we need to focus on to judge how well a database cluster is performing:
Index usage: The most important algorithmic fundamental of a relational database is the B-tree index. If a database is not properly configured, it will do sequential scans across frequently used tables (linear with table size) instead of using an index (logarithmic).
IO: PostgreSQL does its best not to read data from disk, either delaying reading as much as possible, or using the cache. Whether reading disk can be avoided depends mostly on cache configuration.
Concurrent connections: Many parallel connections consume a lot of memory and CPU. You should make sure that your database is not plagued with more connections than it can handle.
Deadlocks: These nasty buggers are the biggest killers in terms of performance, because they lead to long queries, blocked connections, and expensive transaction rollbacks. If you have a lot of deadlocks, your locking queries need a review.
Collecting General Performance Data
Not surprisingly, there are a number of tables that PostgreSQL keeps within its own schema with an abundance of information on the above dimensions. These tables all start with either
pg_statio, and are generally referred to as the stats tables. It is important to keep in mind that there are two kind of statistics in PostgreSQL. The first kind is for its own internal usage, such as deciding when to run
autovacuum, and query planning. This data is kept in the
pg_statistics catalog. As the documentation points out, this table should not be readable to ordinary users. A publicly readable view on this data that is also in a more human-friendly format is
The second kind of statistics is for monitoring, and these tables are the focus of this post. The monitoring stats tables can be subsumed in three groups: Database-specific, table-specific and query-specific. Let's start with database-specific statistics. The statistics for a single database are saved in the
pg_stat_database table. In addition to the rows that are to be expected, such as database name and id (
datid), the following columns that are relevant to our interests are in this table:
numbackends: Number of backends currently connected to this database.
blks_hit: Number of times disk blocks were read vs. number of cache hits for these blocks.
xact_rollback: Number of transactions committed and rolled back, respectively.
deadlocks: Number of deadlocks since last reset. As mentioned above, very important for database performance.
numbackends is an important column, not only because too high a value can cause issues, as mentioned above, but also because the change in this number during normal operation gives us a hint about how long queries are taking. Combining the value of
numbackends with the oldest running query from the
pg_stat_activity table might also be informative, to make sure that there are no long-running connections that were not properly closed.
The ratio of cache hits to total reads can be determined with the following query:
SELECT blks_hit::float/(blks_read + blks_hit) as cache_hit_ratio FROM pg_stat_database WHERE datname=current_database();
This number is the most important metric for measuring IO performance; it should be very close to 1. Otherwise you should consider changing the
shared_buffers configuration option. A similar ratio of the number of committed transactions vs. all transactions is also important:
SELECT xact_commit::float/(xact_commit + xact_rollback) as successful_xact_ratio FROM pg_stat_database WHERE datname=current_database();
numbackends, all these values are accumulated since the time they were reset. Resetting can be carried out by logging into the database and running
select pg_stat_reset();. The last time this was done is stored in the
stats_reset column. Resetting statistics affects only the monitoring tables;
pg_statistics is populated by
ANALYZE, and is not affected.
The most useful table-specific stats table is
pg_stat_all_tables. Running a simple
\d pg_stat_all_tables on this table reveals some very interesting columns:
last_analyze: The last time vacuum and analyze have been executed manually on this table.
last_autoanalyze: The last time this table has been vacuumed or analyzed by the autovacuum daemon.
idx_tup_fetch: The number of times an index scan was made on this table, and the number of rows fetched this way.
seq_tup_read: The number of times a sequential scan was made, and the number of rows read this way.
n_tup_del: Number of rows inserted, updated and deleted.
n_dead_tup: Estimated number of live rows vs. dead rows.
The most meaningful stats from a performance perspective are those related to index vs sequential scans. An index scan happens when the database can determine which rows to fetch by ID only using an index, a data structure that is easy to traverse. A sequential scan happens, on the other hand, when a table has to be linearly processed in order to determine which rows belong in a set. Sequential scans are very costly operations for big tables. The reason for this is that reading rows is an expensive operation, as the actual table data is stored in an unordered heap. The aim of a database user therefore should be to tweak the index definitions so that the database does as little sequential scans as possible. I strongly recommend the book SQL Performance Explained on the topic of indexes. The ratio of index scans to all scans for the whole database can be calculated as follows:
SELECT sum(idx_scan)/(sum(idx_scan) + sum(seq_scan)) as idx_scan_ratio FROM pg_stat_all_tables WHERE schemaname='public';
The user has access to the tables in the current database plus some other system tables, such as the TOAST tables, which necessitates filtering these out by looking at only those in the
public namespace. The value returned by the above query should be very close to 1, otherwise you have a serious problem. In order to see a more detailed report of how individual tables are faring in the same area, you can use the following query, which calculates the same ratio per table and puts them in ascending order:
SELECT relname,idx_scan::float/(idx_scan+seq_scan+1) as idx_scan_ratio FROM pg_stat_all_tables WHERE schemaname='public' ORDER BY idx_scan_ratio ASC;
As pointed out in this blog post, it might be a good idea to pay special attention to index usage on tables with many rows, and make sure they are as highly optimized as possible.
Running the query
select pg_stat_reset(); as superuser resets also
pg_stat_all_tables as well as
One question we had in mind was how the stats were related to queries running within trigger functions. PostgreSQL is known for doing the sensible thing, so we expected stats to be collected also within triggers, but it's best to make sure by running a simple test. Let's create an empty database with the following simple tables:
CREATE TABLE person ( id SERIAL PRIMARY KEY, last_name VARCHAR(255), first_name VARCHAR(255) ); CREATE TABLE address ( id SERIAL PRIMARY KEY, person_id integer REFERENCES person(id), fullname VARCHAR(255), street VARCHAR(255), city VARCHAR(255) );
We can insert the following rows into the
INSERT INTO person (first_name, last_name) VALUES ('Hercule', 'Poirot'); INSERT INTO address (person_id, fullname, street, city) VALUES (1, 'Hercule Poirot', 'Rue des Martyrs', 'Paris');
A quick check of the pgstatalltables after resetting stats reveals the following:
SELECT idx_scan,seq_scan,n_tup_ins FROM pg_stat_all_tables WHERE schemaname='public' AND relname='person'; SELECT * from person where first_name='Hercule'; SELECT idx_scan,seq_scan,n_tup_ins FROM pg_stat_all_tables WHERE schemaname='public' AND relname='person';
SELECT query on
0, 0, 0, whereas the second one returns
0, 1, 0, as one would expect. In order to test whether these statistics take into account triggers, we can add a trigger to the
person table with the following lines:
CREATE OR REPLACE FUNCTION update_fullname() RETURNS TRIGGER AS $$ BEGIN UPDATE address SET fullname = concat(NEW.first_name, ' ', NEW.last_name) WHERE person_id = NEW.id; RETURN NULL; END; $$ LANGUAGE plpgsql; DROP TRIGGER IF EXISTS update_fullname_trigger ON person; CREATE TRIGGER update_fullname_trigger AFTER UPDATE ON person FOR EACH ROW EXECUTE PROCEDURE update_fullname();
After installing the
update_fullname trigger, which changes the fullname column in the
address table when a
person changes, we can reset the statistics and run a simple update to see what happens:
SELECT pg_stat_reset(); UPDATE person SET first_name = 'Marcel' WHERE id=1; SELECT idx_scan,seq_scan,n_tup_upd FROM pg_stat_all_tables WHERE schemaname='public' AND relname='address';
This should return
0, 1, 1, meaning that the query ran by the trigger was registered in the statistics.
Monitoring Query Performance
The tables mentioned until now give you a general overview of the performance characteristics of your database. When it comes to finding the reasons for these characteristics, you need to go one level deeper, to individual queries. The one table that has the most information on the performance of individual queries is
pg_stat_statements. Unfortunately, this table is populated by a plugin that has to be first enabled, requiring a database restart. I would strongly encourage you to install the plugin though, since the data registered by it is impossible to derive or collect otherwise. Enabling the plugin is a matter of installing the package
postgresql-contrib-9.X for your version of PostgreSQL and Unix, and adding (or uncommenting) the following lines in
shared_preload_libraries = 'pg_stat_statements' pg_stat_statements.track = all
Afterwards, you should log in to the database of interest and run
CREATE EXTENSION pg_stat_statements;. From now on, various statistics will be collected for each individual query, and stored in the
pg_stat_statements table. The important identifier columns on this table are the following:
dbid: This column has the ID of the database on which the query was ran. The corresponding column in the
pg_databasetable is called
oid, and is hidden. You normally don't have to filter for this column, though; only the queries for the currently connected database are visible in the
queryid: This is a hash of the internal representation of the query. The way this hash is calculated involves a number of subtleties. These will be discussed a few lines below.
query: A representative text for what PostgreSQL considers to be the same query.
Query hash generation takes as its input the representation that PostgreSQL generates after a query is parsed and matched to the relevant tables or indexes. The scalar values in the query are then stripped out for plannable queries, i.e.
SELECT, INSERT, UPDATE, DELETE. The resulting internal representation is an abstract "summary" of the query. Different queries can thus match to the same
queryid, for example in the cases where the order of the select fields or the join order is different. See the PostgreSQL documentation on the topic for further details.
The columns in the
pg_stat_statements table relevant for performance analysis are the following:
calls: Number of times executed
total_time: Time spent in this query
mean_time: The min, max and mean of all query runs.
As with the above statistics tables,
pg_stat_statements aggregates values between resets. This table requires a different function to reset, the aptly named
A simple test shows that the queries ran through triggers are accounted for in the
pg_stat_statements table, too. After creating the tables, registering the triggers, and resetting the statistics with
SELECT pg_stat_statements_reset(), let's run the following simple query again:
UPDATE person SET first_name = 'Marcel' WHERE id=1;
Asking for the statistics shows us that the
UPDATE statements in the trigger have been registered properly:
test=# select calls,total_time,left(query,30) from pg_stat_statements where dbid=874591 order by calls desc; calls | total_time | left -------+------------+-------------------------------- 2 | 0.201 | select calls,total_time,left(q 1 | 0.019 | UPDATE address + | | SET f 1 | 8.898 | select pg_stat_statements_rese 1 | 0.564 | UPDATE person SET first_name =
pg_stat_statements extension is enabled, improving database performance in terms of query duration (the most important thing, as far as the users are concerned) is as simple as finding the longest-running queries ordered either by average or total time, finding sample values for the parameters, and running them with
EXPLAIN ANALYZE. See this old but still relevant tutorial for a quick introduction to using
One more thing we wanted to achieve was to regularly query our database instance for the above mentioned pieces of information, and display them on our Kibana dashboard. Unfortunately, Logstash proved to be a roadblock with its weird parsing behavior and incomprehensible bugs (hence my current attempt to rewrite it in Python), but for the time being, here is a bash script which uses psql to query PostgreSQL for the stats tables, and pipes everything to syslog:
set -e case "$1" in database) psql -x db_name -c "select numbackends,blks_hit::float/(blks_read + blks_hit) as cache_hit_ratio,xact_commit::float/(xact_commit + xact_rollback) as successful_xact_ratio from pg_stat_database where datname=db_name;" | grep -v RECORD | sed '/^$/d' | tr '\n' ' ' | logger ;; statements) psql -x db_name -c "select queryid, total_time, (total_time::float/calls) as mean_time, left(query,40) as short_query from pg_stat_statements order by total_time desc limit 10;" | tr '\n' ' ' | sed 's/-\[ RECORD [0-9]* \]-*/\n/g' | xargs -d '\n' -n 1 logger ;; *) exit 1 ;; esac