PostgreSQL Vacuuming: An Introduction for Busy Devs

If you have interacted with PostgreSQL at any point in your developer career, you have met it: the autovacuum daemon. It fires up every now and then, consumes resources, and disappears again, without telling you what it did or why it ran in the first place. In this post, I would like to give you an idea of what vacuuming is, what the autovacuum daemon does, and how you can become friends with it.

What is vacuuming?

The concept of vacuuming has to do with the way PostgreSQL implements certain RDBMS features. A modern RDBMS has to offer concurrency control for transactions. That is, different transactions have to be able to see different views of the data, depending on which statements they have already executed. This concept is called transaction isolation, and constitutes the I in ACID. One transaction might edit certain fields of a row, while another deletes rows that must remain visible to transactions still running. Furthermore, each transaction can be rolled back, undoing all the changes it made. The management of data state is complicated by the fact that an RDBMS has to keep the storage of the table and of any indexes on it intact while managing data visibility. It cannot just go ahead and modify data in the primary data structures; this would lead to an invalid state.

The solution implemented by PostgreSQL is called Multi-Version Concurrency Control (MVCC). The basic idea is to mark rows according to the transactions that affect them, and to manage visibility based on these marks. Each transaction gets an ID from a simple 32-bit integer sequence, and each row records the IDs of the transactions that created and deleted it. These marks are stored in the xmin and xmax columns, which are normally hidden, but visible if explicitly queried for. Using the sample tables from the previous post on PostgreSQL performance, we can insert some data, and then look at the transaction IDs:

BEGIN TRANSACTION;
SELECT txid_current(); -- prints the current transaction id
INSERT INTO person (first_name, last_name) VALUES ('Hercule', 'Poirot');
COMMIT TRANSACTION;

On my computer, the SELECT txid_current(); statement prints out 156078. When we query the columns xmin and xmax, we can see the following values:

test=# SELECT xmin, xmax, first_name, last_name FROM person;
  xmin  | xmax | first_name | last_name
--------+------+------------+-----------
 156078 |    0 | Hercule    | Poirot
(1 row)

As you can see, the xmin column of the relevant row is set to the ID of the transaction in which it was committed. xmin can be interpreted as the lowest transaction ID that can see this row. Any transactions that started beforehand, and thus have a lower ID, cannot see it. The meaning of the xmax column is the exact opposite. This column is set to the ID of the transaction that deletes the row; any transactions that come after it cannot see the row. Essentially, for a transaction to see a row, the relationship xmin <= current_txid < xmax should hold, where xmax = 0 means the row has not been deleted. There are two more columns (cmin and cmax) that identify the commands within a transaction that inserted and deleted a row, but the details are not relevant here. For details of the algorithm, have a look at these slides.
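To see xmax in action, you can delete the row in a transaction that stays open, and inspect the hidden columns from a second psql session. The following is a minimal sketch; the session labels and the transaction ID are illustrative:

-- Session 1: delete the row, but do not commit yet
BEGIN TRANSACTION;
SELECT txid_current(); -- e.g. prints 156079
DELETE FROM person WHERE last_name = 'Poirot';

-- Session 2: the row is still visible here, but its xmax is already
-- set to the ID of the (still uncommitted) deleting transaction
SELECT xmin, xmax, first_name, last_name FROM person;
--   xmin  |  xmax  | first_name | last_name
-- --------+--------+------------+-----------
--  156078 | 156079 | Hercule    | Poirot

-- Session 1: ROLLBACK keeps the row visible; COMMIT would hide it
ROLLBACK;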

MVCC is not the only method for RDBMS concurrency control; other databases use other mechanisms, such as rollback segments in Oracle or MySQL. These record the work done by a transaction so that, when the transaction fails, its changes can be undone on rollback or by the next read that refers to those blocks. The advantage of MVCC compared to such methods is that rolling back a transaction has minimal cost: there is no cleanup to be done when a rollback happens, and the memory and processor load is the same as in the commit case.

Enter VACUUM

The disadvantage of MVCC is the topic of this post: the necessity of vacuuming. The primary purpose of vacuuming is garbage collection. Since PostgreSQL does not remove any rows from physical storage when they are updated or deleted, after some time (depending on the frequency of update and delete activity in the database), the database will be occupying a lot of essentially unused disk space. Garbage collection is not the only purpose of vacuuming, though. Two related concerns are the visibility map and transaction ID wraparound. The visibility map is PostgreSQL's way of avoiding unnecessary trips to the heap, where the actual row data is stored. When a query finds rows in an index, PostgreSQL has to check whether these rows are visible, i.e. not already deleted and awaiting vacuum, by fetching the data from the heap. This IO trip is avoided using the visibility map, which records which pages on the heap contain only visible data. If a page is on this map, PostgreSQL does not have to visit it to ensure visibility. As a side note, the visibility map is the reason it took PostgreSQL longer to implement index-only scans, which are possible only since version 9.2.
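You can get a rough idea of how much of a table the visibility map covers from pg_class, which stores the number of all-visible pages next to the total page count. The values below are illustrative:

SELECT relpages, relallvisible FROM pg_class WHERE relname = 'person';
--  relpages | relallvisible
-- ----------+---------------
--         1 |             0
-- (1 row)

On a table with recent modifications, relallvisible typically lags behind relpages; after a vacuum run, the two should converge.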

Transaction ID wraparound is the name given to the fact that since these IDs are 32-bit integers, there can be at most 2^32 of them. When a database has processed more transactions than that, the transaction ID overflows and starts again from the beginning. If no further action were taken, nearly all rows would suddenly become invisible, because their xmin values would appear to lie in the future. The solution implemented by PostgreSQL is setting the xmin of rows with sufficiently old xmin values to a special value, FrozenTransactionId, which is always considered to be lower (ergo older) than any normal transaction ID. This happens as a part of vacuuming, so if you do not vacuum your database for a long time, there is a real possibility that old data suddenly becomes invisible.

Edit: As Peter pointed out in the comments, the transaction ID comparison is presented in a simplified manner here. The real comparison of IDs involves modulo arithmetic, so that the space of IDs wraps around. That is to say, for any ID x, there are 2^31 IDs smaller than x, and just as many greater. See the documentation for details.
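If you want to check how far along your databases are on the way to wraparound, a common monitoring query is to look at the age of the oldest transaction ID that has not been frozen yet; autovacuum forces a freeze long before this age can reach 2^31:

SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;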

Manual vacuuming is as simple as running VACUUM; in psql, or rather VACUUM VERBOSE; if you want to actually see what is happening. These commands also accept the name of a table as an optional argument. If this argument is omitted, VACUUM is executed on the whole database. Running only VACUUM is what one could call the first level of vacuuming; it takes care of deleted rows and updates the visibility map. What it does not do, contrary to what I said above, is return the storage space to the operating system. It actually updates what's called the free space map (FSM) to mark the pages that have free space due to deleted or updated rows. The next time a new row has to be written, PostgreSQL can consult this map and use the free space in those pages, instead of demanding more storage space from the OS. If you want to reclaim all free space, you need to run VACUUM FULL;, which might be necessary if you e.g. manually delete a lot of rows. Full vacuuming reprocesses the table data, and writes a brand new version that is compacted and consumes exactly the space it needs. However, think twice before you run it: it locks the tables it is processing, and will block both read and write queries.
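To illustrate, here is what reclaiming space on a single table might look like; pg_total_relation_size lets you compare the footprint before and after. The sizes in the comments are, of course, made up:

-- Plain vacuum: records free space in the FSM, file size stays the same
VACUUM VERBOSE person;

SELECT pg_size_pretty(pg_total_relation_size('person')); -- e.g. 71 MB
-- Full vacuum: rewrites the table compactly, but takes an exclusive lock
VACUUM FULL person;
SELECT pg_size_pretty(pg_total_relation_size('person')); -- e.g. 23 MB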

Vacuum ≠ Analyze

As I mentioned in my previous post, PostgreSQL relies on statistics of column value distributions to generate efficient query plans. Updating these statistics is not the job of VACUUM; it requires a separate command, namely ANALYZE. You can run ANALYZE; standalone in psql (or ANALYZE VERBOSE; for more output), or run both maintenance commands together with VACUUM ANALYZE;. As with VACUUM, you can pass [VACUUM] ANALYZE the name of a single table. Fun note: Both ANALYZE and ANALYSE work, so go ahead and spell it the British way if you are keen to do so.
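For instance, you could refresh the statistics of a single table and then check when it was last analyzed; the timestamps are tracked in pg_stat_all_tables:

ANALYZE VERBOSE person;
SELECT last_analyze, last_autoanalyze
FROM pg_stat_all_tables
WHERE relname = 'person';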

The autovacuum daemon

In order to make the jobs of database users worldwide easier, PostgreSQL comes, since version 8.1, with a daemon that runs both VACUUM and ANALYZE at certain intervals: the famous autovacuum daemon. It runs as a separate daemon process, the presence of which you can check with a simple ps aux | grep autovacuum. If no vacuuming is currently in progress, you should see only a launcher process; otherwise you might also see worker processes. The autovacuum daemon checks each database at regular intervals to see whether it needs vacuuming and/or analyzing. If the number of rows that were updated or deleted in a table is above a certain threshold, these processes are executed. The number of deleted and updated rows is read from the statistics views; we can see an approximation for the person table with the following query:

test=# SELECT n_tup_del, n_tup_upd FROM pg_stat_all_tables WHERE relname = 'person';
 n_tup_del | n_tup_upd
-----------+-----------
         0 |         0
(1 row)

The threshold is calculated according to the following formula:

autovacuum_vacuum_threshold + (autovacuum_vacuum_scale_factor * pg_class.reltuples)
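The two autovacuum constants can be read from the pg_settings view; on a stock installation they default to 50 and 0.2 respectively, i.e. vacuum once roughly a fifth of a table has changed:

SELECT name, setting FROM pg_settings
WHERE name IN ('autovacuum_vacuum_threshold', 'autovacuum_vacuum_scale_factor');
--               name              | setting
-- --------------------------------+---------
--  autovacuum_vacuum_scale_factor | 0.2
--  autovacuum_vacuum_threshold    | 50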

The last value in the formula can be obtained with SELECT reltuples FROM pg_class WHERE relname='person';. Bringing these together, we can write the following query as an approximation of what the autovacuum daemon does to decide whether to vacuum a table:

SELECT
(pt.n_tup_del + pt.n_tup_upd) > pgs_threshold.setting::int + (pgs_scale.setting::float * pc.reltuples)
AS should_vacuum
FROM pg_class pc JOIN pg_stat_all_tables pt ON pc.relname = pt.relname
                 CROSS JOIN pg_settings pgs_threshold
                 CROSS JOIN pg_settings pgs_scale
WHERE pt.relname='person'
AND pgs_threshold.name = 'autovacuum_vacuum_threshold'
AND pgs_scale.name = 'autovacuum_vacuum_scale_factor';
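On a freshly vacuumed or idle table, the result should look like this, flipping to t once enough rows have been modified:

 should_vacuum
---------------
 f
(1 row)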

You have to keep in mind that the statistics we receive from pg_stat_all_tables are accumulated since the last statistics reset, whose timestamp is tracked per database in pg_stat_database.stats_reset. In the documentation, there is no remark as to which exact statistics the autovacuum daemon uses, but I'm pretty certain that only the tuples updated and deleted since the last vacuum run are counted. Otherwise, the autovacuum daemon would, in the limit, have to vacuum every table on every run. The autovacuum daemon does a similar calculation to decide whether to run analyze; details can be found in the PostgreSQL documentation.
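In fact, pg_stat_all_tables also exposes a dead-tuple counter that is reset by vacuuming, together with the timestamps of the last runs, which comes closer to what the daemon actually consults:

SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_all_tables
WHERE relname = 'person';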

Improving Vacuuming

A frequent issue with the autovacuum daemon is that it gets to work at unexpected times of the day, maybe in the middle of a high-load period, and causes deteriorated performance. Another symptom of an improper vacuuming regime is queries that are executed with suboptimal query plans. The primary reason for this is outdated table statistics, which can be alleviated by the ANALYZE statements that run as a part of vacuuming. As you can see above, it is difficult to imitate the behavior of autovacuum. The ideal approach would be to find out whether the statistics are out of sync with the data, but that is difficult to determine, and not even autovacuum does it:

The daemon schedules ANALYZE strictly as a function of the number of rows inserted or updated; it has no knowledge of whether that will lead to meaningful statistical changes.

You are also strongly advised never to turn off autovacuuming, because of the risks that involves. Even if you do frequent manual vacuuming, there might be unexpected bouts of high activity that affect many rows. Besides, autovacuum will find little to do if it runs shortly after a manual vacuum, so it makes sense to just leave it running. The most sensible thing to do is to adjust the settings so that large and active tables are vacuumed more frequently. Here is a query to find out which tables have the most rows:

SELECT reltuples,relname FROM pg_class WHERE relkind='r' ORDER BY reltuples DESC;

If you have periods of low load, you can schedule a cron job to vacuum the largest tables regularly; or, as in our case, if the load on your application is continuous, you can adjust the parameters for these tables so that they are vacuumed more frequently. The vacuum parameters can be set for individual tables with the following statements:

ALTER TABLE person SET (autovacuum_vacuum_scale_factor = 0.0);
ALTER TABLE person SET (autovacuum_vacuum_threshold = 4000);

As per the equation above, these settings would cause autovacuum to vacuum this table after every 4000 row updates or deletes, no matter how many rows are already in the table. This leads to more frequent vacuuming of such tables, and to shorter vacuum times in the runs where all tables are vacuumed, resulting in better performance. The settings for individual tables can be queried from the pg_class table as follows:

SELECT relname, reloptions FROM pg_class WHERE relname='person';
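The options come back as a text array; for the settings above, the output looks like this. If you ever want to fall back to the global defaults, ALTER TABLE ... RESET removes the per-table overrides:

 relname |                              reloptions
---------+------------------------------------------------------------------------
 person  | {autovacuum_vacuum_scale_factor=0.0,autovacuum_vacuum_threshold=4000}
(1 row)

ALTER TABLE person RESET (autovacuum_vacuum_scale_factor, autovacuum_vacuum_threshold);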

A simple test using pg_dump and pg_restore has revealed that settings changed with the ALTER statements above are preserved in the dump-and-restore process, so you don't have to rerun them for every new instance of your database if you are reading in dumps.