autovacuum tuning. I'll very briefly explain the necessary theory (dead tuples, bloat, and how autovacuum deals with them), but the main focus of this blog post is tuning – what configuration options are there, rules of thumb, etc.
DELETE in PostgreSQL, the row (aka tuple) is not immediately removed from the data file. Instead, it is only marked as deleted by setting the xmax field in its header. Similarly for
UPDATEs, which may be seen as
UPDATEs, the dead tuples might easily account for the vast majority of disk space. Of course, those dead tuples are also referenced from indexes, further increasing the amount of wasted disk space. This is what we call "bloat" in PostgreSQL. And naturally, the more data queries have to process (even if 99% of it is immediately thrown away as "dead"), the slower the queries run.
VACUUM command. This maintenance command will scan the table and remove dead tuples from both the table and its indexes. It will not generally return the disk space to the operating system, but it will make it reusable for new rows.
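For illustration, dead-tuple estimates are visible in the standard pg_stat_user_tables system view, and the cleanup itself is a plain SQL command (the table name here is hypothetical):

```sql
-- Estimated dead vs. live tuples per table, worst offenders first
SELECT relname, n_dead_tup, n_live_tup
  FROM pg_stat_user_tables
 ORDER BY n_dead_tup DESC;

-- Manually remove dead tuples from one table; the space becomes
-- reusable for new rows, but is not returned to the OS
VACUUM mytable;
```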
VACUUM FULL would reclaim the space and return it to the OS, but it has a number of disadvantages. Firstly, it acquires an exclusive lock on the table, blocking all operations (including SELECTs). Secondly, it essentially creates a copy of the table, doubling the disk space needed, so it's not very practical when you're already running out of disk space.
VACUUM is that it's an entirely manual action – it only happens when you decide to run it, not when it's needed. You may put it into cron and run it every 5 minutes on all tables, but chances are most of the runs will not actually clean anything, and the only effect will be higher CPU and I/O usage on the system. Or you may run it only once a day at night, in which case you'll probably accumulate more dead tuples than you'd like.
autovacuum: do the cleanup as needed, to keep the amount of wasted space under control. The database does know how many dead tuples were produced over time (each transaction reports the number of tuples it deleted and updated), and so it can trigger cleanup once a table accumulates a certain number of dead tuples (by default 20% of the table, as we'll see). So cleanup will be executed more often during busy periods, and less often when the database is mostly idle.
autovacuum. It’s also responsible for updating data distribution statistics, used by the optimizer when planning queries. You may collect those manually by running
ANALYZE, but it suffers similar issues as
VACUUM– you’re likely to run it either too often or not often enough.
ANALYZE, because while the cost of VACUUM is proportional to the amount of dead tuples (so fairly low when there are few or none), ANALYZE has to rebuild the statistics from scratch on every execution. On the other hand, if you don't run it often enough, the cost of poor plan choices may be just as severe.
autovacuum task in the rest of this post for the sake of brevity – its configuration is fairly similar to the cleanup anyway, and follows roughly the same reasoning.
postgresql.conf are quite conservative, for two reasons. Firstly, the default values were chosen a few years ago, based on the resources (CPU, RAM, …) common at that time. Secondly, the default configuration needs to work everywhere, including tiny machines like a Raspberry Pi or small VPS servers. For many deployments (particularly smaller ones and/or those handling read-mostly workloads), the default configuration parameters will nevertheless work just fine.
autovacuum entirely. Please don't do that unless you really (really, really) know what you're doing and have a regular cleanup script in place. Otherwise you're painting yourself into a corner, and instead of somewhat degraded performance you'll eventually have to deal with severely degraded performance, or possibly even an outage.
threshold + pg_class.reltuples * scale_factor
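As a sanity check, the trigger condition can be sketched in a few lines of Python (the parameter names mirror the PostgreSQL settings; the defaults shown are autovacuum_vacuum_threshold = 50 and autovacuum_vacuum_scale_factor = 0.2):

```python
def autovacuum_trigger(reltuples, threshold=50, scale_factor=0.2):
    """Dead tuples needed before autovacuum cleans up a table.

    Mirrors: threshold + pg_class.reltuples * scale_factor
    (defaults: autovacuum_vacuum_threshold = 50,
               autovacuum_vacuum_scale_factor = 0.2)
    """
    return threshold + reltuples * scale_factor

# With the defaults, a 10M-row table accumulates ~2M dead tuples
# before cleanup is triggered:
print(autovacuum_trigger(10_000_000))                      # 2000050.0

# Lowering the scale factor makes the cleanup far more frequent:
print(autovacuum_trigger(10_000_000, scale_factor=0.01))   # 100050.0
```

This also shows why the fixed threshold matters mostly for tiny tables – for anything large, the scale factor dominates.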
autovacuum_vacuum_scale_factor = 0.01
postgresql.conf affect all tables (the whole cluster, in fact), and it may undesirably affect cleanups of small tables, including for example system catalogs.
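A common way around that is to override the parameters only for the few large tables that need it, using per-table storage parameters (the table name here is hypothetical):

```sql
-- Clean up this one big table after ~1% dead tuples, without
-- touching the cluster-wide defaults used by small tables
ALTER TABLE big_table SET (autovacuum_vacuum_scale_factor = 0.01);
```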
autovacuum is throttling. The cleanup is meant to be a maintenance task running in the background, with minimal impact on user queries etc. In other words, it should not consume too many resources (CPU and disk I/O), and this is exactly the purpose of the throttling built into
shared_buffers, it counts as 1. If it's not found in shared_buffers and needs to be read from the OS, it counts as 10 (it might still be served from RAM, but we don't know). And finally, if the page is dirtied by the cleanup, it counts as 20. That allows us to compute the "cost of work" done by
shared_buffers (assuming it's not dirtied)
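To get a feel for what this throttling allows, here's a small Python sketch. It assumes the traditional defaults of a 200-unit cost limit and a 20 ms cost delay (recent PostgreSQL releases use a shorter default delay, so treat the exact numbers as illustrative):

```python
PAGE_SIZE = 8192       # bytes per page
COST_LIMIT = 200       # assumed vacuum_cost_limit default
COST_DELAY = 0.020     # assumed 20 ms autovacuum cost delay (older default)

# After accumulating COST_LIMIT units of work, the worker sleeps for
# COST_DELAY, so the budget is COST_LIMIT / COST_DELAY units per second.
cost_per_second = COST_LIMIT / COST_DELAY        # 10000 units/s

def max_rate(cost_per_page):
    """(pages/s, MB/s) achievable for pages with the given per-page cost."""
    pages = cost_per_second / cost_per_page
    return pages, pages * PAGE_SIZE / 1024 ** 2

print(max_rate(1))    # page hits in shared_buffers: 10000 pages/s, ~78 MB/s
print(max_rate(10))   # pages read from the OS:       1000 pages/s, ~7.8 MB/s
print(max_rate(20))   # pages dirtied by the cleanup:  500 pages/s, ~3.9 MB/s
```

So with these defaults, the cleanup writes at most a few MB/s – gentle on the system, but easily too slow for large, heavily-updated tables.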
cost_limit parameter, e.g. to 1000 (or 2000), which increases the throughput 5x (or 10x). You may of course tweak the other parameters (cost per page operation, sleep delay), but we do that only very rarely – changing the cost limit usually works well enough.
autovacuum_max_workers, so what's that about? Well, the cleanup does not happen in a single autovacuum process; instead, the database is allowed to start up to autovacuum_max_workers processes that actually do the cleanup of different databases/tables.
autovacuum workers to 6, it will surely do twice as much work compared to the default 3 workers, right?
autovacuum workers. Each worker process only gets 1/autovacuum_max_workers of the total cost limit, so increasing the number of workers will only make them go slower.
autovacuumworkers, I’ve been lying (a bit). Similarly to scale factor and threshold, it’s possible to set the cost limit and delay per table:
autovacuum. If I had to sum it up into a few basic rules, it'd be these five:
autovacuum, unless you really know what you're doing. Seriously.
DELETEs), particularly large ones, you should probably decrease the scale factor, so that cleanup happens more frequently.
autovacuum_max_workers alone will not really help in most cases. You'll just get more processes that each go slower.
ALTER TABLE, but think twice whether you really need that. It makes the system more complex and more difficult to inspect.
autovacuum does not really work, how to detect them, and what the best solutions are, but this blog post is already too long, so I'll cover those separately in a few days.