How do you prevent dead rows from hanging around in PostgreSQL? - Database Administrators Stack Exchange
I have production and staging RDS instances on Amazon, and staging's data is a direct copy of production, so both instances have the same data.
Doing explain analyze select * from my_table where my_col = true;
on staging resulted in this:

Seq Scan on my_table  (cost=0.00..142775.73 rows=1 width=1436) (actual time=18170.294..18170.294 rows=0 loops=1)
  Filter: my_col
  Rows Removed by Filter: 360275
whereas in production, it was:

Seq Scan on my_table  (cost=0.00..62145.88 rows=1 width=1450) (actual time=282.487..282.487 rows=0 loops=1)
  Filter: my_col
  Rows Removed by Filter: 366442
When running select pg_total_relation_size('my_table'::regclass);
I found that staging's size was double that of production. From what I've read, I see that PostgreSQL's MVCC is responsible, as it keeps multiple versions of rows around. I manually ran VACUUM FULL, and afterwards saw that staging's size had been cut down by 2/3. Running the same EXPLAIN ANALYZE now shows:
Seq Scan on my_table  (cost=0.00..56094.75 rows=1 width=1436) (actual time=1987.340..1987.340 rows=0 loops=1)
  Filter: my_col
  Rows Removed by Filter: 360287
Total runtime: 1987.547 ms
which is great -- what I don't understand is that the documentation suggests autovacuum should kick in and clean up these dead rows, yet that doesn't appear to be happening.
I've also read several places that say "don't let your indexes bloat", and I don't quite understand 1) how an index gets bloated, and 2) how to prevent an index from getting bloated.
How can I prevent this from happening again in the future?
update
Here are my autovacuum settings:
                name                 |  setting  | unit |  context   | source  | short_desc
-------------------------------------+-----------+------+------------+---------+------------------------------------------------------------------------------------------
 autovacuum                          | on        |      | sighup     | default | Starts the autovacuum subprocess.
 autovacuum_analyze_scale_factor     | 0.1       |      | sighup     | default | Number of tuple inserts, updates, or deletes prior to analyze as a fraction of reltuples.
 autovacuum_analyze_threshold        | 50        |      | sighup     | default | Minimum number of tuple inserts, updates, or deletes prior to analyze.
 autovacuum_freeze_max_age           | 200000000 |      | postmaster | default | Age at which to autovacuum a table to prevent transaction ID wraparound.
 autovacuum_max_workers              | 3         |      | postmaster | default | Sets the maximum number of simultaneously running autovacuum worker processes.
 autovacuum_multixact_freeze_max_age | 400000000 |      | postmaster | default | Multixact age at which to autovacuum a table to prevent multixact wraparound.
 autovacuum_naptime                  | 60        | s    | sighup     | default | Time to sleep between autovacuum runs.
 autovacuum_vacuum_cost_delay        | 20        | ms   | sighup     | default | Vacuum cost delay in milliseconds, for autovacuum.
 autovacuum_vacuum_cost_limit        | -1        |      | sighup     | default | Vacuum cost amount available before napping, for autovacuum.
 autovacuum_vacuum_scale_factor      | 0.2       |      | sighup     | default | Number of tuple updates or deletes prior to vacuum as a fraction of reltuples.
 autovacuum_vacuum_threshold         | 50        |      | sighup     | default | Minimum number of tuple updates or deletes prior to vacuum.
Auto-vacuuming should be coming around and cleaning this up (assuming you haven't disabled it), but it may not be getting around to it often enough for your purposes. There are many settings that control auto-vacuuming and how/when it's done, which may be of interest: here and here.
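One way to apply those settings is per table rather than globally. As a sketch (table name and values are just examples, not a recommendation for your workload), this makes autovacuum trigger after roughly 1% of the table's rows are dead instead of the default 20%:

```sql
-- Default autovacuum_vacuum_scale_factor is 0.2 (20% of reltuples);
-- for a large, hot table that can mean waiting a long time between vacuums.
ALTER TABLE my_table SET (
    autovacuum_vacuum_scale_factor = 0.01,  -- vacuum after ~1% of rows are dead
    autovacuum_vacuum_threshold    = 50     -- plus a small fixed floor
);
```

Per-table settings like these override the server-wide defaults shown in your `pg_settings` output, so you can tune just the high-churn tables without touching everything else.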
This can be especially true of tables with high churn, i.e. tables with lots of insertions and deletions. Long-running and idle transactions can also be a factor here, as MVCC will kick in and prevent dead tuples from being reclaimed while those transactions can still see them. The fact that manually running a VACUUM frees the dead tuples suggests that isn't the case for you, though, and that it may be the former issue instead.
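You can check how many dead tuples a table is currently carrying, and when (auto)vacuum last visited it, from the statistics views (assuming `my_table` is the table in question):

```sql
-- n_dead_tup is an estimate of dead tuples awaiting cleanup;
-- last_autovacuum being NULL or very old suggests autovacuum isn't keeping up.
SELECT relname, n_live_tup, n_dead_tup,
       last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'my_table';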
In general, running VACUUM FULL is not recommended, as it takes out an exclusive table lock; it's really meant for the special case where most of the rows in a table have been updated or deleted.
From the doc:
The FULL option is not recommended for routine use, but might be useful in special cases. An example is when you have deleted or updated most of the rows in a table and would like the table to physically shrink to occupy less disk space and allow faster table scans. VACUUM FULL will usually shrink the table more than a plain VACUUM would.
Is your usage pattern such a case? You did mention a "direct copy" was involved, but it's not clear how that's being done.
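If you do decide a one-off VACUUM FULL is warranted, you can measure its effect the same way you spotted the problem in the first place, by comparing the relation size before and after:

```sql
SELECT pg_size_pretty(pg_total_relation_size('my_table'::regclass));
VACUUM FULL my_table;  -- takes an ACCESS EXCLUSIVE lock for the duration
SELECT pg_size_pretty(pg_total_relation_size('my_table'::regclass));
```

Just keep in mind the lock blocks all reads and writes on the table while it rewrites, so on production this should happen in a maintenance window.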
I have had cases with high-churn tables where the default auto-vacuum rate wasn't enough, and where even relatively small amounts of dead tuples affected query speed (this was a large table that was queried constantly, the query needed to be extremely fast, and as such it was highly affected by dead tuples).
To deal with this, I set up a manual VACUUM ANALYZE of the table (so it would both free the dead tuples and aid the query planner by updating its stats) in a cron job set to run every 5 minutes. Since there were never many dead tuples, the VACUUM was pretty fast, and the constant vacuuming kept the dead tuple count low enough to keep queries against the table fast.
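A minimal crontab entry for that kind of setup might look like the following (the database name `mydb` and connection details are placeholders; adjust for your environment and make sure the invoking user can authenticate non-interactively):

```
# Every 5 minutes: reclaim dead tuples and refresh planner stats for my_table
*/5 * * * * psql -d mydb -c 'VACUUM ANALYZE my_table;'
```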
Edit in response to a comment from the OP:
The VACUUM doc says that:

VACUUM reclaims storage occupied by dead tuples.

The doc also says (emphasis mine):

VACUUM ANALYZE performs a VACUUM and then an ANALYZE for each selected table. This is a handy combination form for routine maintenance scripts. See ANALYZE for more details about its processing.

So yes, VACUUM ANALYZE does reclaim dead tuples.