How do you prevent dead rows from hanging around in PostgreSQL?

I have production and staging RDS instances on Amazon, and staging's data is a direct copy of production, so both instances have the same data.

Running EXPLAIN ANALYZE SELECT * FROM my_table WHERE my_col = true; on staging resulted in this:

    Seq Scan on my_table  (cost=0.00..142775.73 rows=1 width=1436) (actual time=18170.294..18170.294 rows=0 loops=1)
      Filter: my_col
      Rows Removed by Filter: 360275

Whereas in production, it was:

    Seq Scan on my_table  (cost=0.00..62145.88 rows=1 width=1450) (actual time=282.487..282.487 rows=0 loops=1)
      Filter: my_col
      Rows Removed by Filter: 366442

When running SELECT pg_total_relation_size('my_table'::regclass); I found that staging's size was double that of production. From what I've read, PostgreSQL's MVCC is responsible, since it keeps multiple versions of rows around. I manually ran VACUUM FULL, and afterwards saw that staging's size had been cut down by 2/3. Running the same EXPLAIN ANALYZE now shows:

    Seq Scan on my_table  (cost=0.00..56094.75 rows=1 width=1436) (actual time=1987.340..1987.340 rows=0 loops=1)
      Filter: my_col
      Rows Removed by Filter: 360287
    Total runtime: 1987.547 ms
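One rough way to see why the estimated cost fell is the planner's sequential-scan formula, which charges seq_page_cost for every page of the table plus cpu_tuple_cost for every estimated tuple (plus any per-tuple filter cost). The sketch below assumes the default seq_page_cost = 1.0 and cpu_tuple_cost = 0.01, and treats the bare boolean filter my_col as adding no operator cost. The tuple count plugged in is an assumption (the row count from the first plan), not a value read from pg_class, so the page counts it prints are estimates only.

```python
# Assumed planner defaults; check SHOW seq_page_cost and SHOW cpu_tuple_cost.
SEQ_PAGE_COST = 1.0
CPU_TUPLE_COST = 0.01


def seq_scan_cost(pages: int, tuples: float, qual_cost_per_tuple: float = 0.0) -> float:
    """Simplified Seq Scan total cost: I/O per page plus CPU per tuple."""
    return SEQ_PAGE_COST * pages + (CPU_TUPLE_COST + qual_cost_per_tuple) * tuples


def pages_from_cost(total_cost: float, tuples: float, qual_cost_per_tuple: float = 0.0) -> float:
    """Invert seq_scan_cost to estimate how many pages the planner assumed."""
    return (total_cost - (CPU_TUPLE_COST + qual_cost_per_tuple) * tuples) / SEQ_PAGE_COST


if __name__ == "__main__":
    # Hypothetical: the same ~360k live tuples before and after VACUUM FULL.
    tuples = 360_275
    for label, cost in [("before VACUUM FULL", 142_775.73), ("after VACUUM FULL", 56_094.75)]:
        print(f"{label}: ~{pages_from_cost(cost, tuples):,.0f} pages")
```

Under these assumptions the tuple term is nearly identical in both plans, so almost all of the drop in estimated cost comes from the table occupying fewer pages, which is exactly what dead tuples inflate.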

Which is great. What I don't understand is that the documentation suggests autovacuum should kick in and clean up these dead rows, yet that is not happening.

I've read in several places talk of "don't let indexes bloat", and I don't quite understand 1) how an index gets bloated, and 2) how to prevent an index from getting bloated.

How can I prevent this from happening again in the future?

Update

Here are my autovacuum settings:

                    name                 |  setting  | unit |  context   | vartype |  min_val  |  max_val   | short_desc
    -------------------------------------+-----------+------+------------+---------+-----------+------------+------------
     autovacuum                          | on        |      | sighup     | bool    |           |            | Starts the autovacuum subprocess.
     autovacuum_analyze_scale_factor     | 0.1       |      | sighup     | real    | 0         | 100        | Number of tuple inserts, updates, or deletes prior to analyze as a fraction of reltuples.
     autovacuum_analyze_threshold        | 50        |      | sighup     | integer | 0         | 2147483647 | Minimum number of tuple inserts, updates, or deletes prior to analyze.
     autovacuum_freeze_max_age           | 200000000 |      | postmaster | integer | 100000000 | 2000000000 | Age at which to autovacuum a table to prevent transaction ID wraparound.
     autovacuum_max_workers              | 3         |      | postmaster | integer | 1         | 8388607    | Sets the maximum number of simultaneously running autovacuum worker processes.
     autovacuum_multixact_freeze_max_age | 400000000 |      | postmaster | integer | 10000000  | 2000000000 | Multixact age at which to autovacuum a table to prevent multixact wraparound.
     autovacuum_naptime                  | 60        | s    | sighup     | integer | 1         | 2147483    | Time to sleep between autovacuum runs.
     autovacuum_vacuum_cost_delay        | 20        | ms   | sighup     | integer | -1        | 100        | Vacuum cost delay in milliseconds, for autovacuum.
     autovacuum_vacuum_cost_limit        | -1        |      | sighup     | integer | -1        | 10000      | Vacuum cost amount available before napping, for autovacuum.
     autovacuum_vacuum_scale_factor      | 0.2       |      | sighup     | real    | 0         | 100        | Number of tuple updates or deletes prior to vacuum as a fraction of reltuples.
     autovacuum_vacuum_threshold         | 50        |      | sighup     | integer | 0         | 2147483647 | Minimum number of tuple updates or deletes prior to vacuum.

All of these rows have category = autovacuum and source = default, and boot_val and reset_val are identical to setting; the extra_desc, enumvals, sourcefile and sourceline columns are empty.
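With these settings, the PostgreSQL docs describe autovacuum as vacuuming a table once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples (and analyzing it once inserted, updated or deleted tuples exceed the analyze equivalent). The sketch below encodes that check with the values listed above. The reltuples figure is a hypothetical round number close to the row counts in the plans; in a real database n_dead_tup comes from pg_stat_user_tables and reltuples from pg_class, whereas here they are plain arguments.

```python
from dataclasses import dataclass


@dataclass
class AutovacuumSettings:
    # Values from the settings listing above (PostgreSQL defaults).
    vacuum_threshold: int = 50
    vacuum_scale_factor: float = 0.2
    analyze_threshold: int = 50
    analyze_scale_factor: float = 0.1


def vacuum_trigger(reltuples: float, s: AutovacuumSettings) -> float:
    """Dead-tuple count above which autovacuum will VACUUM the table."""
    return s.vacuum_threshold + s.vacuum_scale_factor * reltuples


def analyze_trigger(reltuples: float, s: AutovacuumSettings) -> float:
    """Changed-tuple count above which autovacuum will ANALYZE the table."""
    return s.analyze_threshold + s.analyze_scale_factor * reltuples


def needs_vacuum(n_dead_tup: int, reltuples: float, s: AutovacuumSettings) -> bool:
    """True once dead tuples exceed the vacuum trigger point."""
    return n_dead_tup > vacuum_trigger(reltuples, s)


if __name__ == "__main__":
    s = AutovacuumSettings()
    reltuples = 360_000  # hypothetical, roughly the row counts seen above
    print(vacuum_trigger(reltuples, s))        # 72050.0
    print(analyze_trigger(reltuples, s))       # 36050.0
    print(needs_vacuum(70_000, reltuples, s))  # False
```

So on a table of this size, roughly 72,000 rows would have to be dead before autovacuum even picks the table up, which helps explain how bloat can build up on a high-churn table while autovacuum is nominally on.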

Auto-vacuuming should be taking care of this (assuming you haven't disabled it), but it may not be running often enough for your purposes. There are many settings that control auto-vacuuming and how/when it's done that may be of interest: here and here.
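Some of those settings can also be overridden for a single table: autovacuum_vacuum_scale_factor and autovacuum_vacuum_threshold are accepted as table storage parameters via ALTER TABLE ... SET (...). The helper below is a hypothetical sketch that builds such a statement, for example to make one busy table vacuum after a fixed number of dead rows regardless of its size. The function names, table name and numbers are illustrative assumptions, not a recommendation; note that double-quoting the identifier makes it case-sensitive.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def per_table_autovacuum_sql(table: str, scale_factor: float, threshold: int) -> str:
    """Build an ALTER TABLE that overrides autovacuum trigger settings for one table."""
    # Same ranges as min_val/max_val in the pg_settings listing above.
    if not 0 <= scale_factor <= 100:
        raise ValueError("scale_factor must be between 0 and 100")
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    return (
        f"ALTER TABLE {quote_ident(table)} SET ("
        f"autovacuum_vacuum_scale_factor = {scale_factor}, "
        f"autovacuum_vacuum_threshold = {threshold});"
    )


if __name__ == "__main__":
    # Hypothetical: vacuum my_table after ~5,000 dead rows, whatever its size.
    print(per_table_autovacuum_sql("my_table", 0.0, 5000))
```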

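This can be true of tables with high churn, that is, tables with lots of insertions and deletions. Long-running and idle transactions can also be a factor here, since MVCC has to keep any dead tuples those transactions might still need to see, which prevents them from being reclaimed. The fact that manually running VACUUM frees the dead tuples suggests this isn't the case for you, though, and that it may be the former issue instead.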
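The long-running transaction effect can be pictured with a deliberately simplified model of MVCC: a dead row version can only be removed once the transaction that deleted it is older than the oldest snapshot still in use (often called the xmin horizon). This sketch assumes every deleting transaction has committed and ignores subtransactions, aborted deleters, hint bits and transaction ID wraparound, all of which real PostgreSQL handles; the transaction IDs are made-up integers.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RowVersion:
    xmin: int                    # transaction that created this version
    xmax: Optional[int] = None   # committed transaction that deleted/updated it, if any


def xmin_horizon(active_snapshot_xmins: Sequence[int], next_xid: int) -> int:
    """Oldest xid an open transaction may still need; next_xid if none are open."""
    return min(active_snapshot_xmins, default=next_xid)


def removable(row: RowVersion, horizon: int) -> bool:
    """A dead version is removable only if its deleter precedes the horizon."""
    return row.xmax is not None and row.xmax < horizon


if __name__ == "__main__":
    rows = [RowVersion(100, 105), RowVersion(101, 120), RowVersion(102)]
    # An idle transaction whose snapshot started at xid 110 is still open.
    h = xmin_horizon([110], next_xid=130)
    print([removable(r, h) for r in rows])  # [True, False, False]
    # Once it ends, everything deleted before xid 130 can be reclaimed.
    h = xmin_horizon([], next_xid=130)
    print([removable(r, h) for r in rows])  # [True, True, False]
```

The row deleted by xid 120 stays around for as long as the transaction holding the older snapshot remains open, no matter how often VACUUM runs.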

In general, VACUUM FULL is not recommended, since it takes an exclusive lock on the table; it is mainly worthwhile when most of the rows in the table have been updated or deleted.

From the docs:

The FULL option is not recommended for routine use, but might be useful in special cases. An example is when you have deleted or updated most of the rows in a table and would like the table to physically shrink to occupy less disk space and allow faster table scans. VACUUM FULL will usually shrink the table more than a plain VACUUM would.

Is your usage pattern such a case? You did mention that a "direct copy" is involved, but it's not clear how that's being done.

I have had cases with high-churn tables where the default auto-vacuum rate wasn't enough, and even relatively small numbers of dead tuples would affect query speed (this was a large table that was queried often, and the query needed to be extremely fast, so it was highly affected by dead tuples).

To deal with this, I set up a manual VACUUM ANALYZE of the table (to both free the dead tuples and aid the query planner by updating its stats) in a cron job set to run every 5 minutes. Since there weren't many dead tuples each time, the vacuum was pretty fast, and the constant vacuuming keeps the dead tuple count low enough to keep queries on that table fast.
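A minimal sketch of what such a cron-driven job could call, assuming a DB-API style PostgreSQL connection such as one from psycopg2 (the answer doesn't say which tool was used). VACUUM cannot run inside a transaction block, so the connection is switched to autocommit before executing it. The function names are hypothetical.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def vacuum_analyze(conn, table: str) -> str:
    """Run VACUUM ANALYZE on one table through a DB-API connection.

    VACUUM refuses to run inside a transaction block, so autocommit is
    enabled first (psycopg2 connections expose an `autocommit` attribute).
    Returns the SQL that was executed.
    """
    sql = f"VACUUM ANALYZE {quote_ident(table)}"
    conn.autocommit = True
    cur = conn.cursor()
    try:
        cur.execute(sql)
    finally:
        cur.close()
    return sql
```

A small wrapper script would open the connection (for example with psycopg2.connect), call vacuum_analyze(conn, "my_table") and close the connection, and a crontab entry with the schedule */5 * * * * would run that script every 5 minutes.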

Edit in response to a comment from the OP:

The VACUUM doc says that:

VACUUM reclaims storage occupied by dead tuples.

The doc also says (emphasis mine):

VACUUM ANALYZE performs a VACUUM and then an ANALYZE for each selected table. This is a handy combination form for routine maintenance scripts. See ANALYZE for more details about its processing.

So it does reclaim dead tuples.

Brant

how do you prevent dead rows from hanging around in postgresql? - Database Administrators Stack Exchange