How do you prevent dead rows from hanging around in PostgreSQL? - Database Administrators Stack Exchange


I have production and staging RDS instances on Amazon, and staging's data is a direct copy of production, so both instances have duplicate data.

Running EXPLAIN ANALYZE SELECT * FROM my_table WHERE my_col = true; on staging resulted in this:

Seq Scan on my_table  (cost=0.00..142775.73 rows=1 width=1436) (actual time=18170.294..18170.294 rows=0 loops=1)
  Filter: my_col
  Rows Removed by Filter: 360275

whereas in production, it was:

Seq Scan on my_table  (cost=0.00..62145.88 rows=1 width=1450) (actual time=282.487..282.487 rows=0 loops=1)
  Filter: my_col
  Rows Removed by Filter: 366442

When running SELECT pg_total_relation_size('my_table'::regclass); I found that staging's size was double that of production. From what I've read, PostgreSQL's MVCC is responsible, since it keeps multiple versions of rows around. I manually ran VACUUM FULL, and afterwards saw that staging's size had been cut down by 2/3. Running the same EXPLAIN ANALYZE now shows:

Seq Scan on my_table  (cost=0.00..56094.75 rows=1 width=1436) (actual time=1987.340..1987.340 rows=0 loops=1)
  Filter: my_col
  Rows Removed by Filter: 360287
Total runtime: 1987.547 ms

Which is great. What I don't understand is that the documentation suggests autovacuum should kick in and clean up these dead rows, yet that is not happening.

I've read in several places people saying "don't let indexes bloat", and I don't quite understand 1) how an index gets bloated, and 2) how to prevent an index from getting bloated.

How can I prevent this from happening again in the future?

Update

Here are my autovacuum settings:

                name                 |  setting  | unit |  category  |                                        short_desc                                         | extra_desc |  context   | vartype | source  |  min_val  |  max_val   | enumvals | boot_val  | reset_val | sourcefile | sourceline
-------------------------------------+-----------+------+------------+-------------------------------------------------------------------------------------------+------------+------------+---------+---------+-----------+------------+----------+-----------+-----------+------------+------------
 autovacuum                          | on        |      | Autovacuum | Starts the autovacuum subprocess.                                                         |            | sighup     | bool    | default |           |            |          | on        | on        |            |
 autovacuum_analyze_scale_factor     | 0.1       |      | Autovacuum | Number of tuple inserts, updates, or deletes prior to analyze as a fraction of reltuples. |            | sighup     | real    | default | 0         | 100        |          | 0.1       | 0.1       |            |
 autovacuum_analyze_threshold        | 50        |      | Autovacuum | Minimum number of tuple inserts, updates, or deletes prior to analyze.                    |            | sighup     | integer | default | 0         | 2147483647 |          | 50        | 50        |            |
 autovacuum_freeze_max_age           | 200000000 |      | Autovacuum | Age at which to autovacuum a table to prevent transaction ID wraparound.                  |            | postmaster | integer | default | 100000000 | 2000000000 |          | 200000000 | 200000000 |            |
 autovacuum_max_workers              | 3         |      | Autovacuum | Sets the maximum number of simultaneously running autovacuum worker processes.            |            | postmaster | integer | default | 1         | 8388607    |          | 3         | 3         |            |
 autovacuum_multixact_freeze_max_age | 400000000 |      | Autovacuum | Multixact age at which to autovacuum a table to prevent multixact wraparound.             |            | postmaster | integer | default | 10000000  | 2000000000 |          | 400000000 | 400000000 |            |
 autovacuum_naptime                  | 60        | s    | Autovacuum | Time to sleep between autovacuum runs.                                                    |            | sighup     | integer | default | 1         | 2147483    |          | 60        | 60        |            |
 autovacuum_vacuum_cost_delay        | 20        | ms   | Autovacuum | Vacuum cost delay in milliseconds, for autovacuum.                                        |            | sighup     | integer | default | -1        | 100        |          | 20        | 20        |            |
 autovacuum_vacuum_cost_limit        | -1        |      | Autovacuum | Vacuum cost amount available before napping, for autovacuum.                              |            | sighup     | integer | default | -1        | 10000      |          | -1        | -1        |            |
 autovacuum_vacuum_scale_factor      | 0.2       |      | Autovacuum | Number of tuple updates or deletes prior to vacuum as a fraction of reltuples.            |            | sighup     | real    | default | 0         | 100        |          | 0.2       | 0.2       |            |
 autovacuum_vacuum_threshold         | 50        |      | Autovacuum | Minimum number of tuple updates or deletes prior to vacuum.                               |            | sighup     | integer | default | 0         | 2147483647 |          | 50        | 50        |            |

Autovacuum should get around to cleaning these up (assuming you haven't disabled it), but it may not be getting around to it often enough for your purposes. There are many settings that control autovacuum and how/when it's done, which may be of interest: here and here.

This can be especially true of tables with high churn, that is, tables with lots of insertions and deletions. Long-running and idle transactions can also be a factor here, as MVCC will kick in and prevent the dead tuples from being reclaimed. The fact that manually doing a vacuum frees up the dead tuples suggests that this isn't the case for you, though, and that it may be the former issue instead.
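To make "not often enough" concrete, here is a minimal sketch of the rule PostgreSQL documents for when autovacuum decides to vacuum a table: the dead-tuple count must exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples. The function name is made up for illustration, the row count is taken from the seq scan in the question, and the tuned scale factor of 0.01 is only an assumed example value, not a recommendation.

```python
def autovacuum_vacuum_trigger(reltuples: float,
                              threshold: int = 50,
                              scale_factor: float = 0.2) -> float:
    """Dead-tuple count above which autovacuum will VACUUM a table.

    Documented rule:
        vacuum threshold = autovacuum_vacuum_threshold
                           + autovacuum_vacuum_scale_factor * reltuples
    The defaults here match the settings shown in the question.
    """
    return threshold + scale_factor * reltuples


# Approximate row count, taken from "Rows Removed by Filter" in the question.
rows = 360_275

default_trigger = autovacuum_vacuum_trigger(rows)

# Hypothetical per-table override (0.01 is an assumed example value), e.g.:
#   ALTER TABLE my_table SET (autovacuum_vacuum_scale_factor = 0.01);
tuned_trigger = autovacuum_vacuum_trigger(rows, scale_factor=0.01)

print(f"default settings:  vacuum after ~{default_trigger:,.0f} dead tuples")
print(f"scale_factor=0.01: vacuum after ~{tuned_trigger:,.0f} dead tuples")
```

With the default settings, a table of this size has to accumulate roughly 72,000 dead tuples before autovacuum touches it, which may explain why a high-churn table still looks bloated between runs.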

In general, it's not recommended to run VACUUM FULL, as it takes out an exclusive table lock; the exception is when most of the rows in a table have been updated/deleted.

From the doc:

The FULL option is not recommended for routine use, but might be useful in special cases. An example is when you have deleted or updated most of the rows in a table and would like the table to physically shrink to occupy less disk space and allow faster table scans. VACUUM FULL will usually shrink the table more than a plain VACUUM would.

Is your usage pattern such a case? You did mention that a "direct copy" is involved, but it's not clear how that's being done.

I have had cases with high-churn tables where the default autovacuum rate just wasn't enough, and even relatively small amounts of dead tuples would affect query speed (this was in a large table that was queried very frequently, where the query needed to be extremely fast and, as such, was highly affected by dead tuples).

To fix this, I set up a manual VACUUM ANALYZE of the table (so as to both free up the tuples and aid the query planner by updating the stats) in a cron job set to run every 5 minutes. Since there weren't many dead tuples, the vacuum was pretty fast, and the constant vacuuming keeps the dead tuple count low enough to keep queries of that table fast.
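As a rough sketch of what such a cron entry could look like, the snippet below builds a crontab line that runs the standard vacuumdb client with --analyze against a single table. The helper names, the database name mydb, and the assumption that connection details (host, user, password) come from the environment, for example PGHOST or ~/.pgpass, are all illustrative; this is not the exact job described above.

```python
import shlex
from typing import List


def build_vacuum_command(dbname: str, table: str) -> List[str]:
    """Argument list for a VACUUM ANALYZE of one table via vacuumdb."""
    return ["vacuumdb", "--analyze", "--table", table, "--dbname", dbname]


def cron_line(dbname: str, table: str, every_minutes: int = 5) -> str:
    """Crontab entry that runs the vacuum every `every_minutes` minutes."""
    if not 1 <= every_minutes <= 59:
        raise ValueError("every_minutes must be between 1 and 59")
    schedule = f"*/{every_minutes} * * * *"
    # shlex.join quotes arguments only where the shell would need it.
    return f"{schedule} {shlex.join(build_vacuum_command(dbname, table))}"


# "mydb" is a placeholder database name; "my_table" is from the question.
print(cron_line("mydb", "my_table"))
```

The printed line can be pasted into the crontab of a host that can reach the database. Whether 5 minutes is the right interval depends on how quickly dead tuples accumulate in the table.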

Edit in response to a comment from the OP:

In the VACUUM doc, it says that:

VACUUM reclaims storage occupied by dead tuples.

The doc also says (emphasis mine):

VACUUM ANALYZE performs a VACUUM and then an ANALYZE for each selected table. This is a handy combination form for routine maintenance scripts. See ANALYZE for more details about its processing.

So it does reclaim the dead tuples.

