Unix Technical Forum

Vacuum, analyze, and setting reltuples of pg_class

This is a discussion on Vacuum, analyze, and setting reltuples of pg_class within the pgsql Hackers forums, part of the PostgreSQL category; --> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Short version: is it optimal for vacuum to always populate reltuples with live ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-12-2008, 05:56 AM
Greg Sabino Mullane
 
Posts: n/a
Default Vacuum, analyze, and setting reltuples of pg_class


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Short version: is it optimal for vacuum to always populate reltuples
with live rows + dead rows?

I came across a problem in which I noticed that a vacuum did not change
the reltuples value as I expected. A vacuum analyze indicated a correct
estimated number of rows, but the number put into reltuples was not
similar. Running analyze alone did put a more accurate number. After
some IRC talk and digging through the code, it appears that because the
system is busy, the dead rows could not be removed at that time, and
vacuum (and vacuum analyze) (and vacuum full analyze) uses the number of
live rows + dead rows to populate reltuples. Are there any alternatives to
running analyze outside of vacuum every time to ensure a better count? Is
there serious drawbacks in vacuum using the live versus the live vs. dead?
Is there any way to encourage those dead rows to go away, or to figure out what
is preventing them from being reaped? This is cluster-wide, and happens
even on newly created tables, but here is a real-life example on a busy table:

greg=# select reltuples, relpages from pg_class where relname = 'q';
reltuples | relpages
- -----------+----------
970 | 5724

greg=# select count(*) from q;
count
- -------
979

greg=# vacuum q;
VACUUM

greg=# select reltuples, relpages from pg_class where relname = 'q';
reltuples | relpages
- -----------+----------
2100 | 5724

greg=# vacuum full analyze q;
VACUUM

greg=# select reltuples, relpages from pg_class where relname = 'q';
reltuples | relpages
- -----------+----------
2116 | 5724

greg=# analyze q;
ANALYZE

greg=# select reltuples, relpages from pg_class where relname = 'q';
reltuples | relpages
- -----------+----------
897 | 5724


We've got much bigger tables that are affected worse than the example
above, of course. I'm pretty sure this is what Jeff Boes was experiencing
in 7.2, from this old thread:

http://svr5.postgresql.org/pgsql-bug...0/msg00138.php

I presume that the non-duplication was because Tom's database was not
so busy as to have dead rows laying around at the end of the vacuum
runs.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200612111128
http://biglumber.com/x/web?pk=2529DF...9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iD8DBQFFfYq4vJuQZxSWSsgRAtoZAKDngqVnt77SLXmp/nvuOnUGfoEMOgCcD8lE
jjB7atW6824o6vd85wl6+ps=
=O7N/
-----END PGP SIGNATURE-----



---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-12-2008, 05:56 AM
Tom Lane
 
Posts: n/a
Default Re: Vacuum, analyze, and setting reltuples of pg_class

"Greg Sabino Mullane" <greg@turnstep.com> writes:
> Short version: is it optimal for vacuum to always populate reltuples
> with live rows + dead rows?


If we didn't do that, it would tend to encourage the use of seqscans on
tables with lots of dead rows, which is probably a bad thing.

> Is there any way to encourage those dead rows to go away,


Close your open transactions.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-12-2008, 05:56 AM
Greg Sabino Mullane
 
Posts: n/a
Default Re: Vacuum, analyze, and setting reltuples of pg_class


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Tom Lane replied:
>> Short version: is it optimal for vacuum to always populate reltuples
>> with live rows + dead rows?


> If we didn't do that, it would tend to encourage the use of seqscans on
> tables with lots of dead rows, which is probably a bad thing.


Bleh. Isn't that what a plain analyze would encourage then? Should analyze
be considering the dead rows somehow as well?

>> Is there any way to encourage those dead rows to go away,


> Close your open transactions.


There are no long-running transactions running, but it is a very busy database,
so the chances of something else on the cluster being in a transaction at
any point in time is very high. Still, why would an open transaction elsewhere
block other databases / other tables for a vacuum full? E.g.:

prod=# create database gtest;
CREATE DATABASE

prod=# \c gtest
You are now connected to database "gtest".

gtest=# create table gtest(a int);
CREATE TABLE

gtest=# insert into gtest select 1 from generate_series(1,10);
INSERT 0 10

gtest=# delete from gtest;
DELETE 10

gtest=# vacuum full gtest;
VACUUM

gtest=# analyze verbose gtest;
INFO: analyzing "public.gtest"
INFO: "gtest": scanned 1 of 1 pages, containing 0 live rows and
10 dead rows; 0 rows in sample, 0 estimated total rows
ANALYZE

This is 8.1.3, by the way. At the very least, I'll submit a doc patch at
the end of all this.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200612111226
http://biglumber.com/x/web?pk=2529DF...9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iD8DBQFFfZYLvJuQZxSWSsgRAmeDAKCPK2h9trzLn+1V6yN7cU jsnd/3VwCfT3Il
hdCrUGCVso01xkDRDKLUlpI=
=VOrr
-----END PGP SIGNATURE-----



---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-12-2008, 05:56 AM
Tom Lane
 
Posts: n/a
Default Re: Vacuum, analyze, and setting reltuples of pg_class

"Greg Sabino Mullane" <greg@turnstep.com> writes:
> Bleh. Isn't that what a plain analyze would encourage then? Should analyze
> be considering the dead rows somehow as well?


Very possibly, at least for counting purposes (it mustn't try to analyze
the content of such rows, since they could be incompatible with the
table's current rowtype).

> Still, why would an open transaction elsewhere
> block other databases / other tables for a vacuum full?


The tracking of global xmin isn't specific enough to distinguish which
database a transaction is in. VACUUM does ignore the xmins of xacts in
other databases, but unfortunately the advertised xmin of another xact
in our *own* database will still include them.

There's been some discussion of advertising both a global and local xmin
in the PGPROC array, but this would impose extra complexity on every
single transaction start, and it's not clear that the benefit is worth that.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-12-2008, 05:58 AM
Jim C. Nasby
 
Posts: n/a
Default Re: Vacuum, analyze, and setting reltuples of pg_class

On Mon, Dec 11, 2006 at 12:08:30PM -0500, Tom Lane wrote:
> "Greg Sabino Mullane" <greg@turnstep.com> writes:
> > Short version: is it optimal for vacuum to always populate reltuples
> > with live rows + dead rows?

>
> If we didn't do that, it would tend to encourage the use of seqscans on
> tables with lots of dead rows, which is probably a bad thing.


So then why does vacuum do that? ISTM that it makes more sense for it to
act the same as analyze and only count live rows.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-12-2008, 05:58 AM
Tom Lane
 
Posts: n/a
Default Re: Vacuum, analyze, and setting reltuples of pg_class

"Jim C. Nasby" <jim@nasby.net> writes:
> On Mon, Dec 11, 2006 at 12:08:30PM -0500, Tom Lane wrote:
>> "Greg Sabino Mullane" <greg@turnstep.com> writes:
>>> Short version: is it optimal for vacuum to always populate reltuples
>>> with live rows + dead rows?

>>
>> If we didn't do that, it would tend to encourage the use of seqscans on
>> tables with lots of dead rows, which is probably a bad thing.


> So then why does vacuum do that? ISTM that it makes more sense for it to
> act the same as analyze and only count live rows.


I think what you misread what I said: it's better to have the larger
count in reltuples so that the planner won't try to use a seqscan when
there are, say, 3 live tuples and 100K dead ones. The real problem is
that analyze ought to act more like vacuum, but since it presently
ignores deaders altogether, it fails to.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-12-2008, 05:58 AM
Zeugswetter Andreas ADI SD
 
Posts: n/a
Default Re: Vacuum, analyze, and setting reltuples of pg_class


> >>> Short version: is it optimal for vacuum to always populate

reltuples
> >>> with live rows + dead rows?
> >>
> >> If we didn't do that, it would tend to encourage the use of

seqscans on
> >> tables with lots of dead rows, which is probably a bad thing.

>
> > So then why does vacuum do that? ISTM that it makes more sense for

it to
> > act the same as analyze and only count live rows.

>
> I think what you misread what I said: it's better to have the larger
> count in reltuples so that the planner won't try to use a seqscan when
> there are, say, 3 live tuples and 100K dead ones.


I don't agree. The metric to avoid scans should be/is table size.
(number of pages needed to be read for expected number of rows)
The number of tuples is relevant to estimate call frequency of
related nodes. So from that perspective we do not want dead tuples
in the count.
Maybe we need to improve the estimate in the large table few live
tuples case, but I think we should adjust vacuum and not analyze.

If you have a join with the said table with 3 rows and join it
to a same size but lots of visible tuples table, you would want to
start with the table with 3 rows.

Andreas

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 11:35 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com