Unix Technical Forum

Autovacuum integration patch

This is a discussion on Autovacuum integration patch within the Pgsql Patches forums, part of the PostgreSQL category; --> Hackers, Here is a first cut at autovacuum integration. Please have a look at it. Note that this patch ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Patches

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-17-2008, 11:34 PM
Alvaro Herrera
 
Posts: n/a
Default Autovacuum integration patch

Hackers,

Here is a first cut at autovacuum integration. Please have a look at
it. Note that this patch automatically creates three new files:

src/backend/postmaster/autovacuum.c
src/include/catalog/pg_autovacuum.h
src/include/postmaster/autovacuum.h

Note that the daemon is not activated by default.

There are several things that are painfully evident with this thing on:

- TRUNCATE does not update stats. It should send a stat message to
which we can react.

- If you empty a whole table using DELETE just after an
automatically-issued VACUUM takes place, the new threshold may not be
enough to trigger a new VACUUM. Thus you end up with a bloated table,
and it won't get vacuumed until it grows again. This may be a problem
with the cost equations, but those are AFAICT identical to those of
pg_autovacuum, so we may need to rethink the equations.

- The default value of on for reset stats on server start is going to be
painful with autovacuum, because it reacts badly to losing the info.

- We should make VACUUM and ANALYZE update the pg_autovacuum relation,
in order to make the autovacuum daemon behave sanely with manually
issued VACUUM/ANALYZE.

- Having an autovacuum process running on a database can be surprising
if you want to drop a database, or create a new one using it as a
template. This happenned to me several times.

- The shutdown sequence is not debugged nor very well tested. It may be
all wrong.

- The startup sequence is a mixture from pgarch, normal backend and
pgstat. I find it relatively clean but I can't swear it's bug-free.

- There are no docs

- There are no ALTER TABLE commands to change the pg_autovacuum
attributes for a table. (Enable/disable, set thresholds and scaling
factor)

- I compiled with -DEXEC_BACKEND, but I didn't look to see if it
actually worked on that case.

Apart from all these issues, it is completely functional :-) It can
survive several "make installcheck" runs without problem, and the
regression database is vacuumed/analyzed as it runs.

Some of these issues are trivial to handle. However I'd like to release
this right now, so I can go back to "shared dependencies" now that role
support is in.

Barring any objections I think this should be integrated, so these
issues can be tackled by interested parties.

--
Alvaro Herrera (<alvherre[a]surnet.cl>)
"World domination is proceeding according to plan" (Andrew Morton)


---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-17-2008, 11:34 PM
Matthew T. O'Connor
 
Posts: n/a
Default Re: Autovacuum integration patch

Alvaro Herrera wrote:

>Hackers,
>
>Here is a first cut at autovacuum integration. Please have a look at
>it. Note that this patch automatically creates three new files:
>
>


Couple more things that I didn't think about while we were talking about
this the other day.

XID wraparound: The patch as submitted doesn't handle XID wraparound
issues. The old contrib autovacuum would do an XID wraparound check as
it's 1st operation upon connecting to a database. If XID wraparound was
looks like it's going to be a problem soon, then the whole database
would be vacuumed, eliminating the need to check specific tables.

Better logging of autovacuum activity: I think the we could use some
more detail in the debug elog statements. For example showing exactly
what autovacuum believes the threshold and current count is.

How to deal with shared relations: As an optimization, the contrib
version of autovacuum treated shared relations different than it treated
the rest when connected to any database that is not template1. That is,
when connected to a DB other than template1, autovacuum would not issue
vacuum commands. rather it would only issue analyze commands. When
autovacuum got around to connecting to template1, it would then issue
the vacuum command. The hope was that this would reducing a shared
relation from getting vacuumed n times (where n is the number of
databases in a cluster) whenever it crossed over it's threshold. I'm
not sure if this optimizaion is really important, or even exactly correct.

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-17-2008, 11:37 PM
Matthew T. O'Connor
 
Posts: n/a
Default Re: Autovacuum integration patch

>
>
>>>> XID wraparound: The patch as submitted doesn't handle XID wraparound
>>>> issues. The old contrib autovacuum would do an XID wraparound check as
>>>> it's 1st operation upon connecting to a database. If XID wraparound was
>>>> looks like it's going to be a problem soon, then the whole database
>>>> would be vacuumed, eliminating the need to check specific tables.

>>
>>
>>
>> Hmm. Yes, this patch doesn't handle Xid wraparound. This should be
>> easy to add though. Anyway, I was thinking that we could add a "last
>> vacuum Xid" to pg_autovacuum, and handle Xid wraparound for each table
>> separately -- this means you don't have to issue huge whole-database
>> VACUUMs, because it will be handled nicely for each table. Storing the
>> last vacuum Xid in pg_database would have to be rethought.

>
>


The current implementation of XID wraparound requires that the vacuum
command be run against the entire database, you can not run it on a per
table basis and have it work. At least that is my understanding, it would
require some reworking of the vacuum system and I have no idea what is
involved in that. For now, we should just do it the simple way. BTW, I
think this is a candidate for only being done during the maintenance
window.


>> Maybe what we could do is have a separate pg_vacuum table to hold
>> constantly-moving information about the tables: last vacuum Xid, count
>> of tuples at last vacuum/analyze, etc; so pg_autovacuum would only hold
>> the constants for autovacuum equations. This pg_vacuum table would be
>> updated by VACUUM, not autovacuum, so it would be always correct and up-
>> to-date.

>
>


I'm not sure I see the value in a new pg_vacuum table. reltuples already has
the tuple count from the last vacuum and I don't think last XID on a per
table basis is helpful.


>>>> Better logging of autovacuum activity: I think the we could use some
>>>> more detail in the debug elog statements. For example showing exactly
>>>> what autovacuum believes the threshold and current count is.

>>
>>
>>
>> Ok. I actually had lots more logging in the original patch, but I
>> removed it because it was too verbose. Again, it's easy to add.

>
>


Well, I don't know what is best, but it would be nice to be able to get at
the information that tells you why autovacuum did or did not take action.
Perhaps put back what you had in, but move it up to a higher debug level. FWIW, I think the debug info from the contrib version was sufficient.


>>>> How to deal with shared relations: As an optimization, the contrib
>>>> version of autovacuum treated shared relations different than it treated
>>>> the rest when connected to any database that is not template1.

>>
>>
>>
>> Ah, interesting. Yes, I think that could be done too. Very easy to do.
>> Anyway, the shared relations are not that big usually, so this shouldn't
>> be an issue.

>
>


Agreed this is not a big issue, it's a bit of a micro optimization.


>>>> Couple of other thoughts:
>>>> Do the vacuum commands respect the GUC vacuum delay settings?

>>
>>
>>
>> Huh, I don't know. I just issue a vacuum() call. That function sets
>> the delay settings AFAICS, so I think it should be working.

>
>


Can someone confirm this?


>>>> Should we be able to set per table vacuum delay settings?

>>
>>
>> We could set that in the hypotetical pg_vacuum relation.

>
>


Again, I don't think this would be good for the pg_vacuum table, I think
it should be in the autovacuum table, because what a user wants
autovacuum to do might be different than what he wants a manually run
vacuum to do.

>>>> This patch doesn't have the "maintenance window" that was discussed a
>>>> while ago.

>>
>>
>> True. I have several questions about it. Where would that information
>> be stored, in another system catalog? Would it be per-database or
>> per-table? What happens if I'm not able to do all work inside the
>> maintenance window, is it left for the next one? If the maintenance
>> window ends and there is a vacuum running, is it terminated or is it
>> allowed to continue?

>
>


One could argue that it should be per database, but I think per cluster should be sufficient. I think it could be handled as few GUC settings, such as:
autovac_maint_begin = "1AM"
autovac_maint_duration = 4 (measured in hours)
autovac_maint_factor = .5 (reduce the thresholds by half during the maintenance window, this option might be good to have on a per table basis, if so, then add it to the pg_autovacuum table)

If there is still work to do after the maint window expires, then it's left for next time or when the regular threshold is exceeded which ever happens first. I wouldn't terminate an in progress vacuum.


>> There is a very important issue I forgot to mention. This autovacuum
>> process only handles databases that exist in the Stats hash table.
>> However, the stat hash table only has information about databases and
>> tables that have been used in the current postmaster run. So if you
>> don't connect to a database regularly, that database won't get
>> "autovacuumed" after a postmaster restart. I think (but IMBFOS) that
>> this is also true for individual tables, i.e. a table that doesn't get
>> used won't be in the stat hash table and thus won't be processed. This
>> is a very important "gotcha."

>
>


Is it? If no one has connected to that database, it can't need to be
vacuumed. Especially in a production situation, I could imagine this
being an issue in development but...


>> This is, of course, not important in the normal case, because a table
>> that isn't used does not need vacuum. But in the Xid-wraparound case
>> it may be fatal.

>
>


hmmm... didn't think about that... Perhaps a better way to get our
database list is to manually create one each time we connect to the
postgres database, or the template1 database if the postgres database
doesn't exist. Thoughts?


>> I think the stat collector should be told about every existant database
>> and every table therein, so that autovacuum can do its work as the user
>> would expect.

>
>


Seems wrong to me to rework the stats system to accommodate autovacuum. Perhaps there is an easier way such as what I mentioned above or something else.


Matt



---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-17-2008, 11:37 PM
Tom Lane
 
Posts: n/a
Default Re: Autovacuum integration patch

"Matthew T. O'Connor" <matthew@zeut.net> writes:
>>> Hmm. Yes, this patch doesn't handle Xid wraparound. This should be
>>> easy to add though. Anyway, I was thinking that we could add a "last
>>> vacuum Xid" to pg_autovacuum, and handle Xid wraparound for each table
>>> separately -- this means you don't have to issue huge whole-database
>>> VACUUMs, because it will be handled nicely for each table. Storing the
>>> last vacuum Xid in pg_database would have to be rethought.


> The current implementation of XID wraparound requires that the vacuum
> command be run against the entire database, you can not run it on a per
> table basis and have it work. At least that is my understanding,


No, you're wrong. VACUUMing of individual tables is perfectly good
enough as far as XID wrap protection goes, it's just that we chose to
track whether it had been done at the database level. If we tracked it
in, say, a new pg_class column then in principle you could protect
against XID wrap with only table-at-a-time VACUUMs. (I think you'd
still want the pg_database column, but you'd update it to be the minimum
of the per-table values at the completion of any VACUUM.)

At the time this didn't seem particularly worth the complication since
no one would be likely to try to do that manually --- but with
autovacuum handling the work, it starts to sound more realistic.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-17-2008, 11:37 PM
Matthew T. O'Connor
 
Posts: n/a
Default Re: Autovacuum integration patch

Tom Lane wrote:

>"Matthew T. O'Connor" <matthew@zeut.net> writes:
>
>
>>The current implementation of XID wraparound requires that the vacuum
>>command be run against the entire database, you can not run it on a per
>>table basis and have it work. At least that is my understanding,
>>
>>

>
>No, you're wrong. VACUUMing of individual tables is perfectly good
>enough as far as XID wrap protection goes, it's just that we chose to
>track whether it had been done at the database level. If we tracked it
>in, say, a new pg_class column then in principle you could protect
>against XID wrap with only table-at-a-time VACUUMs. (I think you'd
>still want the pg_database column, but you'd update it to be the minimum
>of the per-table values at the completion of any VACUUM.)
>
>At the time this didn't seem particularly worth the complication since
>no one would be likely to try to do that manually --- but with
>autovacuum handling the work, it starts to sound more realistic.
>


Good, I'm glad I'm wrong on this. This will be another nice advantage
of autovacuum then and should be fairly easy to do. Any thoughts on
this being a change we can get in for 8.1?

Matt


---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-17-2008, 11:37 PM
Tom Lane
 
Posts: n/a
Default Re: Autovacuum integration patch

"Matthew T. O'Connor" <matthew@zeut.net> writes:
> Tom Lane wrote:
>> No, you're wrong. VACUUMing of individual tables is perfectly good
>> enough as far as XID wrap protection goes, it's just that we chose to
>> track whether it had been done at the database level. If we tracked it
>> in, say, a new pg_class column then in principle you could protect
>> against XID wrap with only table-at-a-time VACUUMs.


> Good, I'm glad I'm wrong on this. This will be another nice advantage
> of autovacuum then and should be fairly easy to do. Any thoughts on
> this being a change we can get in for 8.1?


I'd say this is probably a tad too late --- there's a fair amount of
code change that would be needed, none of which has been written, and
we are past the feature-freeze deadline for new code.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-17-2008, 11:37 PM
Alvaro Herrera
 
Posts: n/a
Default Re: Autovacuum integration patch

On Tue, Jul 05, 2005 at 01:00:50PM -0400, Tom Lane wrote:
> "Matthew T. O'Connor" <matthew@zeut.net> writes:
> > Tom Lane wrote:
> >> No, you're wrong. VACUUMing of individual tables is perfectly good
> >> enough as far as XID wrap protection goes, it's just that we chose to
> >> track whether it had been done at the database level. If we tracked it
> >> in, say, a new pg_class column then in principle you could protect
> >> against XID wrap with only table-at-a-time VACUUMs.

>
> > Good, I'm glad I'm wrong on this. This will be another nice advantage
> > of autovacuum then and should be fairly easy to do. Any thoughts on
> > this being a change we can get in for 8.1?

>
> I'd say this is probably a tad too late --- there's a fair amount of
> code change that would be needed, none of which has been written, and
> we are past the feature-freeze deadline for new code.


Right. I've written a small, non-intrusive patch that handles the Xid
wraparound just as pg_autovacuum used to, checking the Xid from
pg_database.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-17-2008, 11:37 PM
Bruce Momjian
 
Posts: n/a
Default Re: Autovacuum integration patch


TODO item?

---------------------------------------------------------------------------

Alvaro Herrera wrote:
> On Tue, Jul 05, 2005 at 01:00:50PM -0400, Tom Lane wrote:
> > "Matthew T. O'Connor" <matthew@zeut.net> writes:
> > > Tom Lane wrote:
> > >> No, you're wrong. VACUUMing of individual tables is perfectly good
> > >> enough as far as XID wrap protection goes, it's just that we chose to
> > >> track whether it had been done at the database level. If we tracked it
> > >> in, say, a new pg_class column then in principle you could protect
> > >> against XID wrap with only table-at-a-time VACUUMs.

> >
> > > Good, I'm glad I'm wrong on this. This will be another nice advantage
> > > of autovacuum then and should be fairly easy to do. Any thoughts on
> > > this being a change we can get in for 8.1?

> >
> > I'd say this is probably a tad too late --- there's a fair amount of
> > code change that would be needed, none of which has been written, and
> > we are past the feature-freeze deadline for new code.

>
> Right. I've written a small, non-intrusive patch that handles the Xid
> wraparound just as pg_autovacuum used to, checking the Xid from
> pg_database.
>
> --
> Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
> "Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>


--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-17-2008, 11:37 PM
Matthew T. O'Connor
 
Posts: n/a
Default Re: Autovacuum integration patch

I think so. Something like: Improve autovacuum xid wraparound detection
by moving to a pertable solution rather than per database.

Matt


Bruce Momjian wrote:

>TODO item?
>
>
>



---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-17-2008, 11:37 PM
Bruce Momjian
 
Posts: n/a
Default Re: Autovacuum integration patch

Matthew T. O'Connor wrote:
> I think so. Something like: Improve autovacuum xid wraparound detection
> by moving to a pertable solution rather than per database.


Thanks, added.

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 01:40 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com