Unix Technical Forum

Re: CPU-intensive autovacuuming

  #1 (permalink)  
Old 04-08-2008, 09:17 PM
Phil Endecott
 
Posts: n/a
Default Re: CPU-intensive autovacuuming

Following up on my own post from last night:

> Could it be that there is some code in autovacuum that is O(n^2) in
> the number of tables?


Browsing the code using webcvs, I have found this:

for (j = 0; j < PQntuples(res); j++)
{
    tbl_elem = DLGetHead(dbs->table_list);
    while (tbl_elem != NULL)
    {

I haven't really tried to understand what is going on in here, but it
does look like it is getting the result of the "pg_class join stats"
query and then matching it up against its internal list of tables using
nested loops, which is undoubtedly O(n^2) in the number of tables.
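To make the cost concrete, here is a minimal sketch of that matching pattern. It is not the pg_autovacuum source: `TableEntry` and `nested_loop_match` are hypothetical names, and a plain singly linked list stands in for the real list type. It only counts how many comparisons a nested-loop match performs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one entry of the in-memory table list. */
typedef struct TableEntry {
    const char *name;
    struct TableEntry *next;
} TableEntry;

/*
 * For each name in the query result, walk the list from its head until a
 * match is found. Returns the number of string comparisons performed.
 */
size_t nested_loop_match(const char *const *result, size_t nresult,
                         const TableEntry *head)
{
    size_t comparisons = 0;
    assert(result != NULL || nresult == 0);

    for (size_t j = 0; j < nresult; j++) {
        for (const TableEntry *e = head; e != NULL; e = e->next) {
            comparisons++;
            if (strcmp(result[j], e->name) == 0)
                break;          /* found; move on to the next result row */
        }
    }
    return comparisons;
}
```

Even with the early exit, matching n result rows against an n-entry list in the same order costs 1 + 2 + ... + n = n(n+1)/2 comparisons, so the work grows quadratically with the number of tables.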

Have I correctly understood what is going on here?

--Phil.


---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

  #2 (permalink)  
Old 04-08-2008, 09:18 PM
Matthew T. O'Connor

Phil Endecott wrote:

> Following up on my own post from last night:
>
> > Could it be that there is some code in autovacuum that is O(n^2) in
> > the number of tables?

>
> Browsing the code using webcvs, I have found this:
>
> for (j = 0; j < PQntuples(res); j++)
> {
> tbl_elem = DLGetHead(dbs->table_list);
> while (tbl_elem != NULL)
> {
>
> I haven't really tried to understand what is going on in here, but it
> does look like it is getting the result of the "pg_class join stats"
> query and then matching it up against its internal list of tables
> using nested loops, which is undoubtedly O(n^2) in the number of tables.
>
> Have I correctly understood what is going on here?



Indeed you have. I have heard a few similar reports, but perhaps none as
bad as yours. One person added a small sleep so that the loop doesn't
spin so tightly. You could also just increase the sleep delay so that it
doesn't do this work quite so often. No other quick suggestions.


  #3 (permalink)  
Old 04-08-2008, 09:18 PM
Phil Endecott

Matthew T. O'Connor wrote:
> Phil Endecott wrote:
>> > Could it be that there is some code in autovacuum that is O(n^2) in
>> > the number of tables?

>>
>> Browsing the code using webcvs, I have found this:
>>
>> for (j = 0; j < PQntuples(res); j++)
>> {
>> tbl_elem = DLGetHead(dbs->table_list);
>> while (tbl_elem != NULL)
>> {
>> Have I correctly understood what is going on here?


> Indeed you have. I have heard a few similar reports but perhaps none as
> bad as yours. One person put a small sleep value so that it doesn't
> spin so tight. You could also just up the sleep delay so that it
> doesn't do this work quite so often. No other quick suggestions.


I do wonder why autovacuum is keeping its table list in memory rather
than in the database.

But given that it is keeping it in memory, I think the real fix is to
sort that list (or keep it ordered when building or updating it). It is
trivial to also get the query results ordered, and they can then be
compared in O(n) time.
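A minimal sketch of that sorted comparison, under stated assumptions: both sides are plain arrays of table names already sorted in `strcmp()` order, and `merge_match` is a hypothetical name, not a function in pg_autovacuum. In practice the query's ORDER BY would have to produce the same ordering as the C comparison, which is easier to guarantee with a numeric key such as the table OID than with locale-collated names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Walk two name arrays that are both sorted in strcmp() order and count
 * the names present in both. Each step advances at least one cursor, so
 * at most nknown + nresult comparisons are made.
 */
size_t merge_match(const char *const *known, size_t nknown,
                   const char *const *result, size_t nresult,
                   size_t *comparisons)
{
    size_t i = 0, j = 0, matches = 0;
    assert(comparisons != NULL);
    *comparisons = 0;

    while (i < nknown && j < nresult) {
        int cmp = strcmp(known[i], result[j]);
        (*comparisons)++;
        if (cmp == 0) {          /* same table on both sides */
            matches++;
            i++;
            j++;
        } else if (cmp < 0) {    /* in memory but not in the result */
            i++;
        } else {                 /* in the result but not yet in memory */
            j++;
        }
    }
    return matches;
}
```

The two non-matching branches are where a real implementation would drop a table that has disappeared or add a newly created one; here they only advance the cursors.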

I notice various other places where there seem to be nested loops, e.g.
in the update_table_list function. I'm not sure if they can be fixed by
similar means.

--Phil.

  #4 (permalink)  
Old 04-08-2008, 09:18 PM
Matthew T. O'Connor

Phil Endecott wrote:

> Matthew T. O'Connor wrote:
>
>> Indeed you have. I have heard a few similar reports but perhaps none
>> as bad as yours. One person put a small sleep value so that it
>> doesn't spin so tight. You could also just up the sleep delay so
>> that it doesn't do this work quite so often. No other quick
>> suggestions.

>
>
> I do wonder why autovacuum is keeping its table list in memory rather
> than in the database.



For better or worse, this was a conscious design decision that the
contrib version of autovacuum be non-invasive to your database.

> But given that it is keeping it in memory, I think the real fix is to
> sort that list (or keep it ordered when building or updating it). It
> is trivial to also get the query results ordered, and they can then be
> compared in O(n) time.



I'm quite sure there is a better way; please submit a patch if you can.
This was never a real concern for most people, since the number of tables
is typically small enough not to be a problem. The integrated version
of autovacuum that didn't make the cut before 8.0 avoids this problem,
since the autovacuum data is stored in the database.

> I notice various other places where there seem to be nested loops,
> e.g. in the update_table_list function. I'm not sure if they can be
> fixed by similar means.



I would think so; they all do basically the same type of loop.


  #5 (permalink)  
Old 04-08-2008, 09:18 PM
Phil Endecott

Matthew T. O'Connor wrote:
> The integrated version
> of autovacuum that didn't make the cut before 8.0 avoids this problem
> since the autovacuum data is stored in the database.


What is the status of this? Is it something that will be included in
8.1 or 8.0.n? I might be able to patch the current code but that
doesn't seem like a useful thing to do if a better solution will arrive
eventually. I am currently running vacuums from a cron job and I think
I will be happy with that for the time being.

(Incidentally, I have also found that the indexes on my pg_attribute
table were taking up over half a gigabyte, which came down to less than
40 megs after reindexing them. Is there a case for having autovacuum
also call reindex?)

--Phil.


  #6 (permalink)  
Old 04-08-2008, 09:19 PM
Matthew T. O'Connor

Phil Endecott wrote:

> Matthew T. O'Connor wrote:
>
>> The integrated version of autovacuum that didn't make the cut before
>> 8.0 avoids this problem since the autovacuum data is stored in the
>> database.

>
>
> What is the status of this? Is it something that will be included in
> 8.1 or 8.0.n? I might be able to patch the current code but that
> doesn't seem like a useful thing to do if a better solution will
> arrive eventually. I am currently running vacuums from a cron job and
> I think I will be happy with that for the time being.



This is a good question :-) I have been so busy with work lately that I
have not been able to work on it. I am currently trying to resurrect
the patch I sent in for 8.0 and update it so that it applies against
HEAD. Once that is done, I will need help from someone with the
portions of the work that I'm not comfortable with or capable of. The
main issue with the version I created during the 8.0 devel cycle is that
it used libpq to connect, query and issue commands against the databases.
This was deemed bad, and I need help setting up the infrastructure to
make this happen without libpq. I hope to have my patch applying against
HEAD sometime this week, but it probably won't happen till next week.

So the summary of the autovacuum integration status is that we are fast
running out of time (feature freeze is July 1), and I have very little
time to devote to this task. So you might want to submit your O(n) patch,
because unfortunately it looks like integrated autovacuum might slip
another release unless someone else steps up to work on it.


> (Incidentally, I have also found that the indexes on my pg_attribute
> table were taking up over half a gigabyte, which came down to less
> than 40 megs after reindexing them. Is there a case for having
> autovacuum also call reindex?)



Yes, there is certainly some merit to having autovacuum or something
similar perform other system maintenance tasks such as reindexing. I
just haven't taken it there yet. It does seem strange that your
pg_attribute table got that big; does anyone have any insight here? You
did say you are using 7.4.2. I forget whether that has the index
reclaiming code in vacuum, and there are also some autovacuum bugs in the
early 7.4.x releases. You might try upgrading to either 8.0.x or a later
7.4.x release.


Matthew O'Connor



  #7 (permalink)  
Old 04-08-2008, 09:19 PM
Bruce Momjian

Phil Endecott wrote:
> Matthew T. O'Connor wrote:
> > The integrated version
> > of autovacuum that didn't make the cut before 8.0 avoids this problem
> > since the autovacuum data is stored in the database.

>
> What is the status of this? Is it something that will be included in
> 8.1 or 8.0.n? I might be able to patch the current code but that
> doesn't seem like a useful thing to do if a better solution will arrive
> eventually. I am currently running vacuums from a cron job and I think
> I will be happy with that for the time being.


I will post about integrating pg_autovacuum into the backend for 8.1 in
a few minutes.

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

  #8 (permalink)  
Old 04-08-2008, 09:19 PM
Tom Lane

Phil Endecott <spam_from_postgresql_general@chezphil.org> writes:
> (Incidentally, I have also found that the indexes on my pg_attribute
> table were taking up over half a gigabyte, which came down to less than
> 40 megs after reindexing them. Is there a case for having autovacuum
> also call reindex?)


Lots of temp tables I suppose? If so that's not autovacuum's fault;
it wasn't getting told about the activity in pg_attribute until this
patch:

2005-03-31 18:20 tgl

* src/backend/postmaster/: pgstat.c (REL7_4_STABLE), pgstat.c
(REL8_0_STABLE), pgstat.c: Flush any remaining statistics counts
out to the collector at process exit. Without this, operations
triggered during backend exit (such as temp table deletions) won't
be counted ... which given heavy usage of temp tables can lead to
pg_autovacuum falling way behind on the need to vacuum pg_class and
pg_attribute. Per reports from Steve Crawford and others.

Unless the bloat occurred after you updated to 8.0.2, there's no issue.

regards, tom lane

  #9 (permalink)  
Old 04-08-2008, 09:22 PM
Thomas F. O'Connell

Phil,

If you complete this patch, I'm very interested to see it.

I think I'm the person Matthew is talking about who inserted a sleep
value. Because of the sheer number of tables involved, even small
values of sleep caused pg_autovacuum to iterate too slowly over its
table lists to be of use in a production environment (where I still
find its behavior to be preferable to a complicated list of manual
vacuums performed in cron).

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Strategic Open Source: Open Your i™

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Jun 7, 2005, at 6:16 AM, Phil Endecott wrote:

> Matthew T. O'Connor wrote:
>
>> Phil Endecott wrote:
>>
>>> > Could it be that there is some code in autovacuum that is O(n^2) in
>>> > the number of tables?
>>>
>>> Browsing the code using webcvs, I have found this:
>>>
>>> for (j = 0; j < PQntuples(res); j++)
>>> {
>>> tbl_elem = DLGetHead(dbs->table_list);
>>> while (tbl_elem != NULL)
>>> {
>>>
>>> Have I correctly understood what is going on here?
>>>

>
>
>> Indeed you have. I have heard a few similar reports but perhaps
>> none as bad as yours. One person put a small sleep value so that
>> it doesn't spin so tight. You could also just up the sleep delay
>> so that it doesn't do this work quite so often. No other quick
>> suggestions.
>>

>
> I do wonder why autovacuum is keeping its table list in memory
> rather than in the database.
>
> But given that it is keeping it in memory, I think the real fix is
> to sort that list (or keep it ordered when building or updating
> it). It is trivial to also get the query results ordered, and they
> can then be compared in O(n) time.
>
> I notice various other places where there seem to be nested loops,
> e.g. in the update_table_list function. I'm not sure if they can
> be fixed by similar means.
>
> --Phil.


  #10 (permalink)  
Old 04-08-2008, 09:23 PM
Shelby Cain



--- "Thomas F. O'Connell" <tfo@sitening.com> wrote:

> Phil,
>
> If you complete this patch, I'm very interested to see it.
>
> I think I'm the person Matthew is talking about who inserted a sleep
> value. Because of the sheer number of tables involved, even small
> values of sleep caused pg_autovacuum to iterate too slowly over its
> table lists to be of use in a production environment (where I still
> find its behavior to be preferable to a complicated list of manual
> vacuums performed in cron).
>
> --
> Thomas F. O'Connell
> Co-Founder, Information Architect
> Sitening, LLC
>


Were you sleeping every time through the loop? How about something
like:

if (j % 500 == 1) usleep(100000);
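A self-contained sketch of that batching idea, with some assumptions: `throttled_loop`, `pause_100ms` and `pause_none` are hypothetical names, the per-row work is passed in as a callback, and `nanosleep()` stands in for `usleep()`, which was removed from POSIX.1-2008. Unlike the one-liner above, this variant pauses after each complete batch of rows rather than at row 1 of each batch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Pause hook, so the delay can be swapped out for dry runs. */
typedef void (*pause_fn)(void);

/* Sleep for 100 ms, the same delay as usleep(100000). */
void pause_100ms(void)
{
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 100000000L };
    nanosleep(&ts, NULL);
}

/* No-op pause, handy for dry runs and tests. */
void pause_none(void)
{
}

/*
 * Run `work` for rows 0..nrows-1, pausing once after every `every` rows.
 * Returns the number of pauses taken.
 */
size_t throttled_loop(size_t nrows, size_t every,
                      void (*work)(size_t row), pause_fn pause)
{
    size_t pauses = 0;
    assert(every > 0 && pause != NULL);

    for (size_t j = 0; j < nrows; j++) {
        if (work != NULL)
            work(j);
        if (j % every == every - 1) {   /* a full batch has been processed */
            pause();
            pauses++;
        }
    }
    return pauses;
}
```

With 100,000 rows and a batch size of 500, `throttled_loop(100000, 500, work, pause_100ms)` would add about 20 seconds of sleep in total, whereas sleeping the same 100 ms on every row would add nearly three hours.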

Regards,

Shelby Cain


