Unix Technical Forum

synchronized scans for VACUUM

#1
06-02-2008, 12:36 PM
Jeff Davis
synchronized scans for VACUUM

Previous thread for reference:

http://archives.postgresql.org/pgsql...6/msg00096.php

The objections to synchronized scans for VACUUM as listed in that thread
(summary):

1. vacuum sometimes progresses faster than a regular heapscan, because
it doesn't need to check WHERE clauses, etc.

2. vacuum takes breaks from the scan to clean up the indexes when it
runs out of maintenance_work_mem.

3. vacuum takes breaks for the cost delay

4. vacuum will dirty a lot of the blocks as it goes, and that will cause
some kind of interaction with the ring buffer

I'd like to address these one by one to see what problems are really in
our way:

1. This would mean that it's not an I/O-limited scan. I think as long as
we're talking about regular table scans that can benefit from
synchronized scanning, a vacuum of the same table would also benefit. A
microbenchmark could show whether some benefit exists or not.

2. There have been suggestions about a more compact representation for
the tuple id list. If this works, it will solve this problem.

3. Offering synchronized vacuums could reduce the need for these
elective pauses.

4. This probably has more to do with the buffer ring than synchronized
scans. There could be some bad interaction there, but I don't see that
it's clearly bad.

Additionally, with the possible exception of #4, I don't see the
situation being worse than it is currently.

Thoughts?

Regards,
Jeff Davis
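
For readers unfamiliar with the mechanism being discussed, here is a minimal sketch of the synchronized-scan idea: a scan that joins a table already being scanned starts at the block the other scan last reported and wraps around, so both read the same pages at roughly the same time. This is a toy model, not PostgreSQL's actual implementation; the names `SyncScanHint`, `report`, `start_block`, and `scan_order` are hypothetical.

```python
from typing import Iterator, Optional


class SyncScanHint:
    """Toy shared hint: the block most recently reported by any scan of a table."""

    def __init__(self) -> None:
        self.last_block: Optional[int] = None

    def report(self, block: int) -> None:
        """Record the block a scan has just read."""
        self.last_block = block

    def start_block(self) -> int:
        """Where a newly started scan should begin."""
        # With no scan reported yet, start at the beginning of the table.
        return 0 if self.last_block is None else self.last_block


def scan_order(hint: SyncScanHint, nblocks: int) -> Iterator[int]:
    """Yield every block exactly once, starting at the hinted block and wrapping."""
    if nblocks <= 0:
        return
    start = hint.start_block() % nblocks
    for i in range(nblocks):
        block = (start + i) % nblocks
        hint.report(block)  # keep the hint current for scans that join later
        yield block


hint = SyncScanHint()
hint.report(7)  # pretend another scan is currently at block 7
print(list(scan_order(hint, 10)))
```

A scanner that stops for long stretches (for index cleanup or the cost delay) stops reading in step with the others, which is what objections 2 and 3 are about.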


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2
06-02-2008, 12:36 PM
Tom Lane
Re: synchronized scans for VACUUM

Jeff Davis <pgsql@j-davis.com> writes:
> The objections to synchronized scans for VACUUM as listed in that thread
> (summary):


> 2. vacuum takes breaks from the scan to clean up the indexes when it
> runs out of maintenance_work_mem.


> 2. There have been suggestions about a more compact representation for
> the tuple id list. If this works, it will solve this problem.


It will certainly not "solve" the problem. What it will do is mean that
the breaks are further apart and longer, which seems to me to make the
conflict with syncscan behavior worse not better.

> 3. vacuum takes breaks for the cost delay


> 3. Offering synchronized vacuums could reduce the need for these
> elective pauses.


How so? A vacuum that happens not to be part of a syncscan herd is
going to be just as bad for system performance as ever.


It still seems to me that vacuum is unlikely to be a productive member
of a syncscan herd --- it just isn't going to have similar scan-speed
behavior to typical queries.

regards, tom lane
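
The point about breaks being "further apart and longer" can be put in rough numbers. This is a sketch under stated assumptions: the dead-tuple list costs a fixed number of bytes per entry, and vacuum pauses the heap scan for one index-cleanup pass each time the list fills. The 6-byte figure matches the size of a PostgreSQL heap tuple id; the 1-byte figure is a purely hypothetical stand-in for a more compact representation. The function name and parameters are illustrative.

```python
import math
from typing import Tuple


def index_cleanup_passes(dead_tuples: int, mem_bytes: int,
                         bytes_per_tid: int) -> Tuple[int, int]:
    """Return (number of index-cleanup passes, max tuple ids handled per pass)."""
    capacity = mem_bytes // bytes_per_tid  # tuple ids that fit in memory at once
    if capacity <= 0:
        raise ValueError("memory budget too small to hold a single tuple id")
    passes = math.ceil(dead_tuples / capacity)
    per_pass = min(dead_tuples, capacity)
    return passes, per_pass


# Same table and same memory budget, two representations of the list.
print(index_cleanup_passes(1000, 600, 6))  # (10, 100): many short breaks
print(index_cleanup_passes(1000, 600, 1))  # (2, 600): few breaks, more work in each
```

In this model the total number of tuple ids removed from the indexes is the same either way; the compact form only concentrates that work into fewer pauses of the heap scan.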

#3
06-02-2008, 12:36 PM
Gregory Stark
Re: synchronized scans for VACUUM

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Jeff Davis <pgsql@j-davis.com> writes:
>> The objections to synchronized scans for VACUUM as listed in that thread
>> (summary):
>>
>> 2. vacuum takes breaks from the scan to clean up the indexes when it
>> runs out of maintenance_work_mem.
>>
>> 2. There have been suggestions about a more compact representation for
>> the tuple id list. If this works, it will solve this problem.
>
> It will certainly not "solve" the problem. What it will do is mean that
> the breaks are further apart and longer, which seems to me to make the
> conflict with syncscan behavior worse not better.


How would it make them longer? They still have the same amount of i/o to do
scanning the indexes. I suppose they would dirty more pages which might slow
them down?

In any case I think the representation you proposed back when this idea last
came up was so compact that pretty much any size table ought to be
representable in a reasonable work_mem -- at least for the kind of machine
which would normally be dealing with that size table.

> It still seems to me that vacuum is unlikely to be a productive member
> of a syncscan herd --- it just isn't going to have similar scan-speed
> behavior to typical queries.


That's my thinking too. Our general direction has been toward reducing
vacuum's i/o bandwidth requirements, not worrying about making it run as fast
as possible.

That said, if it happened to latch on to a sync scan herd, it would have very
few cache misses, so it would rack up very few vacuum cost delay
points. Perhaps the vacuum cost delay for a cache hit ought to be 0?
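
To illustrate the cost-delay point, here is a simplified model of the accounting: each page adds its cost to a running balance, and when the balance reaches the limit, vacuum naps and the balance resets. The values 1, 10, and 200 are what I understand to be the defaults of `vacuum_cost_page_hit`, `vacuum_cost_page_miss`, and `vacuum_cost_limit` in PostgreSQL releases of this period; dirtied pages (`vacuum_cost_page_dirty`) are left out for brevity, and `count_naps` is a hypothetical helper, not a PostgreSQL function.

```python
from typing import Iterable

PAGE_HIT = 1      # assumed default cost of a page found in shared buffers
PAGE_MISS = 10    # assumed default cost of a page read from disk
COST_LIMIT = 200  # assumed default balance at which vacuum naps


def count_naps(page_costs: Iterable[int], cost_limit: int = COST_LIMIT) -> int:
    """Count how many times vacuum would nap while processing the given pages."""
    balance = 0
    naps = 0
    for cost in page_costs:
        balance += cost
        if balance >= cost_limit:
            naps += 1
            balance = 0  # start accumulating again after the nap
    return naps


pages = 1000
print(count_naps([PAGE_MISS] * pages))  # 50 naps: every page is a cache miss
print(count_naps([PAGE_HIT] * pages))   # 5 naps: every page is a cache hit
print(count_naps([0] * pages))          # 0 naps: if a cache hit cost nothing
```

In this model a vacuum riding along with a herd naps a tenth as often as one reading from disk, and not at all if a hit were free.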

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#4
06-02-2008, 12:36 PM
Tom Lane
Re: synchronized scans for VACUUM

Gregory Stark <stark@enterprisedb.com> writes:
>> It will certainly not "solve" the problem. What it will do is mean that
>> the breaks are further apart and longer, which seems to me to make the
>> conflict with syncscan behavior worse not better.


> How would it make them longer? They still have the same amount of i/o to do
> scanning the indexes. I suppose they would dirty more pages which might slow
> them down?


More tuples to delete = more writes (in WAL, if not immediately in the
index itself) = longer to complete the indexscan. It's still cheaper
than doing multiple indexscans, of course, but my point is that the
index-fixing work gets concentrated.

regards, tom lane

All times are GMT.