Unix Technical Forum

Load Distributed Checkpoints, final patch

  #1 (permalink)  
Old 04-18-2008, 11:18 AM
Heikki Linnakangas
 

Here's the latest revision of Itagaki-san's Load Distributed Checkpoints patch:

* bgwriter all-scan is gone. We might or might not improve the LRU-sweep
later so that it can perform any duties the all-sweep might have had
besides reducing the impact of a checkpoint.

* one new GUC variable, called checkpoint_completion_target. Default is
0.5, which should be more than enough to smooth checkpoints on a system
that's not already overloaded. It's also not too large to hurt recovery
times too much on a system that's not already struggling to hit its
recovery time requirements. You can set it to 0 if you want the old
checkpoint behavior for some reason. Maximum is 0.9, to leave some
headroom for fsync and any other things that need to happen during a
checkpoint.

* The minimum rate we write at during a checkpoint is 1 page /
bgwriter_delay.

* Added a paragraph to user manual to describe the feature. Also updated
the formula for expected number of WAL segments, new formula is (2 +
checkpoint_completion_target) * checkpoint_segments + 1. I believe the
comments in xlog.c regarding XLOGfileslop are still valid.

* The signaling in bgwriter.c is based on a spinlock. Tom advised to not
use the spinlock when not strictly necessary, but IMHO it's easier to
understand this way. Feel free to revert that when committing if you
disagree.

* The algorithm for estimating progress wrt. checkpoint_segments is the
same as before. Bursty WAL activity will lead to bursty checkpoint
activity, but I wanted to keep it simple for now. In any case, the I/O
rate will be smoother than without the patch.

* There are some DEBUG elogs which we might want to replace with better
ones later, per the patch in the patch queue by Greg Smith. The ones
that are there now are useful for testing this feature, but are a bit
crude for DBAs to use.
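
As a rough illustration of the pacing described in the list above (this is not the patch's code): the writer can compare the fraction of dirty pages already written with the fraction of the checkpoint interval that has elapsed, scaled by checkpoint_completion_target, and nap for one bgwriter_delay whenever it is ahead. The function and parameter names are hypothetical, and the sketch paces against elapsed time only, whereas the patch also estimates progress against checkpoint_segments.

```python
def ahead_of_schedule(pages_written, pages_total, elapsed_fraction, completion_target):
    """Return True if the checkpoint has written enough pages to take a nap.

    elapsed_fraction is the share of the checkpoint interval used so far;
    completion_target stands in for checkpoint_completion_target.
    """
    if completion_target <= 0.0:
        return False  # target 0: old behavior, write everything without napping
    if pages_total <= 0:
        return True  # nothing to write
    progress = pages_written / pages_total
    # All pages should be written once elapsed_fraction reaches completion_target.
    return progress * completion_target >= elapsed_fraction


def simulate_checkpoint(pages_total, cycles_in_interval, completion_target):
    """Return how many bgwriter_delay cycles it takes to write all pages."""
    written = 0
    cycle = 0
    while written < pages_total:
        cycle += 1
        written += 1  # minimum rate: one page per bgwriter_delay cycle
        elapsed = cycle / cycles_in_interval
        # Behind schedule: keep writing in this cycle instead of napping.
        while written < pages_total and not ahead_of_schedule(
            written, pages_total, elapsed, completion_target
        ):
            written += 1
    return cycle
```

With 1000 dirty pages, 1000 cycles per interval and a target of 0.5, the sketch finishes after 500 cycles. With only 100 pages, the one-page-per-cycle floor dominates and it finishes after 100 cycles. With a target of 0, everything is written in the first cycle.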

Barring any objections from the committer, I'm finished with this patch.

I'm scheduling more DBT-2 tests at a high # of warehouses per Greg
Smith's suggestion just to see what happens, but I doubt that will
change my mind on the above decisions.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

  #2 (permalink)  
Old 04-18-2008, 11:18 AM
Michael Glaesemann
 


On Jun 26, 2007, at 13:49 , Heikki Linnakangas wrote:

> Maximum is 0.9, to leave some headroom for fsync and any other
> things that need to happen during a checkpoint.


I think it might be more user-friendly to make the maximum 1 (meaning
as much smoothing as you could possibly get) and internally reserve a
certain amount off for whatever headroom might be required. It's more
common for users to see a value range from 0 to 1 rather than 0 to 0.9.
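
One way to read this suggestion, sketched under assumptions: the user-facing setting runs from 0 to 1 and is scaled internally so that the top of the range still leaves the reserve. The 0.9 factor is taken from the maximum quoted above; the function name is hypothetical and nothing like it is claimed to exist in the patch.

```python
MAX_INTERNAL_TARGET = 0.9  # assumed reserve, from the 0.9 maximum quoted above


def internal_target(user_value):
    """Map a user-facing 0..1 setting onto the internal 0..0.9 range."""
    if not 0.0 <= user_value <= 1.0:
        raise ValueError("user_value must be between 0 and 1")
    return user_value * MAX_INTERNAL_TARGET
```

Under this mapping a setting of 1 would mean "as much smoothing as possible", with the headroom hidden from the user.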

Michael Glaesemann
grzm seespotcode net



  #3 (permalink)  
Old 04-18-2008, 11:18 AM
Tom Lane
 

Heikki Linnakangas <heikki@enterprisedb.com> writes:
> Barring any objections from the committer, I'm finished with this patch.


Sounds great, I'll start looking this over.

> I'm scheduling more DBT-2 tests at a high # of warehouses per Greg
> Smith's suggestion just to see what happens, but I doubt that will
> change my mind on the above decisions.


When do you expect to have those results?

regards, tom lane

  #4 (permalink)  
Old 04-18-2008, 11:18 AM
Heikki Linnakangas
 

Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> Barring any objections from the committer, I'm finished with this patch.

>
> Sounds great, I'll start looking this over.
>
>> I'm scheduling more DBT-2 tests at a high # of warehouses per Greg
>> Smith's suggestion just to see what happens, but I doubt that will
>> change my mind on the above decisions.

>
> When do you expect to have those results?


In a few days. I'm doing long tests because the variability in the 1h
tests was very high.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

  #5 (permalink)  
Old 04-18-2008, 11:18 AM
Heikki Linnakangas
 

Michael Glaesemann wrote:
>
> On Jun 26, 2007, at 13:49 , Heikki Linnakangas wrote:
>
>> Maximum is 0.9, to leave some headroom for fsync and any other things
>> that need to happen during a checkpoint.

>
> I think it might be more user-friendly to make the maximum 1 (meaning as
> much smoothing as you could possibly get) and internally reserve a
> certain amount off for whatever headroom might be required. It's more
> common for users to see a value range from 0 to 1 rather than 0 to 0.9.


It would then be counter-intuitive if you set it to 1.0, and see that
your checkpoints consistently take 90% of the checkpoint interval.

We could just allow any value up to 1.0, and note in the docs that you
should leave some headroom, unless you don't mind starting the next
checkpoint a bit late. That actually sounds pretty good.
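
To put numbers on "a bit late", here is a deliberately simplified model: checkpoints are triggered by time only, the write phase ends exactly at the target fraction of the interval, and the fsync phase takes a known number of seconds. The function and parameter names are made up for illustration.

```python
def next_checkpoint_delay(checkpoint_timeout_s, completion_target, sync_s):
    """Seconds by which the next timed checkpoint would start late."""
    # Write phase is assumed to end exactly at the target fraction.
    write_phase_s = checkpoint_timeout_s * completion_target
    # Any time spent past the interval delays the next checkpoint.
    return max(0.0, write_phase_s + sync_s - checkpoint_timeout_s)
```

For a 300-second interval and 20 seconds of fsync, a target of 1.0 makes the next checkpoint start 20 seconds late, while a target of 0.9 leaves enough headroom that it starts on time.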

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

  #6 (permalink)  
Old 04-18-2008, 11:18 AM
Gregory Stark
 


"Heikki Linnakangas" <heikki@enterprisedb.com> writes:

> We could just allow any value up to 1.0, and note in the docs that you should
> leave some headroom, unless you don't mind starting the next checkpoint a bit
> late. That actually sounds pretty good.


What exactly happens if a checkpoint takes so long that the next checkpoint
starts? Aside from it not actually helping, is there much reason to avoid this
situation? Have we ever actually tested it?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


  #7 (permalink)  
Old 04-18-2008, 11:18 AM
Heikki Linnakangas
 

Gregory Stark wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>
>> We could just allow any value up to 1.0, and note in the docs that you should
>> leave some headroom, unless you don't mind starting the next checkpoint a bit
>> late. That actually sounds pretty good.

>
> What exactly happens if a checkpoint takes so long that the next checkpoint
> starts? Aside from it not actually helping, is there much reason to avoid this
> situation?


Not really. We might run out of preallocated WAL segments, and will have
to create more. Recovery could be longer than expected since the real
checkpoint interval ends up being longer, but you can't make very
accurate recovery time estimations anyway.

> Have we ever actually tested it?


I haven't.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

  #8 (permalink)  
Old 04-18-2008, 11:18 AM
Greg Smith
 

On Tue, 26 Jun 2007, Gregory Stark wrote:

> What exactly happens if a checkpoint takes so long that the next checkpoint
> starts? Aside from it not actually helping, is there much reason to avoid this
> situation? Have we ever actually tested it?


More segments get created, and because of how they are cleared at the
beginning this causes its own mini-I/O storm through the same buffered
write channel the checkpoint writes are going into (which may or may not
be the same way normal WAL writes go, depending on whether you're using
O_[D]SYNC WAL writes). I've seen some weird and intermittent breakdowns
from the contention that occurs when this happens, and it's certainly
something to be avoided.

To test it you could just use a big buffer cache, write like mad to it,
and make checkpoint_segments smaller than it should be for that workload.
It's easy enough to kill yourself exactly this way right now though, and
the fact that LDC gives you a parameter to aim this particular foot-gun
more precisely isn't a big deal IMHO as long as the documentation is
clear.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

  #9 (permalink)  
Old 04-18-2008, 11:18 AM
Tom Lane
 

Heikki Linnakangas <heikki@enterprisedb.com> writes:
> We could just allow any value up to 1.0, and note in the docs that you
> should leave some headroom, unless you don't mind starting the next
> checkpoint a bit late. That actually sounds pretty good.


Yeah, that sounds fine. There isn't actually any harm in starting a
checkpoint later than otherwise expected, is there? The worst
consequence I can think of is a backend having to take time to
manufacture a new xlog segment, because we didn't finish a checkpoint
in time to recycle old ones. This might be better handled by allowing
a bit more slop in the number of recycled-into-the-future xlog segments.

Come to think of it, shouldn't we be allowing some extra slop in the
number of future segments to account for xlog archiving delays, when
that's enabled?

regards, tom lane

  #10 (permalink)  
Old 04-18-2008, 11:18 AM
Heikki Linnakangas
 

Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> We could just allow any value up to 1.0, and note in the docs that you
>> should leave some headroom, unless you don't mind starting the next
>> checkpoint a bit late. That actually sounds pretty good.

>
> Yeah, that sounds fine. There isn't actually any harm in starting a
> checkpoint later than otherwise expected, is there? The worst
> consequence I can think of is a backend having to take time to
> manufacture a new xlog segment, because we didn't finish a checkpoint
> in time to recycle old ones. This might be better handled by allowing
> a bit more slop in the number of recycled-into-the-future xlog segments.
>
> Come to think of it, shouldn't we be allowing some extra slop in the
> number of future segments to account for xlog archiving delays, when
> that's enabled?


XLOGfileslop is currently 2 * checkpoint_segments + 1 since the last
checkpoint, which is just enough to accommodate a checkpoint that lasts
the full checkpoint interval. If we want to keep as much "slop" there as
before, then yes that should be increased to (2 +
checkpoint_completion_target) * checkpoint_segments + 1, or just 3 *
checkpoint_segments if we want to keep it simple.
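
For concreteness, the three candidate formulas above can be compared side by side. This is only arithmetic on the expressions quoted in this thread; the function names are made up here and are not identifiers from xlog.c.

```python
def current_slop(checkpoint_segments):
    """Current value: 2 * checkpoint_segments + 1."""
    return 2 * checkpoint_segments + 1


def proposed_slop(checkpoint_segments, completion_target):
    """Proposed value: (2 + checkpoint_completion_target) * checkpoint_segments + 1."""
    return (2 + completion_target) * checkpoint_segments + 1


def simple_slop(checkpoint_segments):
    """Simplified alternative: 3 * checkpoint_segments."""
    return 3 * checkpoint_segments
```

With checkpoint_segments = 10 and checkpoint_completion_target = 0.5 these give 21, 26 and 30 segments respectively.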

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
