Load Distributed Checkpoints, final patch (Pgsql Patches forum, PostgreSQL category)
| Here's the latest revision of Itagaki-san's Load Distributed Checkpoints patch:

* bgwriter all-scan is gone. We might or might not improve the LRU-sweep later so that it can perform any duties the all-sweep might have had besides reducing the impact of a checkpoint.
* One new GUC variable, called checkpoint_completion_target. Default is 0.5, which should be more than enough to smooth checkpoints on a system that's not already overloaded. It's also not so large that it hurts recovery times much on a system that's not already struggling to hit its recovery time requirements. You can set it to 0 if you want the old checkpoint behavior for some reason. Maximum is 0.9, to leave some headroom for fsync and any other things that need to happen during a checkpoint.
* The minimum rate we write at during a checkpoint is 1 page / bgwriter_delay.
* Added a paragraph to the user manual to describe the feature. Also updated the formula for the expected number of WAL segments; the new formula is (2 + checkpoint_completion_target) * checkpoint_segments + 1. I believe the comments in xlog.c regarding XLOGfileslop are still valid.
* The signaling in bgwriter.c is based on a spinlock. Tom advised not to use the spinlock when not strictly necessary, but IMHO it's easier to understand this way. Feel free to revert that when committing if you disagree.
* The algorithm for estimating progress wrt. checkpoint_segments is the same as before. Bursty WAL activity will lead to bursty checkpoint activity, but I wanted to keep it simple for now. In any case, the I/O rate will be smoother than without the patch.
* There are some DEBUG elogs which we might want to replace with better ones later, per the patch in the patch queue by Greg Smith. The ones that are there now are useful for testing this feature, but are a bit crude for DBAs to use.

Barring any objections from committer, I'm finished with this patch.

I'm scheduling more DBT-2 tests at a high # of warehouses per Greg Smith's suggestion just to see what happens, but I doubt that will change my mind on the above decisions.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster |
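For readers who want to plug numbers into the two formulas quoted in the post above, here is a minimal Python sketch. The function names are hypothetical, rounding the segment count up to a whole file is an assumption, and the sample values (10 segments, a 200 ms bgwriter_delay) are only illustrative.

```python
import math


def expected_wal_segments(checkpoint_segments: int,
                          checkpoint_completion_target: float) -> int:
    """Expected number of WAL segment files, using the formula from the post:
    (2 + checkpoint_completion_target) * checkpoint_segments + 1.
    Rounding up to a whole file is an assumption of this sketch."""
    raw = (2 + checkpoint_completion_target) * checkpoint_segments + 1
    return math.ceil(raw)


def min_checkpoint_write_rate(bgwriter_delay_ms: float) -> float:
    """Minimum checkpoint write rate in pages per second,
    i.e. 1 page per bgwriter_delay (bgwriter_delay given in milliseconds)."""
    return 1000.0 / bgwriter_delay_ms


if __name__ == "__main__":
    # Illustrative values only, not recommendations.
    print(expected_wal_segments(10, 0.5))
    print(min_checkpoint_write_rate(200))
```

Setting the target to 0 in this sketch gives 2 * checkpoint_segments + 1, which matches the pre-patch figure mentioned elsewhere in the thread.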
| |||
| On Jun 26, 2007, at 13:49, Heikki Linnakangas wrote:
> Maximum is 0.9, to leave some headroom for fsync and any other
> things that need to happen during a checkpoint.

I think it might be more user-friendly to make the maximum 1 (meaning as much smoothing as you could possibly get) and internally reserve a certain amount for whatever headroom might be required. It's more common for users to see a value range from 0 to 1 than from 0 to 0.9.

Michael Glaesemann
grzm seespotcode net |
| |||
| Heikki Linnakangas <heikki@enterprisedb.com> writes:
> Barring any objections from committer, I'm finished with this patch.

Sounds great, I'll start looking this over.

> I'm scheduling more DBT-2 tests at a high # of warehouses per Greg
> Smith's suggestion just to see what happens, but I doubt that will
> change my mind on the above decisions.

When do you expect to have those results?

regards, tom lane |
| |||
| Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> Barring any objections from committer, I'm finished with this patch.
>
> Sounds great, I'll start looking this over.
>
>> I'm scheduling more DBT-2 tests at a high # of warehouses per Greg
>> Smith's suggestion just to see what happens, but I doubt that will
>> change my mind on the above decisions.
>
> When do you expect to have those results?

In a few days. I'm doing long tests because the variability in the 1h tests was very high.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com |
| |||
| Michael Glaesemann wrote:
>
> On Jun 26, 2007, at 13:49 , Heikki Linnakangas wrote:
>
>> Maximum is 0.9, to leave some headroom for fsync and any other things
>> that need to happen during a checkpoint.
>
> I think it might be more user-friendly to make the maximum 1 (meaning as
> much smoothing as you could possibly get) and internally reserve a
> certain amount off for whatever headroom might be required. It's more
> common for users to see a value range from 0 to 1 rather than 0 to 0.9.

It would then be counter-intuitive if you set it to 1.0 and saw that your checkpoints consistently take 90% of the checkpoint interval.

We could just allow any value up to 1.0, and note in the docs that you should leave some headroom, unless you don't mind starting the next checkpoint a bit late. That actually sounds pretty good.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com |
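The objection above can be made concrete with a small Python sketch. This is only one possible reading of the "expose 0 to 1, reserve internally" idea: the linear scaling, the function names, and the 10% reserve (chosen to match the 0.9 maximum in the patch) are all assumptions, not anything the thread specifies.

```python
def effective_completion_target(user_value: float, reserved: float = 0.1) -> float:
    """Hypothetical mapping: the user sets 0..1 and a fixed share is held back.
    The linear scaling and reserved=0.1 are assumptions for illustration."""
    if not 0.0 <= user_value <= 1.0:
        raise ValueError("user_value must be between 0 and 1")
    return user_value * (1.0 - reserved)


def checkpoint_write_seconds(checkpoint_interval_s: float, target: float) -> float:
    """Time the write phase is spread over, as a fraction of the interval."""
    return checkpoint_interval_s * target


if __name__ == "__main__":
    # A user asks for 1.0 on a 300-second interval (illustrative value).
    target = effective_completion_target(1.0)
    print(target, checkpoint_write_seconds(300, target))
```

Under this reading, a setting of 1.0 spreads the writes over roughly 90% of the interval, which is the mismatch between the configured and observed values that the reply calls counter-intuitive.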
| |||
| "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> We could just allow any value up to 1.0, and note in the docs that you should
> leave some headroom, unless you don't mind starting the next checkpoint a bit
> late. That actually sounds pretty good.

What exactly happens if a checkpoint takes so long that the next checkpoint starts? Aside from it not actually helping, is there much reason to avoid this situation? Have we ever actually tested it?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com |
| |||
| Gregory Stark wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>
>> We could just allow any value up to 1.0, and note in the docs that you should
>> leave some headroom, unless you don't mind starting the next checkpoint a bit
>> late. That actually sounds pretty good.
>
> What exactly happens if a checkpoint takes so long that the next checkpoint
> starts? Aside from it not actually helping, is there much reason to avoid this
> situation?

Not really. We might run out of preallocated WAL segments, and will have to create more. Recovery could be longer than expected since the real checkpoint interval ends up being longer, but you can't make very accurate recovery time estimations anyway.

> Have we ever actually tested it?

I haven't.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com |
| |||
| On Tue, 26 Jun 2007, Gregory Stark wrote:
> What exactly happens if a checkpoint takes so long that the next checkpoint
> starts? Aside from it not actually helping, is there much reason to avoid this
> situation? Have we ever actually tested it?

More segments get created, and because of how they are cleared at the beginning, this causes its own mini-I/O storm through the same buffered write channel the checkpoint writes are going into (which may or may not be the same way normal WAL writes go, depending on whether you're using O_[D]SYNC WAL writes). I've seen some weird and intermittent breakdowns from the contention that occurs when this happens, and it's certainly something to be avoided.

To test it you could just use a big buffer cache, write like mad to it, and make checkpoint_segments smaller than it should be for that workload. It's easy enough to kill yourself exactly this way right now though, and the fact that LDC gives you a parameter to aim this particular foot-gun more precisely isn't a big deal IMHO as long as the documentation is clear.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD |
| |||
| Heikki Linnakangas <heikki@enterprisedb.com> writes:
> We could just allow any value up to 1.0, and note in the docs that you
> should leave some headroom, unless you don't mind starting the next
> checkpoint a bit late. That actually sounds pretty good.

Yeah, that sounds fine. There isn't actually any harm in starting a checkpoint later than otherwise expected, is there? The worst consequence I can think of is a backend having to take time to manufacture a new xlog segment, because we didn't finish a checkpoint in time to recycle old ones. This might be better handled by allowing a bit more slop in the number of recycled-into-the-future xlog segments.

Come to think of it, shouldn't we be allowing some extra slop in the number of future segments to account for xlog archiving delays, when that's enabled?

regards, tom lane |
| ||||
| Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> We could just allow any value up to 1.0, and note in the docs that you
>> should leave some headroom, unless you don't mind starting the next
>> checkpoint a bit late. That actually sounds pretty good.
>
> Yeah, that sounds fine. There isn't actually any harm in starting a
> checkpoint later than otherwise expected, is there? The worst
> consequence I can think of is a backend having to take time to
> manufacture a new xlog segment, because we didn't finish a checkpoint
> in time to recycle old ones. This might be better handled by allowing
> a bit more slop in the number of recycled-into-the-future xlog segments.
>
> Come to think of it, shouldn't we be allowing some extra slop in the
> number of future segments to account for xlog archiving delays, when
> that's enabled?

XLOGfileslop is currently 2 * checkpoint_segments + 1 since the last checkpoint, which is just enough to accommodate a checkpoint that lasts the full checkpoint interval. If we want to keep as much "slop" there as before, then yes, that should be increased to (2 + checkpoint_completion_target) * checkpoint_segments + 1, or just 3 * checkpoint_segments if we want to keep it simple.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com |
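To compare the three slop formulas mentioned in the last reply side by side, a short Python sketch follows. The function names are hypothetical and the sample inputs are illustrative; only the three expressions themselves come from the thread.

```python
def slop_current(checkpoint_segments: int) -> int:
    """Value described as current in the thread: 2 * checkpoint_segments + 1."""
    return 2 * checkpoint_segments + 1


def slop_with_target(checkpoint_segments: int, completion_target: float) -> float:
    """Proposed value: (2 + checkpoint_completion_target) * checkpoint_segments + 1."""
    return (2 + completion_target) * checkpoint_segments + 1


def slop_simple(checkpoint_segments: int) -> int:
    """Simpler alternative from the thread: 3 * checkpoint_segments."""
    return 3 * checkpoint_segments


if __name__ == "__main__":
    # Illustrative inputs only.
    for segments, target in [(1, 0.5), (10, 0.5), (30, 0.5)]:
        print(segments, target,
              slop_current(segments),
              slop_with_target(segments, target),
              slop_simple(segments))
```

One thing the comparison brings out: 3 * checkpoint_segments is at least as large as the target-based formula only when checkpoint_segments * (1 - checkpoint_completion_target) >= 1, so for very small checkpoint_segments or a target close to 1 the simple form gives slightly less slop.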