Unix Technical Forum

Re: [PATCHES] Restartable Recovery

#1
04-12-2008, 04:29 AM
Tom Lane

Simon Riggs <simon@2ndquadrant.com> writes:
> On Sun, 2006-07-16 at 10:51 -0400, Tom Lane wrote:
>> Ouch. That's a bit nasty. You can't just apply a postponed split at
>> checkpoint time, because the WAL record could easily be somewhere after
>> the checkpoint, leading to duplicate insertions.


> To do this we would need to have another rmgr specific routine that gets
> called at a recovery checkpoint. This would then write to disk the
> current state of the incomplete multi-WAL actions, in some manner.
> During the startup routines we would check for any pre-existing state
> files and use those to initialise the incomplete action cache. Cleanup
> would then discard all state files.


I thought about that too, but it seems very messy, e.g. you'd have to
actually fsync the state files to be sure they were safely down to disk.
Another problem is that WAL records between the checkpoint's REDO point
and the physical checkpoint location could get replayed twice, leading
to duplicate entries in the rmgr's state. Consider a split start WAL
entry located in that range, with the split completion entry after the
checkpoint --- on restart, we'd load a pending-split entry from the
state file and then create another one on seeing the split-start record
again.
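
A minimal sketch of the double-replay problem described above. The PendingSplits structure, the function names, and page number 42 are hypothetical and exist only for illustration; this is not PostgreSQL's actual bookkeeping. It shows that one pending-split entry loaded from a state file plus one re-created by replaying the split-start record leaves a stale entry behind after the single completion record.

```c
#include <stdbool.h>

#define MAX_PENDING 8

/* Hypothetical in-memory list of splits that have started but not completed. */
typedef struct PendingSplits
{
    int count;
    int page[MAX_PENDING];      /* page being split; illustrative key only */
} PendingSplits;

/* What replaying a split-start record (or loading a state file entry) does. */
static void
remember_split(PendingSplits *ps, int page)
{
    if (ps->count < MAX_PENDING)
        ps->page[ps->count++] = page;
}

/* What replaying a split-completion record does: remove one matching entry. */
static bool
forget_split(PendingSplits *ps, int page)
{
    for (int i = 0; i < ps->count; i++)
    {
        if (ps->page[i] == page)
        {
            /* Overwrite the match with the last entry and shrink the list. */
            ps->page[i] = ps->page[--ps->count];
            return true;
        }
    }
    return false;
}

/*
 * Restart scenario: the split-start record lies between the REDO point and
 * the physical checkpoint location, the completion record lies after it.
 * Returns the number of entries left once the completion has been replayed.
 */
static int
replay_after_restart(void)
{
    PendingSplits ps = {0};

    remember_split(&ps, 42);    /* entry loaded from the state file */
    remember_split(&ps, 42);    /* split-start record replayed a second time */
    forget_split(&ps, 42);      /* completion record removes only one entry */
    return ps.count;            /* a stale pending split remains */
}
```

The leftover entry is what would later be mistaken for an incomplete split and "completed" again, which is the duplicate-insertion hazard.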

A compromise that might be good enough is to add an rmgr routine defined
as "bool is_idle(void)" that tests whether the rmgr has any open state
to worry about. Then, recovery checkpoints are done only if all rmgrs
say they are idle. That is, we only checkpoint if there is not a need
for any state files. At least for btree's usage, this should be all
right since the "split pending" state is short-lived and so most of the
time we'd not need to skip checkpoints. I'm not totally sure about GiST
or GIN though (Teodor?).
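
A rough sketch of how such a check might look. Only the "bool is_idle(void)" signature comes from the proposal above; the RmgrSketch struct, the table contents, and the btree_pending_splits counter are assumptions made for illustration and do not reflect the real rmgr table.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-resource-manager entry; only is_idle is from the proposal. */
typedef struct RmgrSketch
{
    const char *name;
    bool      (*is_idle)(void);     /* true if no multi-record action is open */
} RmgrSketch;

/* Illustrative state: btree splits started but not yet completed. */
static int btree_pending_splits = 0;

static bool
btree_is_idle(void)
{
    return btree_pending_splits == 0;
}

/* Assumed for the sketch: this rmgr never carries state across records. */
static bool
heap_is_idle(void)
{
    return true;
}

static const RmgrSketch rmgr_table[] = {
    {"heap", heap_is_idle},
    {"btree", btree_is_idle},
};

/* A recovery checkpoint is allowed only if every rmgr reports idle. */
static bool
all_rmgrs_idle(void)
{
    size_t n = sizeof(rmgr_table) / sizeof(rmgr_table[0]);

    for (size_t i = 0; i < n; i++)
    {
        if (!rmgr_table[i].is_idle())
            return false;
    }
    return true;
}
```

With this shape no state file is ever needed: a checkpoint is simply skipped whenever any rmgr still has something open.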

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

#2
04-12-2008, 04:29 AM
Simon Riggs

On Sun, 2006-07-16 at 12:40 -0400, Tom Lane wrote:

> A compromise that might be good enough is to add an rmgr routine defined
> as "bool is_idle(void)" that tests whether the rmgr has any open state
> to worry about. Then, recovery checkpoints are done only if all rmgrs
> say they are idle.


Like it.

> That is, we only checkpoint if there is not a need
> for any state files. At least for btree's usage, this should be all
> right since the "split pending" state is short-lived and so most of the
> time we'd not need to skip checkpoints. I'm not totally sure about GiST
> or GIN though (Teodor?).


Considering how infrequently we wanted to do recovery checkpoints, this
is unlikely to cause any issue. But in any case, this is the best we can
give people, rather than a compromise.

Perhaps that should be extended to say whether there are any
non-idempotent changes made in the last checkpoint period. That might
cover a wider set of potential actions.

If index splits in GiST or GIN are *not* short-lived, then I would
imagine we'd have some serious contention problems to clear up, since an
inconsistent index is unusable and would require portions of it to be
locked throughout such operations to ensure their atomicity.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


#3
04-12-2008, 04:29 AM
Tom Lane

Simon Riggs <simon@2ndquadrant.com> writes:
> On Sun, 2006-07-16 at 12:40 -0400, Tom Lane wrote:
>> A compromise that might be good enough is to add an rmgr routine defined
>> as "bool is_idle(void)" that tests whether the rmgr has any open state
>> to worry about. Then, recovery checkpoints are done only if all rmgrs
>> say they are idle.


> Perhaps that should be extended to say whether there are any
> non-idempotent changes made in the last checkpoint period. That might
> cover a wider set of potential actions.


Perhaps best to call it safe_to_checkpoint(), and not pre-judge what
reasons the rmgr might have for not wanting to restart here.

If we are only going to do a recovery checkpoint at every Nth checkpoint
record, then occasionally having to skip one seems no big problem ---
just do it at the first subsequent record that is safe.
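
A small sketch of that skip-and-retry behaviour. The value of N, the RecoveryCkptState struct, and on_checkpoint_record() are hypothetical names introduced here; the safe_to_checkpoint argument stands in for the combined answer of all rmgrs.

```c
#include <stdbool.h>

#define RECOVERY_CHECKPOINT_EVERY 3     /* hypothetical value of N */

typedef struct RecoveryCkptState
{
    /* Checkpoint records seen since the last recovery checkpoint. */
    int records_since_last;
} RecoveryCkptState;

/*
 * Called once for each checkpoint record encountered during replay.
 * Returns true if a recovery checkpoint should be performed at this record.
 */
static bool
on_checkpoint_record(RecoveryCkptState *st, bool safe_to_checkpoint)
{
    st->records_since_last++;

    if (st->records_since_last < RECOVERY_CHECKPOINT_EVERY)
        return false;           /* not due yet */

    if (!safe_to_checkpoint)
        return false;           /* due but unsafe: stay due, retry next record */

    st->records_since_last = 0; /* checkpoint taken, start counting again */
    return true;
}
```

Because the counter is not reset when a checkpoint is skipped, the next checkpoint record that is safe triggers the recovery checkpoint immediately.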

regards, tom lane

#4
04-12-2008, 04:29 AM
Simon Riggs

On Sun, 2006-07-16 at 15:33 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Sun, 2006-07-16 at 12:40 -0400, Tom Lane wrote:
> >> A compromise that might be good enough is to add an rmgr routine defined
> >> as "bool is_idle(void)" that tests whether the rmgr has any open state
> >> to worry about. Then, recovery checkpoints are done only if all rmgrs
> >> say they are idle.
>
> > Perhaps that should be extended to say whether there are any
> > non-idempotent changes made in the last checkpoint period. That might
> > cover a wider set of potential actions.
>
> Perhaps best to call it safe_to_checkpoint(), and not pre-judge what
> reasons the rmgr might have for not wanting to restart here.


You read my mind.

> If we are only going to do a recovery checkpoint at every Nth checkpoint
> record, then occasionally having to skip one seems no big problem ---
> just do it at the first subsequent record that is safe.


Got it.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


