Unix Technical Forum

WAL recovery, stop and resume recovery?

This is a discussion on WAL recovery, stop and resume recovery? within the pgsql Admins forums, part of the PostgreSQL category; --> Using PG 8.2, I have a database in recovery mode using pg_standby to handle the WAL restores. Is it ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Admins

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-10-2008, 09:18 AM
David Wall
 
Posts: n/a
Default WAL recovery, stop and resume recovery?

Using PG 8.2, I have a database in recovery mode using pg_standby to
handle the WAL restores.

Is it allowable to have a backup database in recovery mode, then stop
recovery (in this case, by putting the trigger file in place to stop
pg_standby), check out that the backup db appears up to date, stop the
now active backup db, and then restart it in recover mode again to have
it resume its backup role?

I have had some success doing this, with the restart in recovery showing:

LOG: starting archive recovery
LOG: restore_command = "~/postgresql/bin/pg_standby -l -d -s 2 -k 20 -t
~/postgresql/restoreWALs/STOP_RESTORE ~/postgresql/restoreWALs %f %p 2>>
~/pg_standby.log"
LOG: restored log file "000000010000000500000018" from archive
*LOG: invalid xl_info in primary checkpoint record*
LOG: using previous checkpoint record at 5/18000020
LOG: redo record is at 5/18000020; undo record is at 0/0; shutdown FALSE
LOG: next transaction ID: 0/1535389; next OID: 53990
LOG: next MultiXactId: 1; next MultiXactOffset: 0
*LOG: automatic recovery in progress*
LOG: redo starts at 5/18000068

But there are times when I do this that it cannot. Is this because the
steps are an issue (after all, I did stop recovery and go active
briefly, though I didn't update the db during that time, just did \d and
select queries to see that DDL and row data were updated on the backup),
or is it related to not keeping enough WAL files around (pg_standby -k
20 was chosen, but it's not clear how to select this value, and it
sounds like 8.3 gets rid of that issue entirely) to find the 'secondary
checkpoint record'.

Here's the sort of error I get when it doesn't allow me to restart:

LOG: database system was shut down at 2008-01-11 11:40:05 PST
LOG: starting archive recovery
LOG: restore_command = "~/postgresql/bin/pg_standby -l -d -s 2 -k 20 -t
~/postgresql/restoreWALs/STOP_RESTORE ~/postgresql/restoreWALs %f %p 2>>
~/pg_standby.log"
*LOG: restored log file "00000001000000050000001D" from archive
LOG: invalid record length at 5/1D000068
LOG: invalid primary checkpoint record
LOG: restored log file "00000001000000050000001D" from archive
LOG: invalid resource manager ID in secondary checkpoint record
PANIC: could not locate a valid checkpoint record*
LOG: startup process (PID 9219) was terminated by signal 6
LOG: aborting startup due to startup process failure
LOG: logger shutting down


Thanks,
David

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-10-2008, 09:18 AM
David Wall
 
Posts: n/a
Default Re: WAL recovery, stop and resume recovery?


> LOG: database system was shut down at 2008-01-11 11:40:05 PST
> LOG: starting archive recovery
> LOG: restore_command = "~/postgresql/bin/pg_standby -l -d -s 2 -k 20
> -t ~/postgresql/restoreWALs/STOP_RESTORE ~/postgresql/restoreWALs %f
> %p 2>> ~/pg_standby.log"
> *LOG: restored log file "00000001000000050000001D" from archive
> LOG: invalid record length at 5/1D000068
> LOG: invalid primary checkpoint record
> LOG: restored log file "00000001000000050000001D" from archive
> LOG: invalid resource manager ID in secondary checkpoint record
> PANIC: could not locate a valid checkpoint record*
> LOG: startup process (PID 9219) was terminated by signal 6
> LOG: aborting startup due to startup process failure
> LOG: logger shutting down


One more thing is that the error above about the 1D log file above
having an invalid record length, etc. is interesting in that if I
restore from the TAR backup created for this and start the backup
database in recover mode, it manages to process the 1D log file just
fine (along with all the previous log files of course).

David

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-10-2008, 09:18 AM
Tom Lane
 
Posts: n/a
Default Re: WAL recovery, stop and resume recovery?

David Wall <d.wall@computer.org> writes:
> Is it allowable to have a backup database in recovery mode, then stop
> recovery (in this case, by putting the trigger file in place to stop
> pg_standby), check out that the backup db appears up to date, stop the
> now active backup db, and then restart it in recover mode again to have
> it resume its backup role?


No. Once you've done any transactions in the backup DB, its transaction
history has diverged from the master and you can't resume tracking the
master. It shouldn't even let you try --- what shenanigans did you pull
to force it back into recovery mode?

There's some work being done on allowing read-only queries against an
in-recovery database, which I think would satisfy your desire to see if
the backup were sane or not. But I wouldn't bet money on that getting
into the system anytime soon. It's definitely not something you can
cobble up from spare parts.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-10-2008, 09:18 AM
David Wall
 
Posts: n/a
Default Re: WAL recovery, stop and resume recovery?


> No. Once you've done any transactions in the backup DB, its transaction
> history has diverged from the master and you can't resume tracking the
> master. It shouldn't even let you try --- what shenanigans did you pull
> to force it back into recovery mode?
>

Well, I didn't think it was shenanigans, I just stopped the database
once it completed the first recovery, ran a few queries, then
re-installed the recovery.conf and started it back up like I initially
did. I figured this could be an issue, but since I hadn't issued any
changes, I had hoped it might work.
> There's some work being done on allowing read-only queries against an
> in-recovery database, which I think would satisfy your desire to see if
> the backup were sane or not. But I wouldn't bet money on that getting
> into the system anytime soon. It's definitely not something you can
> cobble up from spare parts.
>

Fair enough. It's probably not a big deal as I'm doing this only
because we're new to using WAL copying for a warm standby, and of course
we're testing to see that rows inserted, removed, updated, tables added
and dropped, indexes added and dropped, etc. are all making it through.
It appears that this works like a charm!

Is there a way to know how many WAL files I should keep around to ensure
I can recover back to a valid primary checkpoint without having to redo
the entire backup process on the primary in 8.2, or do I just have to
wait for 8.3 and %r option for recovery?

Thanks,
David


---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 02:49 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com