Unix Technical Forum

archive_command

This is a discussion on archive_command within the pgsql Admins forums, part of the PostgreSQL category; --> On Mon, 2005-10-03 at 20:00 -0700, Jeff Frost wrote: > On Sun, 2 Oct 2005, Simon Riggs wrote: > ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Admins

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #11 (permalink)  
Old 04-10-2008, 05:19 AM
Simon Riggs
 
Posts: n/a
Default Re: archive_command

On Mon, 2005-10-03 at 20:00 -0700, Jeff Frost wrote:
> On Sun, 2 Oct 2005, Simon Riggs wrote:
>
> > Probably the best idea is to backup the last WAL file for each timeline
> > seen. Keep track of that, so when the current file changes you'll know
> > which timeline you're on and stick to that from then on. Or more simply,
> > put some notes with your program saying "if you ever use a complex
> > recovery situation, make sure to clear all archive_status files for
> > higher timeline ids before using this program".

>
> Tell me if you think this is the most reasonable way to determine the in use
> WAL file:
>
> ls -rt $PGDATA/pg_xlog/ | grep -v "backup\|archive\|RECOVERY" | tail -1


If you trust the times your filesystem hands you. PostgreSQL doesn't use
the file times it uses the sequential number naming, so for paranoid
accuracy, this should too so perhaps take the t off.

> > This would not be necessary had I completed my logswitch patch in time
> > for 8.1 freeze, but I regret that I was unable to do that.

>
> What's the logswitch patch going to accomplish?


Nothing until I finish it. :-)

> Also, I have an interesting and unrelated question...this past weekend, we had
> a failure in which we had to restore from backup. I have created a script
> which makes a base backup every two weeks and we keep the last two. Also, we
> use PITR for replication, making a base backup between the primary and
> secondary server every 8 hours, and running my rsynclastlog script once/minute
> to keep as up to date as possible.


Sounds like a good setup.

> Now here's the problem...the servers
> switched roles on Sep 21. We switched them back a little while after that.
> When I went to replay the WAL files using the Sep 15 base backup, it happily
> played back the WAL files to Sep 21 and stopped. I presume that this means
> whenever the servers switch roles I need to create a fresh base backup?


Yes, but more generally if I lost one node I would always be inclined to
fully backup the remaining one just in case.

> Is there any possible way to replay the other WAL files after the Sep 21
> switchover/switchback? I'm going to guess this is similar to having another
> timeline.


That may be your exact case, it depends upon how you did recovery. Both
systems think they are the same one, so you need to be careful.

Look for the last log file of the went-down node. That file should exist
twice, once as finally written by the went-down node and once as
continued to be written to by the stayed-up node after switchover. It
sounds like the wrong file was used to recover with and as a result
stopped recovery on Sep 21. But if you're running with the system now,
you'll need to do an intermediate rebuild and recover the data manually.

(I recommend testing recovery procedures before you go live and then at
least every 6 months, to ensure that they actually still work when you
need 'em. Trying to do a recovery with any confidence is not easy at 4am
on a Sunday morning under maximum stress, from experience.)

> We restored from a nightly pg_dump which we have kept doing, but that means we
> lost about 20 hrs of data. :-(


But sounds recoverable.

Best Regards, Simon Riggs


---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #12 (permalink)  
Old 04-10-2008, 05:19 AM
Gaetano Mendola
 
Posts: n/a
Default Re: archive_command

Jeff Frost wrote:
> On Sun, 2 Oct 2005, Simon Riggs wrote:
>
>> Probably the best idea is to backup the last WAL file for each timeline
>> seen. Keep track of that, so when the current file changes you'll know
>> which timeline you're on and stick to that from then on. Or more simply,
>> put some notes with your program saying "if you ever use a complex
>> recovery situation, make sure to clear all archive_status files for
>> higher timeline ids before using this program".

>
> Tell me if you think this is the most reasonable way to determine the in
> use WAL file:
>
> ls -rt $PGDATA/pg_xlog/ | grep -v "backup\|archive\|RECOVERY" | tail -1
>


Look at this post I did last year:
http://archives.postgresql.org/pgsql...6/msg00013.php

in that messages there are two script that deliver remotelly the
archive wall, and they store too last WAL in order to not loose the
current WAL in case of crash.

This was the function I used to find the WAL in use:

function copy_last_wal
{
FILE=$( ls -t1p $PGXLOG | grep -v / | head -1 )

echo "Last Wal> " $FILE

cp ${PGXLOG}/${FILE} ${PARTIAL}/${FILE}.tmp
mv ${PARTIAL}/${FILE}.tmp ${PARTIAL}/${FILE}.partial
find ${PARTIAL} -name *.partial | grep -v ${FILE} | xargs -i rm -fr {}
}


At that time Tom Lane agreed to provide some functions to ask the engine
the name of WAL currently in use, dunno if in the new 8.1 something was
done in order to help this process.


Regards
Gaetano Mendola

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #13 (permalink)  
Old 04-10-2008, 05:19 AM
Jeff Frost
 
Posts: n/a
Default Re: archive_command

On Tue, 4 Oct 2005, Simon Riggs wrote:

>> Now here's the problem...the servers
>> switched roles on Sep 21. We switched them back a little while after that.
>> When I went to replay the WAL files using the Sep 15 base backup, it happily
>> played back the WAL files to Sep 21 and stopped. I presume that this means
>> whenever the servers switch roles I need to create a fresh base backup?

>
> Yes, but more generally if I lost one node I would always be inclined to
> fully backup the remaining one just in case.


I believe they were switched not because one went down, but to add more ram to
the primary. They switched back just peachy at that time.

>> Is there any possible way to replay the other WAL files after the Sep 21
>> switchover/switchback? I'm going to guess this is similar to having another
>> timeline.

>
> That may be your exact case, it depends upon how you did recovery. Both
> systems think they are the same one, so you need to be careful.
>
> Look for the last log file of the went-down node. That file should exist
> twice, once as finally written by the went-down node and once as
> continued to be written to by the stayed-up node after switchover. It
> sounds like the wrong file was used to recover with and as a result
> stopped recovery on Sep 21. But if you're running with the system now,
> you'll need to do an intermediate rebuild and recover the data manually.


What would constitute an intermediate rebuild? Of course the system is up and
live and having data added to it. How would one restore from multiple
timelines?

> (I recommend testing recovery procedures before you go live and then at
> least every 6 months, to ensure that they actually still work when you
> need 'em. Trying to do a recovery with any confidence is not easy at 4am
> on a Sunday morning under maximum stress, from experience.)
>


I absolutely agree..unfortunately, it's only been 3 months since the last
test. :-(


--
Jeff Frost, Owner <jeff@frostconsultingllc.com>
Frost Consulting, LLC http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #14 (permalink)  
Old 04-10-2008, 05:20 AM
Simon Riggs
 
Posts: n/a
Default Re: archive_command

On Tue, 2005-10-04 at 08:03 -0700, Jeff Frost wrote:
> >> Is there any possible way to replay the other WAL files after the Sep 21
> >> switchover/switchback? I'm going to guess this is similar to having another
> >> timeline.

> >
> > That may be your exact case, it depends upon how you did recovery. Both
> > systems think they are the same one, so you need to be careful.
> >
> > Look for the last log file of the went-down node. That file should exist
> > twice, once as finally written by the went-down node and once as
> > continued to be written to by the stayed-up node after switchover. It
> > sounds like the wrong file was used to recover with and as a result
> > stopped recovery on Sep 21. But if you're running with the system now,
> > you'll need to do an intermediate rebuild and recover the data manually.

>
> What would constitute an intermediate rebuild? Of course the system is up and
> live and having data added to it. How would one restore from multiple
> timelines?


This is only if you are back up and working on the went-down box:

intermediate rebuild: I mean that you will have to restore the data from
the period of switchover, manually extract the relevant data with SQL
and then re-insert those changes yourself and resolve conflicts.

There isn't a process to merge the two log streams. The same txnid will
have been used on both servers to refer to separate transactions, so
there can be no automated way of resolving the data. It has to be done
using business domain knowledge rather than log data.

All of that's no different from any other RDBMS, as I'm sure you know.

Best Regards, Simon Riggs




---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #15 (permalink)  
Old 04-10-2008, 05:20 AM
Jeff Frost
 
Posts: n/a
Default Re: archive_command

On Tue, 4 Oct 2005, Simon Riggs wrote:

>> What would constitute an intermediate rebuild? Of course the system is up and
>> live and having data added to it. How would one restore from multiple
>> timelines?

>
> This is only if you are back up and working on the went-down box:
>
> intermediate rebuild: I mean that you will have to restore the data from
> the period of switchover, manually extract the relevant data with SQL
> and then re-insert those changes yourself and resolve conflicts.


I think we're likely out of luck here, but let me provide a few more details
so you might give me a better idea.

We are up on the went-down box with a pg_dump restore from the 3:30a.m. of the
previous day. We have WAL files available from the box that was up during the
failure of the primary. What I do not have is a base backup of the box which
was up for part of the day while we brought up the went-down box. So, unless
there is some method of replaying WAL files by hand one at a time, I think we
are out of luck, no?


>
> There isn't a process to merge the two log streams. The same txnid will
> have been used on both servers to refer to separate transactions, so
> there can be no automated way of resolving the data. It has to be done
> using business domain knowledge rather than log data.
>
> All of that's no different from any other RDBMS, as I'm sure you know.


I do know, and in fact, I think PITR is great, I just wish we had thought
about the corner case of the servers switching roles between the biweekly base
backups.

--
Jeff Frost, Owner <jeff@frostconsultingllc.com>
Frost Consulting, LLC http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #16 (permalink)  
Old 04-10-2008, 05:20 AM
Jeff Frost
 
Posts: n/a
Default Re: archive_command

On Tue, 4 Oct 2005, Gaetano Mendola wrote:

Looks like we're doing just about the same thing, but you're using head and
I'm using tail. However, it seems that your method does not require grepping
out the special files, so that's awesome!

BTW you do not have to use the -1 flag when you put ls to a pipe.

-Jeff

> FILE=$( ls -t1p $PGXLOG | grep -v / | head -1 )


--
Jeff Frost, Owner <jeff@frostconsultingllc.com>
Frost Consulting, LLC http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #17 (permalink)  
Old 04-10-2008, 05:22 AM
Jeff Frost
 
Posts: n/a
Default Re: archive_command

Looks like using that FILE= expression, I ended up with a bad file choice last
night immediately after the base backup:

Copying 0000000100000015000000F8.0088A490.backup to
/mnt/pgbackup/pitr/0000000100000015000000F8.0088A490.backup

Also, I noticed that rsync had a file disappear out from under it during the
base backup:

file has vanished: "/usr/local/pgsql/data/pg_subtrans/0743"

So, I have modified my FILE= expression to the following:

FILE=`ls -tp /pg_xlog/ | grep -v "backup\|/" | head -1`

Which gives me this file: 00000001000000160000002A

Given these in the pg_xlog dir.

00000001000000160000002A
00000001000000160000002B
00000001000000160000002C
00000001000000160000002D
00000001000000160000002E
00000001000000160000002F
000000010000001600000030
000000010000001600000031

Removing the -t takes yields the same result as long as I grep out the .backup
files.


On Tue, 4 Oct 2005, Gaetano Mendola wrote:

> This was the function I used to find the WAL in use:
>
> function copy_last_wal
> {
> FILE=$( ls -t1p $PGXLOG | grep -v / | head -1 )
>
> echo "Last Wal> " $FILE
>
> cp ${PGXLOG}/${FILE} ${PARTIAL}/${FILE}.tmp
> mv ${PARTIAL}/${FILE}.tmp ${PARTIAL}/${FILE}.partial
> find ${PARTIAL} -name *.partial | grep -v ${FILE} | xargs -i rm -fr {}
> }
>


--
Jeff Frost, Owner <jeff@frostconsultingllc.com>
Frost Consulting, LLC http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 09:22 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com