Unix Technical Forum

SEO

vBulletin Search Engine Optimization


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql General

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-09-2008, 06:58 PM
Joey K.
 
Posts: n/a
Default Seeking datacenter PITR backup suggestions

Greetings,

We have several web applications with Pg 8.2.x running on isolated servers
(~25). The database size on each machines (du -h pgdata) is ~2 GB. We have
been using nightly filesystem backup (stop pg, tar backup to ftp, start pg)
and it worked well.

We would like to move to PITR backups since the database size will increase
moving forward and our current backup method might increase server
downtimes.

We have a central ftp backup server (yes, ftp :-) which we would like to use
for weekly full and daily incremental PITR backups.

After reading the docs, PITR is still fuzzy. Our ideas for backup are (do
not worry about the syntax),

** START **

tmpwal = "/localhost/tmp" # tmp space on server 1 for storing wal files
before ftp
Configure $pgdata/postgresql.conf archive_command = "cp %p $tmpwal/%f"

Day 1:
% psql pg_start_backup(); tar pgdata.tar --exclude pg_xlog/ pgdata
% psql pg_stop_backup()
% ftp put pgdata.tar ftpserver:/server1/day1/pgdata
% ftp put $tmpwal/* ftpserver:/server1/day1/wal
% rm -f $tmpwal/* pgdata.tar

Day 2:
% ftp put $tmpwal/* ftpserver:/server1/day2/wal
% rm -f $tmpwal/*

Day 3:
....
....

Day 7:
% rm -f $tmpwal/*
Start over

Recovery on server1 (skeleton commands),
% rm -f $tmpwal/*
% mv pgdata pgdata.hosed
% ftp get ftpbackup:/server1/day1/pgdata.tar .
% tar -xvf pgdata.tar
% ftp get ftpbackup:/server1/day1/wal/* $tmpwal
% ftp get ftpbackup:/server1/day2/wal/* $tmpwal
......
......
% cp -r pgdata.hosed/pg_xlog pgdata/
% echo "cp $tmpwal/%f %p" > pgdata/recovery.conf
% start pg (recovery begins)

** END **

Assumptions:
a. After pg_stop_backup(), Pg immediately recycles log files and hence wal
logs can be copied to backup. This is a clean start.
b. New wal files since (a) are incremental backups

We are not sure if WAL log filenames are unique and possibly overwrite older
wal files during recovery.

I'm seeking suggestions from others with experience performing PostgreSQL
PITR backups from multiple servers to a central backup server.

Thanks in advance,
Joey Krane

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-09-2008, 07:06 PM
Decibel!
 
Posts: n/a
Default Re: Seeking datacenter PITR backup suggestions

On Aug 17, 2007, at 5:48 PM, Joey K. wrote:
> We have several web applications with Pg 8.2.x running on isolated
> servers (~25). The database size on each machines (du -h pgdata) is
> ~2 GB. We have been using nightly filesystem backup (stop pg, tar
> backup to ftp, start pg) and it worked well.
>
> We would like to move to PITR backups since the database size will
> increase moving forward and our current backup method might
> increase server downtimes.
>
> We have a central ftp backup server (yes, ftp :-) which we would
> like to use for weekly full and daily incremental PITR backups.
>
> After reading the docs, PITR is still fuzzy. Our ideas for backup
> are (do not worry about the syntax),
>
> ** START **
>
> tmpwal = "/localhost/tmp" # tmp space on server 1 for storing wal
> files before ftp
> Configure $pgdata/postgresql.conf archive_command = "cp %p $tmpwal/%f"


Why not just FTP WAL files directly?

> Day 1:
> % psql pg_start_backup(); tar pgdata.tar --exclude pg_xlog/ pgdata
> % psql pg_stop_backup()
> % ftp put pgdata.tar ftpserver:/server1/day1/pgdata
> % ftp put $tmpwal/* ftpserver:/server1/day1/wal
> % rm -f $tmpwal/* pgdata.tar


The last 2 are a race condition... you could easily lose a WAL file
that way.

Keep in mind that that pgdata.tar is 100% useless unless you also
have the WAL files that were created during the backup. I generally
recommend to folks that they keep two base copies around for that
reason.

> Day 2:
> % ftp put $tmpwal/* ftpserver:/server1/day2/wal
> % rm -f $tmpwal/*
>
> Day 3:
> ...
> ...
>
> Day 7:
> % rm -f $tmpwal/*
> Start over
>
> Recovery on server1 (skeleton commands),
> % rm -f $tmpwal/*
> % mv pgdata pgdata.hosed
> % ftp get ftpbackup:/server1/day1/pgdata.tar .
> % tar -xvf pgdata.tar
> % ftp get ftpbackup:/server1/day1/wal/* $tmpwal
> % ftp get ftpbackup:/server1/day2/wal/* $tmpwal
> .....
> .....
> % cp -r pgdata.hosed/pg_xlog pgdata/
> % echo "cp $tmpwal/%f %p" > pgdata/recovery.conf
> % start pg (recovery begins)
>
> ** END **
>
> Assumptions:
> a. After pg_stop_backup(), Pg immediately recycles log files and
> hence wal logs can be copied to backup. This is a clean start.
> b. New wal files since (a) are incremental backups
>
> We are not sure if WAL log filenames are unique and possibly
> overwrite older wal files during recovery.
>
> I'm seeking suggestions from others with experience performing
> PostgreSQL PITR backups from multiple servers to a central backup
> server.


In general, your handling of WAL files seems fragile and error-prone.
I think it would make far more sense to just FTP them directly, and
not try and get fancy with different directories for different days.
*when* a WAL file was generated is meaningless until you compare it
to a base backup to see if that WAL file is required for the base
backup, useful (but not required) to the base backup, or useless for
the base backup.
--
Decibel!, aka Jim Nasby decibel@decibel.org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)



---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-09-2008, 07:13 PM
Steve Crawford
 
Posts: n/a
Default Re: Seeking datacenter PITR backup suggestions

> In general, your handling of WAL files seems fragile and error-prone....

Indeed. I would recommend simply using rsync to handle pushing the
files. I see several advantages:

1. Distributed load - you aren't copying a full-day of files all at once.

2. Very easy to set-up - you can use it directly as your archive_command
if you wish.

3. Atomic. Rsync copies new data to a temporary location that will only
be moved into place when the transfer is complete. The destination
server will never see a partial file. Depending on the FTP client/server
combo, you will likely end up with a partial file in the event of
communication failure.

4. Much more up-to-the-minute recovery data.

In your scenario, what about using "cp -l" (or "ln") instead? Since the
hard-link it is only creating a new pointer, it will be very fast and
save a bunch of disk IO on your server and it doesn't appear that the
tempdir is for much other than organizing purposes anyway.

I'm setting up some test machines to learn more about PITR and warm
backups and am considering a two-stage process using "cp -l" to add the
file to the list needing transfer and regular rsync to actually move the
files to the destination machine. (The destination machine will be over
a WAN link so I'd like to avoid having PG tied up waiting for each rsync
to complete.)

Cheers,
Steve


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-09-2008, 07:13 PM
Gregory Stark
 
Posts: n/a
Default Re: Seeking datacenter PITR backup suggestions

"Steve Crawford" <scrawford@pinpointresearch.com> writes:

> 4. Much more up-to-the-minute recovery data.
>
> In your scenario, what about using "cp -l" (or "ln") instead? Since the
> hard-link it is only creating a new pointer, it will be very fast and
> save a bunch of disk IO on your server and it doesn't appear that the
> tempdir is for much other than organizing purposes anyway.


Postgres tries to reuse WAL files. Once the archive_command completes it
believes it is safe to reuse the old file without deleting it. That will do
nasty things if you've used ln as your archive command.


> I'm setting up some test machines to learn more about PITR and warm
> backups and am considering a two-stage process using "cp -l" to add the
> file to the list needing transfer and regular rsync to actually move the
> files to the destination machine. (The destination machine will be over
> a WAN link so I'd like to avoid having PG tied up waiting for each rsync
> to complete.)


In theory the only thing that the archive command blocks is the reuse of WAL
log files. So as long as the command is running if Postgres has used up all
its existing WAL files it will have to create new ones which does have some
performance cost. But otherwise it's unaffected.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org/

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-09-2008, 07:13 PM
Steve Crawford
 
Posts: n/a
Default Re: Seeking datacenter PITR backup suggestions

Gregory Stark wrote:

>> In your scenario, what about using "cp -l" (or "ln") instead?....

>
> Postgres tries to reuse WAL files. Once the archive_command completes it
> believes it is safe to reuse the old file without deleting it. That will do
> nasty things if you've used ln as your archive command.


Um, OK. I won't try that. Thanks. That really could be nasty as I
imagine that it would cause either corruption or failures on the standby
server that might be very hard to track down.

> In theory the only thing that the archive command blocks is the reuse of WAL
> log files....


What happens if PG server crashes during a WAL transfer? Does it still
know the file needs to be transferred or is doing a fast copy to a local
(same machine or machine on local network) from which files are
transferred over a slow-link a safer process?

Also, in the (rare) case that PG needs to be restarted, will the restart
block while waiting for all current log transfers to complete?

Cheers,
Steve


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-09-2008, 07:14 PM
Decibel!
 
Posts: n/a
Default Re: Seeking datacenter PITR backup suggestions

On Tue, Aug 28, 2007 at 11:05:52AM -0700, Steve Crawford wrote:
> In your scenario, what about using "cp -l" (or "ln") instead? Since the
> hard-link it is only creating a new pointer, it will be very fast and
> save a bunch of disk IO on your server and it doesn't appear that the
> tempdir is for much other than organizing purposes anyway.


Note that that will only work if you're creating the link on the same
filesystem, and having /tmp and your data in the same filesystem isn't
such a hot idea.
--
Decibel!, aka Jim Nasby decibel@decibel.org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.3 (FreeBSD)

iD8DBQFG1LfldO30qud8SkgRAiS5AKCtayO6iocMMHiMgciICy xbkbl6jwCgqui2
aWCS8un+3gGcF5XW47PNkBo=
=JsYd
-----END PGP SIGNATURE-----

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-09-2008, 07:14 PM
Decibel!
 
Posts: n/a
Default Re: Seeking datacenter PITR backup suggestions

On Tue, Aug 28, 2007 at 08:31:10PM +0100, Gregory Stark wrote:
> "Steve Crawford" <scrawford@pinpointresearch.com> writes:
>
> > 4. Much more up-to-the-minute recovery data.
> >
> > In your scenario, what about using "cp -l" (or "ln") instead? Since the
> > hard-link it is only creating a new pointer, it will be very fast and
> > save a bunch of disk IO on your server and it doesn't appear that the
> > tempdir is for much other than organizing purposes anyway.

>
> Postgres tries to reuse WAL files. Once the archive_command completes it
> believes it is safe to reuse the old file without deleting it. That will do
> nasty things if you've used ln as your archive command.


I thought that was specifically disabled when PITR was enabled? Or do we
just do a rename rather than an unlink ond creating a new file?
--
Decibel!, aka Jim Nasby decibel@decibel.org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.3 (FreeBSD)

iD8DBQFG1LhJdO30qud8SkgRAkj7AJ95zbBpNameKIg9vNn7ag qD5CafqACgsvfv
QFsri07GsIvZv6AT+WpxzDg=
=ihUK
-----END PGP SIGNATURE-----

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-09-2008, 07:14 PM
Tom Lane
 
Posts: n/a
Default Re: Seeking datacenter PITR backup suggestions

Decibel! <decibel@decibel.org> writes:
> On Tue, Aug 28, 2007 at 08:31:10PM +0100, Gregory Stark wrote:
>> Postgres tries to reuse WAL files. Once the archive_command completes it
>> believes it is safe to reuse the old file without deleting it. That will do
>> nasty things if you've used ln as your archive command.


> I thought that was specifically disabled when PITR was enabled? Or do we
> just do a rename rather than an unlink ond creating a new file?


No. The only difference is we don't recycle the file until the
archive_command says it's done with it.

The archive_command must actually physically copy the data someplace
else, and must not return success until it's sure the copy is good.
Perhaps the docs are not sufficiently clear on the point?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-09-2008, 07:14 PM
Decibel!
 
Posts: n/a
Default Re: Seeking datacenter PITR backup suggestions

On Tue, Aug 28, 2007 at 09:54:23PM -0400, Tom Lane wrote:
> Decibel! <decibel@decibel.org> writes:
> > On Tue, Aug 28, 2007 at 08:31:10PM +0100, Gregory Stark wrote:
> >> Postgres tries to reuse WAL files. Once the archive_command completes it
> >> believes it is safe to reuse the old file without deleting it. That will do
> >> nasty things if you've used ln as your archive command.

>
> > I thought that was specifically disabled when PITR was enabled? Or do we
> > just do a rename rather than an unlink ond creating a new file?

>
> No. The only difference is we don't recycle the file until the
> archive_command says it's done with it.
>
> The archive_command must actually physically copy the data someplace
> else, and must not return success until it's sure the copy is good.
> Perhaps the docs are not sufficiently clear on the point?


Yeah... I think that's a big gotcha waiting to smack someone. I'd
actually make the mention <strong> so that hopefully no one can miss
it... or do we have an official method for putting warnings in the docs?

"Because WAL segment files are renamed and not re-created from scratch,
it is critical that the archive command actually copy files, not move
or hard-link them."
--
Decibel!, aka Jim Nasby decibel@decibel.org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.3 (FreeBSD)

iD8DBQFG1NQpdO30qud8SkgRAvZUAKDCDgrrEl4G0yD6++2DDF jAIeKzSACfWt4a
RUUrXz97KG0MqwUVfXHAvYw=
=2ko1
-----END PGP SIGNATURE-----

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-09-2008, 07:14 PM
Simon Riggs
 
Posts: n/a
Default Re: Seeking datacenter PITR backup suggestions

On Tue, 2007-08-28 at 21:04 -0500, Decibel! wrote:
> On Tue, Aug 28, 2007 at 09:54:23PM -0400, Tom Lane wrote:


> > Perhaps the docs are not sufficiently clear on the point?

>
> Yeah... I think that's a big gotcha waiting to smack someone. I'd
> actually make the mention <strong> so that hopefully no one can miss
> it... or do we have an official method for putting warnings in the docs?
>
> "Because WAL segment files are renamed and not re-created from scratch,
> it is critical that the archive command actually copy files, not move
> or hard-link them."


I'll shortly be writing a doc patch to clarify a few points and to
explain new possibilities, such as Koichi Suzuki's work.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 09:17 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
UnixAdminTalk.com

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551