Unix Technical Forum

Re: [HACKERS] Full page writes improvement, code update

This is a discussion on Re: [HACKERS] Full page writes improvement, code update within the Pgsql Patches forums, part of the PostgreSQL category; --> On Tue, 2007-03-27 at 11:52 +0900, Koichi Suzuki wrote: > Here's an update of a code to improve full ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Patches

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-18-2008, 10:47 AM
Simon Riggs
 
Posts: n/a
Default Re: [HACKERS] Full page writes improvement, code update

On Tue, 2007-03-27 at 11:52 +0900, Koichi Suzuki wrote:

> Here's an update of a code to improve full page writes as proposed in
>
> http://archives.postgresql.org/pgsql...1/msg01491.php
> and
> http://archives.postgresql.org/pgsql...1/msg00607.php
>
> Update includes some modification for error handling in archiver and
> restoration command.
>
> In the previous threads, I posted several evaluation and shown that we
> can keep all the full page writes needed for full XLOG crash recovery,
> while compressing archive log size considerably better than gzip, with
> less CPU consumption. I've found no further objection for this proposal
> but still would like to hear comments/opinions/advices.


Koichi-san,

Looks interesting. I like the small amount of code to do this.

A few thoughts:

- Not sure why we need "full_page_compress", why not just mark them
always? That harms noone. (Did someone else ask for that? If so, keep
it)

- OTOH I'd like to see an explicit parameter set during recovery since
you're asking the main recovery path to act differently in case a single
bit is set/unset. If you are using that form of recovery, we should say
so explicitly, to keep everybody else safe.

- I'd rather mark just the nonremovable blocks. But no real difference

- We definitely don't want an normal elog in a XLogInsert critical
section, especially at DEBUG1 level

- diff -c format is easier and the standard

- pg_logarchive and pg_logrestore should be called by a name that
reflects what they actually do. Possibly pg_compresslog and
pg_decompresslog etc.. I've not reviewed those programs, but:

- Not sure why we have to compress away page headers. Touch as little as
you can has always been my thinking with recovery code.

- I'm very uncomfortable with touching the LSN. Maybe I misunderstand?

- Have you thought about how pg_standby would integrate with this
option? Can you please?

- I'll do some docs for this after Freeze, if you'd like. I have some
other changes to make there, so I can do this at the same time.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com



---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-18-2008, 10:47 AM
Koichi Suzuki
 
Posts: n/a
Default Re: [HACKERS] Full page writes improvement, code update

Simon;

Thanks a lot for your comments/advices. I'd like to write some feedback.

Simon Riggs wrote:
> On Tue, 2007-03-27 at 11:52 +0900, Koichi Suzuki wrote:
>
>> Here's an update of a code to improve full page writes as proposed in
>>
>> http://archives.postgresql.org/pgsql...1/msg01491.php
>> and
>> http://archives.postgresql.org/pgsql...1/msg00607.php
>>
>> Update includes some modification for error handling in archiver and
>> restoration command.
>>
>> In the previous threads, I posted several evaluation and shown that we
>> can keep all the full page writes needed for full XLOG crash recovery,
>> while compressing archive log size considerably better than gzip, with
>> less CPU consumption. I've found no further objection for this proposal
>> but still would like to hear comments/opinions/advices.

>
> Koichi-san,
>
> Looks interesting. I like the small amount of code to do this.
>
> A few thoughts:
>
> - Not sure why we need "full_page_compress", why not just mark them
> always? That harms noone. (Did someone else ask for that? If so, keep
> it)


No, no one asked to have a separate option. There'll be no bad
influence to do so. As written below, full page write can be
categolized as follows:

1) Needed for crash recovery: first page update after each checkpoint.
This has to be kept in WAL.

2) Needed for archive recovery: page update between pg_start_backup and
pg_stop_backup. This has to be kept in archive log.

3) For log-shipping slave such as pg_standby: no full page writes will
be needed for this purpose.

My proposal deals with 2). So, if we mark each "full_page_write", I'd
rather mark when this is needed. Still need only one bit because the
case 3) does not need any mark.

> - OTOH I'd like to see an explicit parameter set during recovery since
> you're asking the main recovery path to act differently in case a single
> bit is set/unset. If you are using that form of recovery, we should say
> so explicitly, to keep everybody else safe.


Only one thing I had to do is to create "dummy" full page write to
maintain LSNs. Full page writes are omitted in archive log. We have to
LSNs same as those in the original WAL. In this case, recovery has to
read logical log, not "dummy" full page writes. On the other hand, if
both logical log and "real" full page writes are found in a log record,
the recovery has to use "real" full page writes.

> - I'd rather mark just the nonremovable blocks. But no real difference


It sound nicer. According to the full page write categories above, we
can mark full page writes as needed in "crash recovery" or "archive
recovery". Please give some feedback to the above full page write
categories.

>
> - We definitely don't want an normal elog in a XLogInsert critical
> section, especially at DEBUG1 level


I agree. I'll remove elog calls from critical sections.

> - diff -c format is easier and the standard


I'll change the diff option.

> - pg_logarchive and pg_logrestore should be called by a name that
> reflects what they actually do. Possibly pg_compresslog and
> pg_decompresslog etc.. I've not reviewed those programs, but:


I wasn't careful to have command names more based on what to be done.
I'll change the command name.

>
> - Not sure why we have to compress away page headers. Touch as little as
> you can has always been my thinking with recovery code.


Eliminating page headers gives compression rate slightly better, a
couple of percents. To make code simpler, I'll drop this compression.

>
> - I'm very uncomfortable with touching the LSN. Maybe I misunderstand?


The reason why we need pg_logarchive (or pg_decompresslog) is to
maintain LSN the same as those in the original WAL. For this purpose,
we need "dummy" full page write. I don't like to touch LSN either and
this is the reason why pg_decompresslog need some extra work,
eliminating many other changes in the recovery.

>
> - Have you thought about how pg_standby would integrate with this
> option? Can you please?


Yes I believe so. As pg_standby does not include any chance to meet
partial writes of pages, I believe you can omit all the full page
writes. Of course, as Tom Lange suggested in
http://archives.postgresql.org/pgsql...2/msg00034.php
removing full page writes can lose a chance to recover from
partial/inconsisitent writes in the crash of pg_standby. In this case,
we have to import a backup and archive logs (with full page writes
during the backup) to recover. (We have to import them when the file
system crashes anyway). If it's okay, I believe
pg_compresslog/pg_decompresslog can be integrated with pg_standby.

Maybe we can work together to include pg_compresslog/pg_decompresslog in
pg_standby.

>
> - I'll do some docs for this after Freeze, if you'd like. I have some
> other changes to make there, so I can do this at the same time.


I'll be looking forward to your writings.

Best regards;

--
Koichi Suzuki

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-18-2008, 10:47 AM
Simon Riggs
 
Posts: n/a
Default Re: [HACKERS] Full page writes improvement, code update

On Wed, 2007-03-28 at 10:54 +0900, Koichi Suzuki wrote:

> As written below, full page write can be
> categolized as follows:
>
> 1) Needed for crash recovery: first page update after each checkpoint.
> This has to be kept in WAL.
>
> 2) Needed for archive recovery: page update between pg_start_backup and
> pg_stop_backup. This has to be kept in archive log.
>
> 3) For log-shipping slave such as pg_standby: no full page writes will
> be needed for this purpose.
>
> My proposal deals with 2). So, if we mark each "full_page_write", I'd
> rather mark when this is needed. Still need only one bit because the
> case 3) does not need any mark.


I'm very happy with this proposal, though I do still have some points in
detailed areas.

If you accept that 1 & 2 are valid goals, then 1 & 3 or 1, 2 & 3 are
also valid goals, ISTM. i.e. you might choose to use full_page_writes on
the primary and yet would like to see optimised data transfer to the
standby server. In that case, you would need the mark.

> > - Not sure why we need "full_page_compress", why not just mark them
> > always? That harms noone. (Did someone else ask for that? If so, keep
> > it)

>
> No, no one asked to have a separate option. There'll be no bad
> influence to do so. So, if we mark each "full_page_write", I'd
> rather mark when this is needed. Still need only one bit because the
> case 3) does not need any mark.


OK, different question:
Why would anyone ever set full_page_compress = off?

Why have a parameter that does so little? ISTM this is:

i) one more thing to get wrong

ii) cheaper to mark the block when appropriate than to perform the if()
test each time. That can be done only in the path where backup blocks
are present.

iii) If we mark the blocks every time, it allows us to do an offline WAL
compression. If the blocks aren't marked that option is lost. The bit is
useful information, so we should have it in all cases.

> > - OTOH I'd like to see an explicit parameter set during recovery since
> > you're asking the main recovery path to act differently in case a single
> > bit is set/unset. If you are using that form of recovery, we should say
> > so explicitly, to keep everybody else safe.

>
> Only one thing I had to do is to create "dummy" full page write to
> maintain LSNs. Full page writes are omitted in archive log. We have to
> LSNs same as those in the original WAL. In this case, recovery has to
> read logical log, not "dummy" full page writes. On the other hand, if
> both logical log and "real" full page writes are found in a log record,
> the recovery has to use "real" full page writes.


I apologise for not understanding your reply, perhaps my original
request was unclear.

In recovery.conf, I'd like to see a parameter such as

dummy_backup_blocks = off (default) | on

to explicitly indicate to the recovery process that backup blocks are
present, yet they are garbage and should be ignored. Having garbage data
within the system is potentially dangerous and I want to be told by the
user that they were expecting that and its OK to ignore that data.
Otherwise I want to throw informative errors. Maybe it seems OK now, but
the next change to the system may have unintended consequences and it
may not be us making the change. "It's OK the Alien will never escape
from the lab" is the starting premise for many good sci-fi horrors and I
want to watch them, not be in one myself. :-)

We can call it other things, of course. e.g.
ignore_dummy_blocks
decompressed_blocks
apply_backup_blocks

> Yes I believe so. As pg_standby does not include any chance to meet
> partial writes of pages, I believe you can omit all the full page
> writes. Of course, as Tom Lange suggested in
> http://archives.postgresql.org/pgsql...2/msg00034.php
> removing full page writes can lose a chance to recover from
> partial/inconsisitent writes in the crash of pg_standby. In this case,
> we have to import a backup and archive logs (with full page writes
> during the backup) to recover. (We have to import them when the file
> system crashes anyway). If it's okay, I believe
> pg_compresslog/pg_decompresslog can be integrated with pg_standby.
>
> Maybe we can work together to include pg_compresslog/pg_decompresslog in
> pg_standby.


ISTM there are two options.

I think this option is already possible:

1. Allow pg_decompresslog to operate on a file, replacing it with the
expanded form, like gunzip, so we would do this:
restore_command = 'pg_standby %f decomp.tmp && pg_decompresslog
decomp.tmp %p'

though the decomp.tmp file would not get properly initialised or cleaned
up when we finish.

whereas this will take additional work

2. Allow pg_standby to write to stdin, so that we can do this:
restore_command = 'pg_standby %f | pg_decompresslog - %p'

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com



---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-18-2008, 10:47 AM
Koichi Suzuki
 
Posts: n/a
Default Re: [HACKERS] Full page writes improvement, code update

Hi, Here're some feedback to the comment:

Simon Riggs wrote:
> On Wed, 2007-03-28 at 10:54 +0900, Koichi Suzuki wrote:
>
>> As written below, full page write can be
>> categolized as follows:
>>
>> 1) Needed for crash recovery: first page update after each checkpoint.
>> This has to be kept in WAL.
>>
>> 2) Needed for archive recovery: page update between pg_start_backup and
>> pg_stop_backup. This has to be kept in archive log.
>>
>> 3) For log-shipping slave such as pg_standby: no full page writes will
>> be needed for this purpose.
>>
>> My proposal deals with 2). So, if we mark each "full_page_write", I'd
>> rather mark when this is needed. Still need only one bit because the
>> case 3) does not need any mark.

>
> I'm very happy with this proposal, though I do still have some points in
> detailed areas.
>
> If you accept that 1 & 2 are valid goals, then 1 & 3 or 1, 2 & 3 are
> also valid goals, ISTM. i.e. you might choose to use full_page_writes on
> the primary and yet would like to see optimised data transfer to the
> standby server. In that case, you would need the mark.


Yes, I need the mark. In my proposal, only unmarked full-page-writes,
which were written as the first update after a checkpoint, are to be
removed offline (pg_compresslog).

>
>>> - Not sure why we need "full_page_compress", why not just mark them
>>> always? That harms noone. (Did someone else ask for that? If so, keep
>>> it)

>> No, no one asked to have a separate option. There'll be no bad
>> influence to do so. So, if we mark each "full_page_write", I'd
>> rather mark when this is needed. Still need only one bit because the
>> case 3) does not need any mark.

>
> OK, different question:
> Why would anyone ever set full_page_compress = off?
>
> Why have a parameter that does so little? ISTM this is:
>
> i) one more thing to get wrong
>
> ii) cheaper to mark the block when appropriate than to perform the if()
> test each time. That can be done only in the path where backup blocks
> are present.
>
> iii) If we mark the blocks every time, it allows us to do an offline WAL
> compression. If the blocks aren't marked that option is lost. The bit is
> useful information, so we should have it in all cases.


Not only full-page-writes are written as WAL record. In my proposal,
both full-page-writes and logical log are written in a WAL record, which
will make WAL size slightly bigger (five percent or so). If
full_page_compress = off, only a full-page-write will be written in a
WAL record. I thought someone will not be happy with this size growth.

I agree to make this mandatory if every body is happy with extra logical
log in WAL records with full page writes.

I'd like to have your opinion.

>
>>> - OTOH I'd like to see an explicit parameter set during recovery since
>>> you're asking the main recovery path to act differently in case a single
>>> bit is set/unset. If you are using that form of recovery, we should say
>>> so explicitly, to keep everybody else safe.

>> Only one thing I had to do is to create "dummy" full page write to
>> maintain LSNs. Full page writes are omitted in archive log. We have to
>> LSNs same as those in the original WAL. In this case, recovery has to
>> read logical log, not "dummy" full page writes. On the other hand, if
>> both logical log and "real" full page writes are found in a log record,
>> the recovery has to use "real" full page writes.

>
> I apologise for not understanding your reply, perhaps my original
> request was unclear.
>
> In recovery.conf, I'd like to see a parameter such as
>
> dummy_backup_blocks = off (default) | on
>
> to explicitly indicate to the recovery process that backup blocks are
> present, yet they are garbage and should be ignored. Having garbage data
> within the system is potentially dangerous and I want to be told by the
> user that they were expecting that and its OK to ignore that data.
> Otherwise I want to throw informative errors. Maybe it seems OK now, but
> the next change to the system may have unintended consequences and it
> may not be us making the change. "It's OK the Alien will never escape
> from the lab" is the starting premise for many good sci-fi horrors and I
> want to watch them, not be in one myself. :-)
>
> We can call it other things, of course. e.g.
> ignore_dummy_blocks
> decompressed_blocks
> apply_backup_blocks


So far, we don't need any modification to the recovery and redo
functions. They ignore the dummy and apply logical logs. Also, if
there are both full page writes and logical log, current recovery
selects full page writes to apply.

I agree to introduce this option if 8.3 code introduces any conflict to
the current. Or, we could introduce this option for future safety. Do
you think we should introduce this option?

If this should be introduced now, what we should do is to check this
option when dummy full-page-write appears.


>
>> Yes I believe so. As pg_standby does not include any chance to meet
>> partial writes of pages, I believe you can omit all the full page
>> writes. Of course, as Tom Lange suggested in
>> http://archives.postgresql.org/pgsql...2/msg00034.php
>> removing full page writes can lose a chance to recover from
>> partial/inconsisitent writes in the crash of pg_standby. In this case,
>> we have to import a backup and archive logs (with full page writes
>> during the backup) to recover. (We have to import them when the file
>> system crashes anyway). If it's okay, I believe
>> pg_compresslog/pg_decompresslog can be integrated with pg_standby.
>>
>> Maybe we can work together to include pg_compresslog/pg_decompresslog in
>> pg_standby.

>
> ISTM there are two options.
>
> I think this option is already possible:
>
> 1. Allow pg_decompresslog to operate on a file, replacing it with the
> expanded form, like gunzip, so we would do this:
> restore_command = 'pg_standby %f decomp.tmp && pg_decompresslog
> decomp.tmp %p'
>
> though the decomp.tmp file would not get properly initialised or cleaned
> up when we finish.
>
> whereas this will take additional work
>
> 2. Allow pg_standby to write to stdin, so that we can do this:
> restore_command = 'pg_standby %f | pg_decompresslog - %p'
>


Both seem to work fine. pg_decompresslog will read entire file at the
beginning and so both two will be equivallent. To make the second
option run quicker, pg_decompresslog needs some modification to read WAL
record one after another.

Anyway, could you try to run pg_standby with pg_compresslog and
pg_decompresslog?

----
Additional recomment on page header removal:

I found that it is not simple to keep page header in the compressed
archive log. Because we eliminate unmarked full page writes and shift
the rest of the WAL file data, it is not simple to keep page header as
the page header in the compressed archive log. It is much simpler to
remove page header as well and rebuild them. I'd like to keep current
implementation in this point.

Any suggestions are welcome.
-------------------

I'll modify the name of the commands and post it hopefully within 20hours.

With Best Regards;

--
Koichi Suzuki

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-18-2008, 10:48 AM
Simon Riggs
 
Posts: n/a
Default Re: [HACKERS] Full page writes improvement, code update

On Thu, 2007-03-29 at 17:50 +0900, Koichi Suzuki wrote:

> Not only full-page-writes are written as WAL record. In my proposal,
> both full-page-writes and logical log are written in a WAL record, which
> will make WAL size slightly bigger (five percent or so). If
> full_page_compress = off, only a full-page-write will be written in a
> WAL record. I thought someone will not be happy with this size growth.


OK, I see what you're doing now and agree with you that we do need a
parameter. Not sure about the name you've chosen though - it certainly
confused me until you explained.

A parameter called ..._compress indicates to me that it would reduce
something in size whereas what it actually does is increase the size of
WAL slightly. We should have a parameter name that indicates what it
actually does, otherwise some people will choose to use this parameter
even when they are not using archive_command with pg_compresslog.

Some possible names...

additional_wal_info = 'COMPRESS'
add_wal_info
wal_additional_info
wal_auxiliary_info
wal_extra_data
attach_wal_info
....
others?

I've got some ideas for the future for adding additional WAL info for
various purposes, so it might be useful to have a parameter that can
cater for multiple types of additional WAL data. Or maybe we go for
something more specific like

wal_add_compress_info = on
wal_add_XXXX_info ...

> > In recovery.conf, I'd like to see a parameter such as
> >
> > dummy_backup_blocks = off (default) | on
> >
> > to explicitly indicate to the recovery process that backup blocks are
> > present, yet they are garbage and should be ignored. Having garbage data
> > within the system is potentially dangerous and I want to be told by the
> > user that they were expecting that and its OK to ignore that data.
> > Otherwise I want to throw informative errors. Maybe it seems OK now, but
> > the next change to the system may have unintended consequences and it
> > may not be us making the change. "It's OK the Alien will never escape
> > from the lab" is the starting premise for many good sci-fi horrors and I
> > want to watch them, not be in one myself. :-)
> >
> > We can call it other things, of course. e.g.
> > ignore_dummy_blocks
> > decompressed_blocks
> > apply_backup_blocks

>
> So far, we don't need any modification to the recovery and redo
> functions. They ignore the dummy and apply logical logs. Also, if
> there are both full page writes and logical log, current recovery
> selects full page writes to apply.
>
> I agree to introduce this option if 8.3 code introduces any conflict to
> the current. Or, we could introduce this option for future safety. Do
> you think we should introduce this option?


Yes. You are skipping a correctness test and that should be by explicit
command only. It's no problem to include that as well, since you are
already having to specify pg_... decompress... but the recovery process
doesn't know whether or not you've done that.

> Anyway, could you try to run pg_standby with pg_compresslog and
> pg_decompresslog?


After freeze, yes.

> ----
> Additional recomment on page header removal:
>
> I found that it is not simple to keep page header in the compressed
> archive log. Because we eliminate unmarked full page writes and shift
> the rest of the WAL file data, it is not simple to keep page header as
> the page header in the compressed archive log. It is much simpler to
> remove page header as well and rebuild them. I'd like to keep current
> implementation in this point.


OK.

This is a good feature. Thanks for your patience with my comments.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com



---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-18-2008, 10:48 AM
Josh Berkus
 
Posts: n/a
Default Re: [HACKERS] Full page writes improvement, code update

Simon,

> OK, different question:
> Why would anyone ever set full_page_compress = off?


The only reason I can see is if compression costs us CPU but gains RAM &
I/O. I can think of a lot of applications ... benchmarks included ...
which are CPU-bound but not RAM or I/O bound. For those applications,
compression is a bad tradeoff.

If, however, CPU used for compression is made up elsewhere through smaller
file processing, then I'd agree that we don't need a switch.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-18-2008, 10:48 AM
Simon Riggs
 
Posts: n/a
Default Re: [HACKERS] Full page writes improvement, code update

On Thu, 2007-03-29 at 11:45 -0700, Josh Berkus wrote:

> > OK, different question:
> > Why would anyone ever set full_page_compress = off?

>
> The only reason I can see is if compression costs us CPU but gains RAM &
> I/O. I can think of a lot of applications ... benchmarks included ...
> which are CPU-bound but not RAM or I/O bound. For those applications,
> compression is a bad tradeoff.
>
> If, however, CPU used for compression is made up elsewhere through smaller
> file processing, then I'd agree that we don't need a switch.


Koichi-san has explained things for me now.

I misunderstood what the parameter did and reading your post, ISTM you
have as well. I do hope Koichi-san will alter the name to allow
everybody to understand what it does.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com



---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-18-2008, 10:48 AM
Zeugswetter Andreas ADI SD
 
Posts: n/a
Default Re: [HACKERS] Full page writes improvement, code update


> Without a switch, because both full page writes and
> corresponding logical log is included in WAL, this will
> increase WAL size slightly
> (maybe about five percent or so). If everybody is happy
> with this, we
> don't need a switch.


Sorry, I still don't understand that. What is the "corresponding logical
log" ?
It seems to me, that a full page WAL record has enough info to produce a

dummy LSN WAL entry. So insead of just cutting the full page wal record
you
could replace it with a LSN WAL entry when archiving the log.

Then all that is needed is the one flag, no extra space ?

Andreas


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-18-2008, 10:48 AM
Simon Riggs
 
Posts: n/a
Default Re: [HACKERS] Full page writes improvement, code update

On Fri, 2007-03-30 at 10:22 +0200, Zeugswetter Andreas ADI SD wrote:
> > Without a switch, because both full page writes and
> > corresponding logical log is included in WAL, this will
> > increase WAL size slightly
> > (maybe about five percent or so). If everybody is happy
> > with this, we
> > don't need a switch.

>
> Sorry, I still don't understand that. What is the "corresponding logical
> log" ?
> It seems to me, that a full page WAL record has enough info to produce a
>
> dummy LSN WAL entry. So insead of just cutting the full page wal record
> you
> could replace it with a LSN WAL entry when archiving the log.
>
> Then all that is needed is the one flag, no extra space ?


The full page write is required for crash recovery, but that isn't
required during archive recovery because the base backup provides the
safe base. Archive recovery needs the normal xlog record, which in some
cases has been optimised away because the backup block is present, since
the full block already contains the changes.

If you want to remove the backup blocks, you need to put back the
information that was optimised away, otherwise you won't be able to do
the archive recovery correctly. Hence a slight increase in WAL volume to
allow it to be compressed does make sense.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com



---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-18-2008, 10:48 AM
Richard Huxton
 
Posts: n/a
Default Re: [HACKERS] Full page writes improvement, code update

Simon Riggs wrote:
> On Fri, 2007-03-30 at 10:22 +0200, Zeugswetter Andreas ADI SD wrote:
>>> Without a switch, because both full page writes and
>>> corresponding logical log is included in WAL, this will
>>> increase WAL size slightly
>>> (maybe about five percent or so). If everybody is happy
>>> with this, we
>>> don't need a switch.

>> Sorry, I still don't understand that. What is the "corresponding logical
>> log" ?
>> It seems to me, that a full page WAL record has enough info to produce a
>>
>> dummy LSN WAL entry. So insead of just cutting the full page wal record
>> you
>> could replace it with a LSN WAL entry when archiving the log.
>>
>> Then all that is needed is the one flag, no extra space ?

>
> The full page write is required for crash recovery, but that isn't
> required during archive recovery because the base backup provides the
> safe base.


Is that always true? Could the backup not pick up a partially-written
page? Assuming it's being written to as the backup is in progress. (We
are talking about when disk blocks are smaller than PG blocks here, so
can't guarantee an atomic write for a PG block?)

--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 05:35 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com