Unix Technical Forum

Quick-and-dirty compression for WAL backup blocks

  #1 (permalink)  
Old 04-11-2008, 04:08 AM
Tom Lane
 
Default Quick-and-dirty compression for WAL backup blocks

It seems we are more or less agreed that 32-bit CRC ought to be enough
for WAL; and we also need to make a change to ensure that backup blocks
are positively linked to their parent WAL record, as I noted earlier
today. So as long as we have to mess with the WAL record format, I was
wondering what else we could get done in the same change.

The TODO item that comes to mind immediately is "Compress WAL entries".
The TODO.detail file for that has a whole lot of ideas of various
(mostly high) levels of complexity, but one thing we could do fairly
trivially is to try to compress the page images that are dumped into WAL
to protect against partial-write problems. After reviewing the old
discussion I still like the proposal I made:

> ... make the WAL writing logic aware of the layout
> of buffer pages --- specifically, to know that our pages generally
> contain an uninteresting "hole" in the middle, and not write the hole.
> Optimistically this might reduce the WAL data volume by something
> approaching 50%; though pessimistically (if most pages are near full)
> it wouldn't help much.


A more concrete version of this is: examine the page to see if the
pd_lower field is between SizeOfPageHeaderData and BLCKSZ, and if so
whether there is a run of consecutive zero bytes beginning at the
pd_lower position. Omit any such bytes from what is written to WAL.
(This definition ensures that nothing goes wrong if the page does not
follow the normal page layout conventions: the transformation is
lossless no matter what, since we can always reconstruct the exact page
contents.) The overhead needed is only 2 bytes to show the number of
bytes removed.
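
The rule described above can be sketched roughly as follows. This is an illustration in Python, not the backend's C code and not the committed patch. The constants (an 8192-byte block, a 24-byte stand-in for SizeOfPageHeaderData) and the helper names `omit_hole` and `restore_hole` are assumptions made for the sketch, and pd_lower is passed in rather than parsed from the page header.

```python
from typing import Tuple

BLCKSZ = 8192              # assumed block size
SIZE_OF_PAGE_HEADER = 24   # assumed stand-in for SizeOfPageHeaderData


def omit_hole(page: bytes, pd_lower: int) -> Tuple[int, bytes]:
    """Return (number of zero bytes removed, page image without them)."""
    assert len(page) == BLCKSZ
    removed = 0
    if SIZE_OF_PAGE_HEADER <= pd_lower < BLCKSZ:
        # Count the run of consecutive zero bytes starting at pd_lower.
        end = pd_lower
        while end < BLCKSZ and page[end] == 0:
            end += 1
        removed = end - pd_lower
    if removed == 0:
        return 0, page     # nothing to omit; the page is written unchanged
    return removed, page[:pd_lower] + page[pd_lower + removed:]


def restore_hole(kept: bytes, pd_lower: int, removed: int) -> bytes:
    """Rebuild the exact original page by re-inserting the zero bytes."""
    if removed == 0:
        return kept
    return kept[:pd_lower] + bytes(removed) + kept[pd_lower:]
```

Because only bytes that were actually verified to be zero are dropped, the round trip is exact even for a page that ignores the usual layout conventions; such a page simply ends up with `removed == 0`.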

The other alternatives that were suggested included running the page
contents through the same compressor used for TOAST, and implementing
a general-purpose run-length compressor that could get rid of runs of
zeroes anywhere on the page. However, considering that the compression
work has to be done while holding WALInsertLock, it seems to me there
is a strong premium on speed. I think that lets out the TOAST
compressor, which isn't amazingly speedy. (Another objection to the
TOAST compressor is that it certainly won't win on already-compressed
toasted data.) A run-length compressor would be reasonably quick but
I think that the omit-the-middle-hole approach gets most of the possible
win with even less work. In particular, I think it can be proven that
omit-the-hole will actually require less CPU than now, since counting
zero bytes should be strictly faster than CRC'ing bytes, and we'll be
able to save the CRC work on whatever bytes we omit.
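
To make the CRC saving concrete, here is a small hedged sketch that checksums only the bytes that would still be written once a hole has been omitted. Python's `zlib.crc32` is used purely as a stand-in and is not claimed to match the CRC used for WAL; the function name `crc_of_kept_bytes` is hypothetical.

```python
import zlib
from typing import Tuple


def crc_of_kept_bytes(page: bytes, hole_offset: int, hole_length: int) -> Tuple[int, int]:
    """CRC the page image minus its hole; return (crc, number of bytes CRC'ed)."""
    head = page[:hole_offset]
    tail = page[hole_offset + hole_length:]
    crc = zlib.crc32(head)        # bytes before the hole
    crc = zlib.crc32(tail, crc)   # continue the running CRC over the bytes after it
    return crc, len(head) + len(tail)
```

For a page whose hole covers most of the block, the second return value shows how few bytes are left to checksum compared with the full block.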

Any objections?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

  #2 (permalink)  
Old 04-11-2008, 04:08 AM
Simon Riggs
 
Default Re: Quick-and-dirty compression for WAL backup blocks

On Tue, 2005-05-31 at 16:26 -0400, Tom Lane wrote:
> The TODO item that comes to mind immediately is "Compress WAL entries".
> A more concrete version of this is: examine the page to see if the
> pd_lower field is between SizeOfPageHeaderData and BLCKSZ, and if so
> whether there is a run of consecutive zero bytes beginning at the
> pd_lower position. Omit any such bytes from what is written to WAL.
> (This definition ensures that nothing goes wrong if the page does not
> follow the normal page layout conventions: the transformation is
> lossless no matter what, since we can always reconstruct the exact page
> contents.) The overhead needed is only 2 bytes to show the number of
> bytes removed.
>
> The other alternatives that were suggested included running the page
> contents through the same compressor used for TOAST, and implementing
> a general-purpose run-length compressor that could get rid of runs of
> zeroes anywhere on the page. However, considering that the compression
> work has to be done while holding WALInsertLock, it seems to me there
> is a strong premium on speed. I think that lets out the TOAST
> compressor, which isn't amazingly speedy. (Another objection to the
> TOAST compressor is that it certainly won't win on already-compressed
> toasted data.) A run-length compressor would be reasonably quick but
> I think that the omit-the-middle-hole approach gets most of the possible
> win with even less work. In particular, I think it can be proven that
> omit-the-hole will actually require less CPU than now, since counting
> zero bytes should be strictly faster than CRC'ing bytes, and we'll be
> able to save the CRC work on whatever bytes we omit.
>
> Any objections?


None: completely agree with your analysis. Sounds great.

> It seems we are more or less agreed that 32-bit CRC ought to be enough
> for WAL; and we also need to make a change to ensure that backup blocks
> are positively linked to their parent WAL record, as I noted earlier
> today. So as long as we have to mess with the WAL record format, I was
> wondering what else we could get done in the same change.


Is this a change that would be backpatched as you suggested previously?
It seems a rather large patch to change three things at once. Can the
backpatch wait until 8.1 has gone through beta to allow the changes to
be proven?

Best Regards, Simon Riggs



  #3 (permalink)  
Old 04-11-2008, 04:09 AM
Tom Lane
 
Default Re: Quick-and-dirty compression for WAL backup blocks

Simon Riggs <simon@2ndquadrant.com> writes:
> Is this a change that would be backpatched as you suggested previously?


I don't think we can backpatch any of these items, since they involve
changes in the on-disk file format. I was thinking of them as CVS HEAD
changes only.

regards, tom lane

  #4 (permalink)  
Old 04-11-2008, 04:13 AM
Junji TERAMOTO
 
Default Re: Quick-and-dirty compression for WAL backup blocks

Hello all,

I am interested in how to "Compress WAL entries".
I am studying the source now, and have read this discussion.

I have some questions.

1.
XLogInsert() produces two kinds of log records, a "whole buffer (page)
log" and a "partial buffer log", doesn't it? Is it only the "whole buffer
log" that would be written with the "hole" removed?

2.
Tom Lane wrote:
> The overhead needed is only 2 bytes to show the number of
> bytes removed.


In a "whole buffer log", there is a page header that includes the offsets
of the "hole" (lower and upper). If we use that information, we don't need
any overhead, do we?

# Sorry for my bad English.

--
Junji Teramoto

  #5 (permalink)  
Old 04-11-2008, 04:14 AM
Tom Lane
 
Default Re: Quick-and-dirty compression for WAL backup blocks

Junji TERAMOTO <teramoto.junji@lab.ntt.co.jp> writes:
> XLogInsert() produces two kinds of log records, a "whole buffer (page)
> log" and a "partial buffer log", doesn't it? Is it only the "whole buffer
> log" that would be written with the "hole" removed?


Right.

> Tom Lane wrote:
>> The overhead needed is only 2 bytes to show the number of
>> bytes removed.


> In a "whole buffer log", there is a page header that includes the offsets
> of the "hole" (lower and upper). If we use that information, we don't need
> any overhead, do we?


No, because the WAL code cannot assume that all pages follow the
convention that pd_lower and pd_upper represent the boundaries of
free space. (As a counterexample: index metapages don't always
do that.) I think the transformation has to be guaranteed lossless,
which means that at a minimum you'd need to check whether the data
in between pd_lower and pd_upper really is zeroes. So the irreducible
minimum overhead is 1 bit to tell whether you compressed or not.
Considering alignment requirements, you might as well expend a couple
of bytes and keep it simple. (In the patch as committed, I ended up
using 4 bytes --- a uint16 hole start and a uint16 hole length ---
because it kept the code simple. The alignment requirements mean the
extra 2 bytes are usually free anyway.)
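
As an illustration of the 4-byte descriptor mentioned above (a uint16 hole start plus a uint16 hole length), here is a hedged Python sketch. It is not the committed C code: the byte order, the function names `pack_block` and `unpack_block`, and passing pd_lower and pd_upper as arguments are all assumptions. It also shows the point about losslessness: the hole is dropped only after checking that the bytes between pd_lower and pd_upper really are zeroes.

```python
import struct

# Two uint16 fields: hole offset and hole length (little-endian is an assumption).
HOLE_HEADER = struct.Struct("<HH")


def pack_block(page: bytes, pd_lower: int, pd_upper: int) -> bytes:
    """Prefix the page image with a 4-byte hole descriptor."""
    hole_offset, hole_length = 0, 0
    if 0 < pd_lower < pd_upper <= len(page):
        # Only treat the gap as a hole if every byte in it is zero.
        if not any(page[pd_lower:pd_upper]):
            hole_offset, hole_length = pd_lower, pd_upper - pd_lower
    body = page[:hole_offset] + page[hole_offset + hole_length:]
    return HOLE_HEADER.pack(hole_offset, hole_length) + body


def unpack_block(record: bytes) -> bytes:
    """Rebuild the full page image from a descriptor plus the kept bytes."""
    hole_offset, hole_length = HOLE_HEADER.unpack_from(record)
    body = record[HOLE_HEADER.size:]
    return body[:hole_offset] + bytes(hole_length) + body[hole_offset:]
```

A page that keeps real data between pd_lower and pd_upper (the index metapage case) is stored whole with a descriptor of (0, 0), so the round trip stays exact either way.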

regards, tom lane

  #6 (permalink)  
Old 04-11-2008, 04:14 AM
Heikki Linnakangas
 
Default Re: Quick-and-dirty compression for WAL backup blocks

On Mon, 6 Jun 2005, Tom Lane wrote:

> Junji TERAMOTO <teramoto.junji@lab.ntt.co.jp> writes:
>
>> In a "whole buffer log", there is a page header that includes the offsets
>> of the "hole" (lower and upper). If we use that information, we don't need
>> any overhead, do we?

>
> No, because the WAL code cannot assume that all pages follow the
> convention that pd_lower and pd_upper represent the boundaries of
> free space. (As a counterexample: index metapages don't always
> do that.) I think the transformation has to be guaranteed lossless,
> which means that at a minimum you'd need to check whether the data
> in between pd_lower and pd_upper really is zeroes. So the irreducible
> minimum overhead is 1 bit to tell whether you compressed or not.


Vacuum doesn't zero out the free space between lower and upper, it's
just marked as unused, so a lossless compression becomes less efficient
on tables that have free space released by vacuum in them.

How about adding a flag to XLogRecData to indicate if the space between
pd_lower and pd_upper is meaningful or not? The XLogInsert caller probably
knows that. That way you could completely skip over the free space if
it's not meaningful, saving even more cycles.
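
A rough sketch of the flag idea, in Python for illustration only. The parameter name `hole_is_meaningless` and the helpers `hole_to_skip` and `rebuild` are hypothetical and are not part of XLogRecData or XLogInsert. The point is that when the caller vouches for the page layout, the bytes between pd_lower and pd_upper never need to be scanned.

```python
from typing import Tuple


def hole_to_skip(page: bytes, pd_lower: int, pd_upper: int,
                 hole_is_meaningless: bool) -> Tuple[int, int]:
    """Return (offset, length) of the bytes to leave out of the WAL image."""
    if not hole_is_meaningless:
        return 0, 0                      # caller cannot vouch for the layout
    if not (0 < pd_lower < pd_upper <= len(page)):
        return 0, 0                      # implausible pointers: write everything
    return pd_lower, pd_upper - pd_lower  # no scan of the free space needed


def rebuild(kept: bytes, offset: int, length: int) -> bytes:
    """Re-create the page, filling the skipped range with zeroes."""
    return kept[:offset] + bytes(length) + kept[offset:]
```

Note the trade-off in this sketch: if the free space holds stale non-zero bytes, the rebuilt page differs from the original inside the hole, which is acceptable only because the caller has declared that range meaningless.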

- Heikki

  #7 (permalink)  
Old 04-11-2008, 04:14 AM
Tom Lane
 
Default Re: Quick-and-dirty compression for WAL backup blocks

Heikki Linnakangas <hlinnaka@iki.fi> writes:
> Vacuum doesn't zero out the free space between lower and upper,


It does now ;-)

> How about adding a flag to XLogRecData to indicate if the space between
> pd_lower and pd_upper is meaningful or not? The XLogInsert caller probably
> knows that. That way you could completely skip over the free space if
> it's not meaningful, saving even more cycles.


Hmm ... that might not be a bad idea. As far as I can think offhand,
all the XLogInsert callers know very well what type of page they are
working with, so they would always be able to set such a flag correctly.

Would this be institutionalizing a particular approach to data
compression in the XLogInsert API, though?

regards, tom lane

  #8 (permalink)  
Old 04-11-2008, 04:14 AM
Heikki Linnakangas
 
Default Re: Quick-and-dirty compression for WAL backup blocks

On Mon, 6 Jun 2005, Tom Lane wrote:

> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> Vacuum doesn't zero out the free space between lower and upper,

>
> It does now ;-)


Oh. Does it affect vacuum performance?

>> How about adding a flag to XLogRecData to indicate if the space between
>> pd_lower and pd_upper is meaningful or not? The XLogInsert caller probably
>> knows that. That way you could completely skip over the free space if
>> it's not meaningful, saving even more cycles.

>
> Hmm ... that might not be a bad idea. As far as I can think offhand,
> all the XLogInsert callers know very well what type of page they are
> working with, so they would always be able to set such a flag correctly.
>
> Would this be institutionalizing a particular approach to data
> compression in the XLogInsert API, though?


The "skip the free space" optimization is still useful and worthwhile
even if we have a more sophisticated compression method for the
rest of the page.

- Heikki

  #9 (permalink)  
Old 04-11-2008, 04:14 AM
Tom Lane
 
Default Re: Quick-and-dirty compression for WAL backup blocks

Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On Mon, 6 Jun 2005, Tom Lane wrote:
>> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>>> Vacuum doesn't zero out the free space between lower and upper,

>>
>> It does now ;-)


> Oh. Does it affect vacuum performance?


I haven't tried to measure it ... but certainly it's not totally free.
I'd be happy to rip that change out again.

>> Would this be institutionalizing a particular approach to data
>> compression in the XLogInsert API, though?


> The "skip the free space" optimization is still useful and worthwhile
> even if we have a more sophisticated compression method for the
> rest of the page.


Good point. OK, I'm hacking XLOG stuff now anyway so I'll see about
making that happen.

regards, tom lane

  #10 (permalink)  
Old 04-11-2008, 04:15 AM
Junji TERAMOTO
 
Default Re: Quick-and-dirty compression for WAL backup blocks

Tom Lane wrote:

>>>> XLogInsert() produces two kinds of log records, a "whole buffer (page)
>>>> log" and a "partial buffer log", doesn't it? Is it only the "whole buffer
>>>> log" that would be written with the "hole" removed?
>>
>> Right.


I see.
I think it is important to reduce the need to write whole pages
to WAL (as in the TODO list).
# That seems difficult to do, though... Compressing WAL is the easier way.

>> No, because the WAL code cannot assume that all pages follow the
>> convention that pd_lower and pd_upper represent the boundaries of
>> free space. (As a counterexample: index metapages don't always
>> do that.)


Oh, I forgot about that.
And I also think it is a good idea to modify the XLogInsert API in CVS.

--
Junji Teramoto
