Unix Technical Forum

[PATCH] Compression and on-disk sorting

This is a discussion on [PATCH] Compression and on-disk sorting within the Pgsql Patches forums, part of the PostgreSQL category; --> Persuant to the discussions currently on -hackers, here's a patch that uses zlib to compress the tapes as they ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Patches

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-18-2008, 01:38 AM
Martijn van Oosterhout
 
Posts: n/a
Default [PATCH] Compression and on-disk sorting

Persuant to the discussions currently on -hackers, here's a patch that
uses zlib to compress the tapes as they go to disk. I default to the
compression level 3 (think gzip -3).

Please speed test all you like, I *think* it's bug free, but you never
know.

Outstanding questions:

- I use zlib because the builtin pg_lzcompress can't do what zlib does.
Here we setup input and output buffers and zlib will process as much as
it can (input empty or output full). This means no marshalling is
required. We can compress the whole file without having it in memory.

- zlib allocates memory for compression and decompression, I don't know
how much. However, it allocates via the postgres mcxt system so it
shouldn't too hard to find out. Simon pointed out that we'll need to
track this because we might allow hundreds of tapes.

- Each tape is compressed as one long compressed stream. Currently no
seeking is allowed, so only sorts, no joins! (As tom said, quick and
dirty numbers). This should show this possibility in its best light
but if we want to support seeking we're going to need to change that.
Maybe no compression on the last pass?

- It's probable that the benefits are strongly correlated to the speed
of your disk subsystem. We need to measure this effect. I can't
accuratly measure this because my compiler doesn't inline any of the
functions in tuplesort.c.

In my test of a compression ratio around 100-to-1, on 160MB of data
with tiny work_mem on my 5 year old laptop, it speeds it up by 60% so
it's obviously not a complete waste of time. Ofcourse, YMMV

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFEa0yaIB7bNG8LQkwRAuXRAJ9ro9LrWINV8g1evul4WE 5t0w4C9QCdF4Y5
zYtnVndgtUJ2ExXOJ+APdNk=
=8H+A
-----END PGP SIGNATURE-----

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-18-2008, 01:38 AM
Simon Riggs
 
Posts: n/a
Default Re: [PATCH] Compression and on-disk sorting

On Wed, 2006-05-17 at 18:17 +0200, Martijn van Oosterhout wrote:
> Persuant to the discussions currently on -hackers, here's a patch that
> uses zlib to compress the tapes as they go to disk. I default to the
> compression level 3 (think gzip -3).
>
> Please speed test all you like, I *think* it's bug free, but you never
> know.
>
> Outstanding questions:
>
> - I use zlib because the builtin pg_lzcompress can't do what zlib does.
> Here we setup input and output buffers and zlib will process as much as
> it can (input empty or output full). This means no marshalling is
> required. We can compress the whole file without having it in memory.


Licence is BSD-compatible and it works the way we need it to work.

> - Each tape is compressed as one long compressed stream. Currently no
> seeking is allowed, so only sorts, no joins! (As tom said, quick and
> dirty numbers). This should show this possibility in its best light
> but if we want to support seeking we're going to need to change that.
> Maybe no compression on the last pass?


We should be able to do this without significant loss of compression by
redefining the lts block size to be 32k. That's the size of the
look-back window anyhow, so compressing the whole stream doesn't get us
much more.

> - It's probable that the benefits are strongly correlated to the speed
> of your disk subsystem. We need to measure this effect. I can't
> accuratly measure this because my compiler doesn't inline any of the
> functions in tuplesort.c.


Please make sure any tests have trace_sort = on.

> In my test of a compression ratio around 100-to-1, on 160MB of data
> with tiny work_mem on my 5 year old laptop, it speeds it up by 60% so
> it's obviously not a complete waste of time. Ofcourse, YMMV


Sounds good. Well done.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-18-2008, 01:38 AM
Martijn van Oosterhout
 
Posts: n/a
Default Re: [PATCH] Compression and on-disk sorting

On Wed, May 17, 2006 at 06:38:47PM +0100, Simon Riggs wrote:
> > - Each tape is compressed as one long compressed stream. Currently no
> > seeking is allowed, so only sorts, no joins! (As tom said, quick and
> > dirty numbers). This should show this possibility in its best light
> > but if we want to support seeking we're going to need to change that.
> > Maybe no compression on the last pass?

>
> We should be able to do this without significant loss of compression by
> redefining the lts block size to be 32k. That's the size of the
> look-back window anyhow, so compressing the whole stream doesn't get us
> much more.


The major problem is looking back costs significantly more with
compression. If you need to look back into the previous compressed
block, you need to decompress the whole previous block. The simple
solution would be to keep a buffer of the last 32KB. Another posibility
would be to have a limit of 32KB of uncompressed data per block and
just remember the whole previous block.

Seek/Tell is not the hard part, it's the backspace. It would probably
be smart to make backspace call Seek, rather than trying to be smart
about it.

Another issue is that currently the compression code is completely
within logtape.c. To be able to seek backwards efficiently you might
have to change the abstraction so that it knows about the records from
tuplesort.c. That's much more work, which needs a lot more thinking.

Besides, we still havn't got any reports yet that this actually
provides a benefit on any machine less than five years ago. Anyone out
there doing tests?

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFEbDDHIB7bNG8LQkwRAr5+AJ90Sd5yqfnWCtVy+UXwqU kSaqC9WQCfVNke
6LBbFANQ5xI8Qk3OX46vcX8=
=cyzM
-----END PGP SIGNATURE-----

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-18-2008, 01:38 AM
Simon Riggs
 
Posts: n/a
Default Re: [PATCH] Compression and on-disk sorting

On Thu, 2006-05-18 at 10:31 +0200, Martijn van Oosterhout wrote:
> On Wed, May 17, 2006 at 06:38:47PM +0100, Simon Riggs wrote:
> > > - Each tape is compressed as one long compressed stream. Currently no
> > > seeking is allowed, so only sorts, no joins! (As tom said, quick and
> > > dirty numbers). This should show this possibility in its best light
> > > but if we want to support seeking we're going to need to change that.
> > > Maybe no compression on the last pass?

> >
> > We should be able to do this without significant loss of compression by
> > redefining the lts block size to be 32k. That's the size of the
> > look-back window anyhow, so compressing the whole stream doesn't get us
> > much more.

>
> The major problem is looking back costs significantly more with
> compression. If you need to look back into the previous compressed
> block, you need to decompress the whole previous block. The simple
> solution would be to keep a buffer of the last 32KB. Another posibility
> would be to have a limit of 32KB of uncompressed data per block and
> just remember the whole previous block.


Just do a Z_FULL_FLUSH when you hit end of block. That way all blocks
will be independent of each other and you can rewind as much as you
like. We can choose the block size to be 32KB or even 64KB, there's no
dependency there, just memory allocation. It should be pretty simple to
make the block size variable at run time, so we can select it according
to how many files and how much memory we have.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-18-2008, 01:38 AM
Martijn van Oosterhout
 
Posts: n/a
Default Re: [PATCH] Compression and on-disk sorting

On Thu, May 18, 2006 at 11:34:36AM +0100, Simon Riggs wrote:
> Just do a Z_FULL_FLUSH when you hit end of block. That way all blocks
> will be independent of each other and you can rewind as much as you
> like. We can choose the block size to be 32KB or even 64KB, there's no
> dependency there, just memory allocation. It should be pretty simple to
> make the block size variable at run time, so we can select it according
> to how many files and how much memory we have.


If you know you don't need to seek, there's no need to block the data
at all, one long stream is fine. So that case is easy.

For seeking, you need more work. I assume you're talking about 32KB
input block sizes (uncompressed). The output blocks will be of variable
size. These compressed blocks would be divided up into fixed 8K blocks
and written to disk.

To allow seeking, you'd have to do something like a header comtaining:

- length of previous compressed block
- length of this compressed block
- offset of block in uncompressed bytes (from beginning of tape)

This would allow you to scan backwards and forwards. If you want to be
able to jump to anywhere in the file, you may be better off storing the
file offsets (which would be implicit if the blocks are 32KB) in the
indirect blocks, using a search to find the right block, and then a
header in the block to find the offset.

Still, I'd like some evidence of benefits before writing up something
like that.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFEbF2JIB7bNG8LQkwRAm43AJ9eXAnTKNdNj5+KsewYBa XxuYlz1ACgj8k/
3B1GQjLXnF1kbEyvRcjl6GI=
=xl4Q
-----END PGP SIGNATURE-----

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-18-2008, 01:38 AM
Jim C. Nasby
 
Posts: n/a
Default Re: [PATCH] Compression and on-disk sorting

On Thu, May 18, 2006 at 10:31:03AM +0200, Martijn van Oosterhout wrote:
> Besides, we still havn't got any reports yet that this actually
> provides a benefit on any machine less than five years ago. Anyone out
> there doing tests?


Yes. I'm compiling the patched binaries right now, but the baseline
testing I've got so far is at
http://jim.nasby.net/misc/compress_sort.txt.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-18-2008, 01:38 AM
Simon Riggs
 
Posts: n/a
Default Re: [PATCH] Compression and on-disk sorting

On Thu, 2006-05-18 at 17:10 -0500, Jim C. Nasby wrote:
> On Thu, May 18, 2006 at 10:31:03AM +0200, Martijn van Oosterhout wrote:
> > Besides, we still havn't got any reports yet that this actually
> > provides a benefit on any machine less than five years ago. Anyone out
> > there doing tests?

>
> Yes. I'm compiling the patched binaries right now, but the baseline
> testing I've got so far is at
> http://jim.nasby.net/misc/compress_sort.txt.


Looks a very good improvement. Well done Martijn/Jim.

The next question is: does it apply in all cases?

We need to test "SELECT aid from accounts" also, or some other scenarios
where the data is as uncompressible as possible. We should also try this
on a table where the rows have been inserted by different transactions,
so that the xmin value isn't the same for all tuples. We need to see if
there are cases where this causes a performance regression rather than
gain.

We still need to examine memory usage. Jim's testing so far is done on
already sorted data, so only uses 1 out of 715 tapes. If we did utilise
a much larger number of tapes, we could face difficulties with the
memory used during decompression.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-18-2008, 01:39 AM
Martijn van Oosterhout
 
Posts: n/a
Default Re: [PATCH] Compression and on-disk sorting

On Fri, May 19, 2006 at 12:43:05PM +0100, Simon Riggs wrote:
> We need to test "SELECT aid from accounts" also, or some other scenarios
> where the data is as uncompressible as possible. We should also try this
> on a table where the rows have been inserted by different transactions,
> so that the xmin value isn't the same for all tuples. We need to see if
> there are cases where this causes a performance regression rather than
> gain.


AFAIK, the xmin is not included in the sort. The only thing is maybe
the ctid which is used in updates. Actually repeating the above tests
but doing:

select xmin,xmax,cmin,cmax,ctid,* from <blah>

Would be interesting. Even that would be compressable though.

Thinking about it, we're storing and compressing HeapTuples right.
There are a few fields there that would compress really well.

t_tableOid
t_len (if not vairable length fields)
t_natts

Even t_hoff and the bitmask if the patterns of NULLs don't vary much
between rows.

> We still need to examine memory usage. Jim's testing so far is done on
> already sorted data, so only uses 1 out of 715 tapes. If we did utilise
> a much larger number of tapes, we could face difficulties with the
> memory used during decompression.


I'm going to see if I can make some changes to track maximum memory
usage per tape.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFEbbL0IB7bNG8LQkwRAoP3AJ9In0xS1q2fl7Ywr2IaTN +ZW8hcYwCfUq61
fAB/QHVF7EPRWm62xcEMP80=
=K6SS
-----END PGP SIGNATURE-----

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 04:52 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com