Unix Technical Forum

Re: Compression and on-disk sorting

  #1 (permalink)  
Old 04-12-2008, 03:23 AM
Bort, Paul
 

>
> Compressed-filesystem extension (like e2compr, and I think either
> Fat or NTFS) can do that.
>


Windows (NT/2000/XP) can compress individual directories and files under
NTFS; new files in a compressed directory are compressed by default.

So if the 'spill-to-disk' all happened in its own specific directory, it
would be trivial to mark that directory for compression.

I don't know enough Linux/Unix to know if it has similar capabilities.


---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

  #2 (permalink)  
Old 04-12-2008, 03:23 AM
Andrew Dunstan
 

Bort, Paul wrote:
>> Compressed-filesystem extension (like e2compr, and I think either
>> Fat or NTFS) can do that.
>>
>>

>
> Windows (NT/2000/XP) can compress individual directories and files under
> NTFS; new files in a compressed directory are compressed by default.
>
> So if the 'spill-to-disk' all happened in its own specific directory, it
> would be trivial to mark that directory for compression.
>
> I don't know enough Linux/Unix to know if it has similar capabilities.
>
>
>


Or would want to ...

I habitually turn off all compression on my Windows boxes, because it's
a performance hit in my experience. Disk is cheap ...

cheers

andrew

  #3 (permalink)  
Old 04-12-2008, 03:23 AM
Rod Taylor
 

On Tue, 2006-05-16 at 11:53 -0400, Andrew Dunstan wrote:
> Bort, Paul wrote:
> >> Compressed-filesystem extension (like e2compr, and I think either
> >> Fat or NTFS) can do that.
> >>
> >>

> >
> > Windows (NT/2000/XP) can compress individual directories and files under
> > NTFS; new files in a compressed directory are compressed by default.
> >
> > So if the 'spill-to-disk' all happened in its own specific directory, it
> > would be trivial to mark that directory for compression.
> >
> > I don't know enough Linux/Unix to know if it has similar capabilities.


> Or would want to ...
>
> I habitually turn off all compression on my Windows boxes, because it's
> a performance hit in my experience. Disk is cheap ...


Disk storage is cheap. Disk bandwidth or throughput is very expensive.
--


  #4 (permalink)  
Old 04-12-2008, 03:23 AM
Andrew Dunstan
 

Rod Taylor wrote:
>> I habitually turn off all compression on my Windows boxes, because it's
>> a performance hit in my experience. Disk is cheap ...
>>

>
> Disk storage is cheap. Disk bandwidth or throughput is very expensive.
>


Sure, but in my experience using Windows File System compression is not
a win here. Presumably if it were an unqualified win they would have it
turned on everywhere. The fact that there's an option is a good
indication that it isn't in many cases. It is most commonly used for
files like executables that are in effect read-only - but that doesn't
help us.

cheers

andrew


  #5 (permalink)  
Old 04-12-2008, 03:23 AM
Jim C. Nasby
 

On Tue, May 16, 2006 at 12:27:42PM -0400, Andrew Dunstan wrote:
> Rod Taylor wrote:
> >>I habitually turn off all compression on my Windows boxes, because it's
> >>a performance hit in my experience. Disk is cheap ...
> >>

> >
> >Disk storage is cheap. Disk bandwidth or throughput is very expensive.


Hey, that's my line! :P

> Sure, but in my experience using Windows File System compression is not
> a win here. Presumably if it were an unqualified win they would have it
> turned on everywhere. The fact that there's an option is a good
> indication that it isn't in many cases. It is most commonly used for
> files like executables that are in effect read-only - but that doesn't
> help us.


The issue with filesystem level compression is that it has to support
things like random access, which isn't needed for on-disk sorting (not
sure about other things like hashing, etc).
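
To illustrate the distinction, here is a minimal sketch of the sequential-only case, assuming a sort run is written once and then read back front to back. It uses Python's zlib as a stand-in; the function names and the sample data are hypothetical and have nothing to do with the actual PostgreSQL sort code.

```python
import zlib

def compress_run(chunks):
    """Compress an iterable of byte chunks as one sequential zlib stream."""
    comp = zlib.compressobj()
    out = []
    for chunk in chunks:
        out.append(comp.compress(chunk))
    out.append(comp.flush())  # emit whatever the compressor is still holding
    return b"".join(out)

def read_run(blob, piece_size=4096):
    """Decompress front to back, yielding data in order. No seeking needed."""
    decomp = zlib.decompressobj()
    for start in range(0, len(blob), piece_size):
        data = decomp.decompress(blob[start:start + piece_size])
        if data:
            yield data
    tail = decomp.flush()
    if tail:
        yield tail

# Hypothetical sorted run: fixed-width keys in ascending order.
run = [b"%010d\n" % i for i in range(10000)]
blob = compress_run(run)
restored = b"".join(read_run(blob))
```

Because the reader never jumps to an arbitrary offset, the compressor can treat the whole run as a single stream, which a general-purpose filesystem cannot assume.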

In any case, my curiosity is aroused, so I'm currently benchmarking
pgbench on both a compressed and uncompressed $PGDATA/base. I'll also do
some benchmarks with pg_tmp compressed.

Does anyone have time to hack some kind of compression into the on-disk
sort code just to get some benchmark numbers? Unfortunately, doing so is
beyond my meager C ability...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

  #6 (permalink)  
Old 04-12-2008, 03:23 AM
Jim C. Nasby
 

On Tue, May 16, 2006 at 12:31:07PM -0500, Jim C. Nasby wrote:
> In any case, my curiosity is aroused, so I'm currently benchmarking
> pgbench on both a compressed and uncompressed $PGDATA/base. I'll also do
> some benchmarks with pg_tmp compressed.


Results: http://jim.nasby.net/bench.log

As expected, compressing $PGDATA/base was a loss. But compressing
pgsql_tmp and then doing some disk-based sorts did show an improvement,
from 366.1 seconds to 317.3 seconds, an improvement of 13.3%. This is on
a Windows XP laptop (Dell Latitude D600) with 512MB, so it's somewhat of
a worst-case scenario. On the other hand, XP's compression algorithm
appears to be pretty aggressive, as it cut the size of the on-disk sort
file from almost 700MB to 82MB. There are probably gains to be had from a
different compression algorithm.
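
For anyone checking the arithmetic, a quick sketch using only the figures quoted above. The 700 is an assumed upper bound, since the file was "almost 700MB".

```python
uncompressed_s = 366.1   # sort time with pgsql_tmp uncompressed
compressed_s = 317.3     # sort time with pgsql_tmp compressed

saved_s = uncompressed_s - compressed_s
improvement_pct = saved_s / uncompressed_s * 100

# "Almost 700MB" is treated as 700 here, so the ratio is an upper bound.
size_ratio = 700 / 82

print(f"saved {saved_s:.1f} s, {improvement_pct:.1f}% faster")
print(f"on-disk size reduced by a factor of about {size_ratio:.1f}")
```

So roughly an 8.5x reduction in bytes written bought a 13.3% reduction in wall-clock time on this machine.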

> Does anyone have time to hack some kind of compression into the on-disk
> sort code just to get some benchmark numbers? Unfortunately, doing so is
> beyond my meager C ability...


  #7 (permalink)  
Old 04-12-2008, 03:24 AM
Martijn van Oosterhout
 

On Tue, May 16, 2006 at 12:31:07PM -0500, Jim C. Nasby wrote:
> Does anyone have time to hack some kind of compression into the on-disk
> sort code just to get some benchmark numbers? Unfortunately, doing so is
> beyond my meager C ability...


I had a look at this. At first glance it doesn't seem too hard, except
the whole logtape process kinda gets in the way. If it weren't for the
mark/restore it'd be trivial. Might take a stab at it some time, if I
can think of a way to handle the seeking...

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


  #8 (permalink)  
Old 04-12-2008, 03:24 AM
Jim C. Nasby
 

On Tue, May 16, 2006 at 11:46:15PM +0200, Martijn van Oosterhout wrote:
> On Tue, May 16, 2006 at 12:31:07PM -0500, Jim C. Nasby wrote:
> > Does anyone have time to hack some kind of compression into the on-disk
> > sort code just to get some benchmark numbers? Unfortunately, doing so is
> > beyond my meager C ability...

>
> I had a look at this. At first glance it doesn't seem too hard, except
> the whole logtape process kinda gets in the way. If it weren't for the
> mark/restore it'd be trivial. Might take a stab at it some time, if I
> can think of a way to handle the seeking...


Oh, do we need to randomly seek? Is that how we switch from one tape to
another?

It might be easier to switch to giving each tape its own file...

  #9 (permalink)  
Old 04-12-2008, 03:24 AM
Martijn van Oosterhout
 

On Tue, May 16, 2006 at 04:50:22PM -0500, Jim C. Nasby wrote:
> > I had a look at this. At first glance it doesn't seem too hard, except
> > the whole logtape process kinda gets in the way. If it weren't for the
> > mark/restore it'd be trivial. Might take a stab at it some time, if I
> > can think of a way to handle the seeking...

>
> Oh, do we need to randomly seek? Is that how we switch from one tape to
> another?


Not seek, mark/restore. As the code describes, sometimes you go back a
tuple. The primary reason, I think, is the final pass: a merge sort
might read the tuples multiple times, so it needs to support it there.
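
As a rough illustration of what mark/restore asks of a forward-only reader, here is a minimal sketch that buffers everything read since the mark so it can be replayed. This is one possible approach, not how logtape does it; the class and method names are hypothetical.

```python
class MarkableReader:
    """Forward-only reader with mark/restore, buffering items read since the mark."""

    def __init__(self, source):
        self._source = iter(source)
        self._buffer = []      # items read from the source since the last mark
        self._pos = 0          # index in the buffer of the next item to replay
        self._marked = False

    def mark(self):
        # Forget items already replayed; keep any that are still pending.
        self._buffer = self._buffer[self._pos:]
        self._pos = 0
        self._marked = True

    def restore(self):
        if not self._marked:
            raise RuntimeError("restore() called without a prior mark()")
        self._pos = 0          # replay from the marked position

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos < len(self._buffer):
            item = self._buffer[self._pos]
            self._pos += 1
            return item
        item = next(self._source)
        if self._marked:
            self._buffer.append(item)
            self._pos += 1
        return item

reader = MarkableReader(["a", "b", "c", "d"])
first = next(reader)
reader.mark()
seen = [next(reader), next(reader)]
reader.restore()
replayed = [next(reader), next(reader), next(reader)]
```

The underlying source is only ever read forward, which is the property a compressed stream would need. The cost is memory proportional to the distance between the mark and the current position.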

> It might be easier to switch to giving each tape its own file...


I don't think it would make much difference. OTOH, if this turns out to
be a win, the tuplestore could have the same optimisation.

Have a nice day,


  #10 (permalink)  
Old 04-12-2008, 03:24 AM
Tom Lane
 

Martijn van Oosterhout <kleptog@svana.org> writes:
> Not seek, mark/restore. As the code describes, sometimes you go back a
> tuple. The primary reason I think is for the final pass, a merge sort
> might read the tuples multiple times, so it needs to support it there.


However it'd be possible to tell logtape in advance whether a particular
tape needs to support that, and only apply compression when not; it
would work all the time for intermediate merge passes, and with the
recent executor changes to pass down you-need-to-support-mark flags,
it'd work for the output pass in a lot of cases too.

If you're just trying to get some quick and dirty numbers: do
compression, replace Seek/Tell with PANICs, and only test on plain
sorts, no joins ;-)
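
A minimal sketch of that idea, assuming the caller declares up front whether a tape needs positioning. The Tape class, its needs_seek flag and its methods are hypothetical and do not mirror the logtape API; tell() here simply reports the write position, and a Python exception stands in for a PANIC.

```python
import zlib

class Tape:
    """Toy tape: compressed only when the caller says no seek/tell is needed."""

    def __init__(self, needs_seek):
        self.compressed = not needs_seek
        self._comp = zlib.compressobj() if self.compressed else None
        self._stored = bytearray()   # stands in for the on-disk bytes

    def write(self, data):
        if self.compressed:
            self._stored += self._comp.compress(data)
        else:
            self._stored += data

    def finish(self):
        if self.compressed:
            self._stored += self._comp.flush()

    def tell(self):
        if self.compressed:
            # The quick-and-dirty version: refuse outright instead of supporting it.
            raise RuntimeError("tell() not supported on a compressed tape")
        return len(self._stored)

    def stored_size(self):
        return len(self._stored)

    def read_all(self):
        raw = bytes(self._stored)
        return zlib.decompress(raw) if self.compressed else raw

payload = b"sorted tuple data\n" * 1000
plain = Tape(needs_seek=True)     # e.g. an output pass that must support mark
packed = Tape(needs_seek=False)   # e.g. an intermediate merge pass
for tape in (plain, packed):
    tape.write(payload)
    tape.finish()
```

Both tapes return the same data from read_all(); only the one that gave up positioning gets the smaller on-disk footprint.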

regards, tom lane
