This is a discussion on Re: Compression and on-disk sorting within the pgsql Hackers forums, part of the PostgreSQL category.
> Compressed-filesystem extension (like e2compr, and I think either
> Fat or NTFS) can do that.

Windows (NT/2000/XP) can compress individual directories and files under
NTFS; new files in a compressed directory are compressed by default.

So if the 'spill-to-disk' all happened in its own specific directory, it
would be trivial to mark that directory for compression.

I don't know enough Linux/Unix to know if it has similar capabilities.

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
http://archives.postgresql.org

Bort, Paul wrote:
>> Compressed-filesystem extension (like e2compr, and I think either
>> Fat or NTFS) can do that.
>
> Windows (NT/2000/XP) can compress individual directories and files under
> NTFS; new files in a compressed directory are compressed by default.
>
> So if the 'spill-to-disk' all happened in its own specific directory, it
> would be trivial to mark that directory for compression.
>
> I don't know enough Linux/Unix to know if it has similar capabilities.

Or would want to ...

I habitually turn off all compression on my Windows boxes, because it's
a performance hit in my experience. Disk is cheap ...

cheers

andrew

On Tue, 2006-05-16 at 11:53 -0400, Andrew Dunstan wrote:
> Bort, Paul wrote:
> >> Compressed-filesystem extension (like e2compr, and I think either
> >> Fat or NTFS) can do that.
> >
> > Windows (NT/2000/XP) can compress individual directories and files under
> > NTFS; new files in a compressed directory are compressed by default.
> >
> > So if the 'spill-to-disk' all happened in its own specific directory, it
> > would be trivial to mark that directory for compression.
> >
> > I don't know enough Linux/Unix to know if it has similar capabilities.
>
> Or would want to ...
>
> I habitually turn off all compression on my Windows boxes, because it's
> a performance hit in my experience. Disk is cheap ...

Disk storage is cheap. Disk bandwidth or throughput is very expensive.

Rod Taylor wrote:
>> I habitually turn off all compression on my Windows boxes, because it's
>> a performance hit in my experience. Disk is cheap ...
>
> Disk storage is cheap. Disk bandwidth or throughput is very expensive.

Sure, but in my experience using Windows File System compression is not
a win here. Presumably if it were an unqualified win they would have it
turned on everywhere. The fact that there's an option is a good
indication that it isn't in many cases. It is most commonly used for
files like executables that are in effect read-only - but that doesn't
help us.

cheers

andrew

On Tue, May 16, 2006 at 12:27:42PM -0400, Andrew Dunstan wrote:
> Rod Taylor wrote:
> >> I habitually turn off all compression on my Windows boxes, because it's
> >> a performance hit in my experience. Disk is cheap ...
> >
> > Disk storage is cheap. Disk bandwidth or throughput is very expensive.

Hey, that's my line! :P

> Sure, but in my experience using Windows File System compression is not
> a win here. Presumably if it were an unqualified win they would have it
> turned on everywhere. The fact that there's an option is a good
> indication that it isn't in many cases. It is most commonly used for
> files like executables that are in effect read-only - but that doesn't
> help us.

The issue with filesystem-level compression is that it has to support
things like random access, which isn't needed for on-disk sorting (not
sure about other things like hashing, etc).

In any case, my curiosity is aroused, so I'm currently benchmarking
pgbench on both a compressed and uncompressed $PGDATA/base. I'll also do
some benchmarks with pgsql_tmp compressed.

Does anyone have time to hack some kind of compression into the on-disk
sort code just to get some benchmark numbers? Unfortunately, doing so is
beyond my meager C ability...
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

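To illustrate the point about random access: a sort spill file is written once and read back front to back, so it can be compressed as a single stream with no seekable index. The following is only a minimal Python sketch of that idea. The use of zlib, the 4-byte length prefix per tuple, and the names `write_run` and `read_run` are assumptions made for illustration; none of them come from the thread or from the PostgreSQL sort code.

```python
import io
import zlib


def write_run(tuples, out):
    """Compress a sorted run of byte-string tuples as one sequential stream."""
    comp = zlib.compressobj(level=1)  # low level: favor speed over ratio
    for t in tuples:
        # Length-prefix each tuple so the reader can find tuple boundaries.
        out.write(comp.compress(len(t).to_bytes(4, "little") + t))
    out.write(comp.flush())


def read_run(src, chunk_size=8192):
    """Yield the tuples back in order, decompressing strictly front to back."""
    decomp = zlib.decompressobj()
    buf = b""
    while True:
        chunk = src.read(chunk_size)
        buf += decomp.decompress(chunk) if chunk else decomp.flush()
        # Emit every complete length-prefixed tuple currently buffered.
        while len(buf) >= 4:
            n = int.from_bytes(buf[:4], "little")
            if len(buf) < 4 + n:
                break
            yield buf[4:4 + n]
            buf = buf[4 + n:]
        if not chunk:
            break
```

The reader never needs an offset into the compressed data, which is the property a general-purpose compressed filesystem cannot rely on.
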
On Tue, May 16, 2006 at 12:31:07PM -0500, Jim C. Nasby wrote:
> In any case, my curiosity is aroused, so I'm currently benchmarking
> pgbench on both a compressed and uncompressed $PGDATA/base. I'll also do
> some benchmarks with pgsql_tmp compressed.

Results: http://jim.nasby.net/bench.log

As expected, compressing $PGDATA/base was a loss. But compressing
pgsql_tmp and then doing some disk-based sorts did show an improvement,
from 366.1 seconds to 317.3 seconds, an improvement of 13.3%. This is on
a Windows XP laptop (Dell Latitude D600) with 512MB, so it's somewhat of
a worst-case scenario. On the other hand, XP's compression algorithm
appears to be pretty aggressive, as it cut the size of the on-disk sort
file from almost 700MB to 82MB. There are probably gains to be had from
a different compression algorithm.

> Does anyone have time to hack some kind of compression into the on-disk
> sort code just to get some benchmark numbers? Unfortunately, doing so is
> beyond my meager C ability...
--
Jim C. Nasby

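For reference, the arithmetic behind those figures can be checked in a few lines. The variable names are illustrative, and 700 is used as a round stand-in for "almost 700MB", so the size ratio is an upper estimate rather than a measured value.

```python
# Timings reported for the disk-based sorts, in seconds.
uncompressed_s = 366.1
compressed_s = 317.3

# Relative improvement from compressing pgsql_tmp.
improvement = (uncompressed_s - compressed_s) / uncompressed_s
print(f"{improvement:.1%}")  # 13.3%

# On-disk sort file size, in MB ("almost 700MB" rounded up to 700).
before_mb = 700
after_mb = 82
ratio = before_mb / after_mb
print(f"{ratio:.1f}x")  # 8.5x
```

So the sort file shrank by roughly a factor of eight, while the run time dropped by about an eighth.
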
On Tue, May 16, 2006 at 12:31:07PM -0500, Jim C. Nasby wrote:
> Does anyone have time to hack some kind of compression into the on-disk
> sort code just to get some benchmark numbers? Unfortunately, doing so is
> beyond my meager C ability...

I had a look at this. At first glance it doesn't seem too hard, except
the whole logtape process kinda gets in the way. If it weren't for the
mark/restore it'd be trivial. Might take a stab at it some time, if I
can think of a way to handle the seeking...
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

On Tue, May 16, 2006 at 11:46:15PM +0200, Martijn van Oosterhout wrote:
> On Tue, May 16, 2006 at 12:31:07PM -0500, Jim C. Nasby wrote:
> > Does anyone have time to hack some kind of compression into the on-disk
> > sort code just to get some benchmark numbers? Unfortunately, doing so is
> > beyond my meager C ability...
>
> I had a look at this. At first glance it doesn't seem too hard, except
> the whole logtape process kinda gets in the way. If it weren't for the
> mark/restore it'd be trivial. Might take a stab at it some time, if I
> can think of a way to handle the seeking...

Oh, do we need to randomly seek? Is that how we switch from one tape to
another?

It might be easier to switch to giving each tape its own file...
--
Jim C. Nasby

On Tue, May 16, 2006 at 04:50:22PM -0500, Jim C. Nasby wrote:
> > I had a look at this. At first glance it doesn't seem too hard, except
> > the whole logtape process kinda gets in the way. If it weren't for the
> > mark/restore it'd be trivial. Might take a stab at it some time, if I
> > can think of a way to handle the seeking...
>
> Oh, do we need to randomly seek? Is that how we switch from one tape to
> another?

Not seek, mark/restore. As the code describes, sometimes you go back a
tuple. The primary reason, I think, is the final pass: a merge sort
might read the tuples multiple times, so it needs to support it there.

> It might be easier to switch to giving each tape its own file...

I don't think it would make much difference. OTOH, if this turns out to
be a win, the tuplestore could have the same optimisation.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

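One way to picture why mark/restore is awkward on a compressed stream, and one possible way around it, is to keep the tuples read since the last mark in memory and replay them on restore. The sketch below is hypothetical and is not how logtape or tuplesort work; `MarkableReader` and its methods are invented names, and the replay buffer is unbounded, which a real implementation could not afford.

```python
class MarkableReader:
    """Adds mark/restore to a forward-only source by buffering tuples since the mark."""

    def __init__(self, source):
        self._source = iter(source)
        self._replay = []      # tuples read since the last mark
        self._pos = 0          # index in _replay of the next tuple to return
        self._marked = False

    def mark(self):
        # Forget everything consumed before the new mark position.
        self._replay = self._replay[self._pos:]
        self._pos = 0
        self._marked = True

    def restore(self):
        if not self._marked:
            raise RuntimeError("restore() without mark()")
        self._pos = 0  # rewind to the mark; the source itself is never rewound

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos < len(self._replay):
            item = self._replay[self._pos]   # replaying after a restore
        else:
            item = next(self._source)        # reading fresh from the source
            if self._marked:
                self._replay.append(item)
            else:
                return item
        self._pos += 1
        return item
```

The underlying source only ever moves forward, so it could be a decompressor; the cost is the memory held between a mark and the next mark.
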
Martijn van Oosterhout <kleptog@svana.org> writes:
> Not seek, mark/restore. As the code describes, sometimes you go back a
> tuple. The primary reason I think is for the final pass, a merge sort
> might read the tuples multiple times, so it needs to support it there.

However it'd be possible to tell logtape in advance whether a particular
tape needs to support that, and only apply compression when not; it
would work all the time for intermediate merge passes, and with the
recent executor changes to pass down you-need-to-support-mark flags,
it'd work for the output pass in a lot of cases too.

If you're just trying to get some quick and dirty numbers: do
compression, replace Seek/Tell with PANICs, and only test on plain sorts
no joins ;-)

regards, tom lane

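The shape of that suggestion can be sketched in Python as follows. This is not logtape.c: the `Tape` class, its `random_access` flag, and the choice of zlib are hypothetical stand-ins, and a `RuntimeError` plays the role of a PANIC.

```python
import io
import zlib


class Tape:
    """Hypothetical stand-in for a logical tape; not the logtape.c API."""

    def __init__(self, random_access):
        # The caller declares up front whether seek/tell will be needed.
        self.random_access = random_access
        self._file = io.BytesIO()
        # Compress only tapes that will be read strictly sequentially.
        self._comp = None if random_access else zlib.compressobj(level=1)

    def write(self, data):
        if self._comp is not None:
            data = self._comp.compress(data)
        self._file.write(data)

    def finish(self):
        """Call once when writing is done; returns the bytes stored on the tape."""
        if self._comp is not None:
            self._file.write(self._comp.flush())
            self._comp = None
        return self._file.getvalue()

    def tell(self):
        if not self.random_access:
            # Quick-and-dirty variant: fail loudly instead of supporting it.
            raise RuntimeError("tell() on a compressed tape")
        return self._file.tell()

    def seek(self, offset):
        if not self.random_access:
            raise RuntimeError("seek() on a compressed tape")
        self._file.seek(offset)
```

A benchmark built this way would only exercise plain sorts, since any code path that marks or restores a compressed tape fails immediately rather than returning wrong data.
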