Re: Compression and on-disk sorting (pgsql Hackers forum, PostgreSQL category)
> 1) Use n sort areas for n tapes making everything purely sequential access.

Some testing I did a while ago showed that iff the I/O block size is large enough (256k), it does not really matter much whether the blocks are at random locations. I think that is still true for current disk models.

So unless we parallelize, it is imho sufficient to make sure that we write (and read) large enough blocks with single calls. This also causes no problems in highly concurrent scenarios, where you do not have enough spindles.

Andreas
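To make the "large blocks with single calls" idea concrete, here is a minimal sketch in Python. It is not taken from the thread or from PostgreSQL; the function name `read_blocks_randomly` and the shuffled access order are assumptions made for illustration. It reads a file in 256 KiB blocks at random offsets, issuing exactly one `os.pread` call per block (Unix only).

```python
import os
import random

BLOCK_SIZE = 256 * 1024  # 256 KiB, the block size mentioned in the post


def read_blocks_randomly(path, block_size=BLOCK_SIZE, seed=0):
    """Read every block of a file once, in shuffled order.

    Hypothetical helper: each block is fetched with a single pread() call,
    so the access pattern is random but every request is large.
    Returns the total number of bytes read.
    """
    size = os.path.getsize(path)
    offsets = list(range(0, size, block_size))
    random.Random(seed).shuffle(offsets)

    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        for offset in offsets:
            # One system call per block; the final block may be short.
            total += len(os.pread(fd, block_size, offset))
    finally:
        os.close(fd)
    return total
```

Timing this against a sequential pass over the same file, on a cold cache, would be one way to check the claim on a particular disk.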
| |||
On Thu, May 18, 2006 at 10:57:16AM +0200, Zeugswetter Andreas DCP SD wrote:
> > 1) Use n sort areas for n tapes making everything purely sequential access.
>
> Some time ago testing I did has shown, that iff the IO block size is large enough (256k) it does not really matter that much if the blocks are at random locations. I think that is still true for current model disks.
>
> So unless we parallelize, it is imho sufficient to see to it that we write (and read) large enough blocks with single calls. This also has no problem in highly concurrent scenarios, where you do not have enough spindles.

AFAIK logtape currently reads in blocks much smaller than 256k. Of course, if you get lucky you'll read from one tape for some time before switching to another, which should have a sort-of similar effect if the drives aren't very busy with other things.

--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
| |||
On Thu, May 18, 2006 at 11:22:46AM -0500, Jim C. Nasby wrote:
> AFAIK logtape currently reads in much less than 256k blocks. Of course if you get lucky you'll read from one tape for some time before switching to another, which should have a sort-of similar effect if the drives aren't very busy with other things.

Logtape works with BLCKSZ blocks. Whether that's advisable or not, I don't know. One thing I'm noticing with this compression-in-logtape is that the memory cost per tape increases considerably. Currently we rely heavily on the OS to cache for us.

Let's say we buffer 256KB per tape and assume a large sort with many tapes: you're going to blow all your work_mem on buffers. If compression uses more memory, so that you can't have as many tapes and thus need multiple passes, well, we need to test whether this is still a win.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
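The trade-off between per-tape buffer size and the number of merge passes can be sketched with some back-of-the-envelope arithmetic. The code below is an illustration only: it assumes a plain balanced k-way merge (not the polyphase merge PostgreSQL actually uses), pretends all of work_mem is available for tape buffers, and the 32 MiB work_mem and 1000 initial runs are made-up numbers.

```python
def max_tapes(work_mem_bytes, per_tape_buffer_bytes):
    """How many tape buffers fit if work_mem held nothing but buffers."""
    return work_mem_bytes // per_tape_buffer_bytes


def merge_passes(initial_runs, fan_in):
    """Passes needed by a simple balanced merge with the given fan-in."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    passes = 0
    runs = initial_runs
    while runs > 1:
        runs = -(-runs // fan_in)  # ceiling division: runs left after one pass
        passes += 1
    return passes


work_mem = 32 * 1024 * 1024                    # hypothetical 32 MiB
tapes_large = max_tapes(work_mem, 256 * 1024)  # 256 KiB buffer per tape
tapes_small = max_tapes(work_mem, 8 * 1024)    # 8 KiB (BLCKSZ-sized) buffer

print(tapes_large, merge_passes(1000, tapes_large))
print(tapes_small, merge_passes(1000, tapes_small))
```

Under these assumptions the larger buffers cut the usable fan-in from 4096 to 128 tapes, and 1000 initial runs then need two merge passes instead of one, which is the effect the post says needs testing.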
| |||
On Thu, May 18, 2006 at 08:32:10PM +0200, Martijn van Oosterhout wrote:
> On Thu, May 18, 2006 at 11:22:46AM -0500, Jim C. Nasby wrote:
> > AFAIK logtape currently reads in much less than 256k blocks. Of course if you get lucky you'll read from one tape for some time before switching to another, which should have a sort-of similar effect if the drives aren't very busy with other things.
>
> Logtape works with BLCKSZ blocks. Whether it's advisable or not, I don't know. One thing I'm noticing with this compression-in-logtape is that the memory cost per tape increases considerably. Currently we rely heavily on the OS to cache for us.
>
> Let's say we buffer 256KB per tape, and we assume a large sort with many tapes, you're going to blow all your work_mem on buffers. If using compression uses more memory so that you can't have as many tapes and thus need multiple passes, well, we need to test if this is still a win.

So you're sticking with 8K blocks on disk, after compression? And then decompressing blocks as they come in?

Actually, I guess the amount of memory used for zlib's lookback buffer (or whatever they call it) could be pretty substantial, and I'm not sure if there would be a way to combine that across all tapes.

--
Jim C. Nasby
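One way to picture "8K blocks on disk, decompressed as they come in" is the sketch below. It is not the patch under discussion: the helper names are hypothetical, the 8192-byte block size is an assumption standing in for BLCKSZ, and Python's `zlib` module stands in for the C library. A compressed stream is cut into fixed-size blocks and then fed block by block to an incremental decompressor.

```python
import zlib

BLOCK_SIZE = 8192  # assumed stand-in for BLCKSZ


def compress_to_blocks(data, level=3, block_size=BLOCK_SIZE):
    """Compress data and cut the resulting stream into fixed-size blocks."""
    stream = zlib.compress(data, level)
    return [stream[i:i + block_size] for i in range(0, len(stream), block_size)]


def decompress_blocks(blocks):
    """Feed blocks to one decompressor as they 'come in' from disk."""
    decompressor = zlib.decompressobj()
    out = bytearray()
    for block in blocks:
        out += decompressor.decompress(block)
    out += decompressor.flush()
    return bytes(out)
```

Each tape being read needs its own decompressor state in this scheme, which is where the per-tape memory question in the post comes from.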
| |||
"Jim C. Nasby" <jnasby@pervasive.com> writes:
> Actually, I guess the amount of memory used for zlib's lookback buffer (or whatever they call it) could be pretty substantial, and I'm not sure if there would be a way to combine that across all tapes.

But there's only one active write tape at a time. My recollection of zlib is that compression is memory-hungry but decompression not so much, so it seems like this shouldn't be a huge deal.

regards, tom lane
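The asymmetry between compression and decompression memory can be estimated from the approximations given in zlib's own `zconf.h`: deflate needs roughly `(1 << (windowBits + 2)) + (1 << (memLevel + 9))` bytes, while inflate needs roughly `1 << windowBits` bytes plus a few kilobytes of small objects. The sketch below just evaluates those formulas; the function names are made up, and the figures are estimates rather than measurements.

```python
def deflate_memory(window_bits=15, mem_level=8):
    """Approximate bytes needed by one zlib compression stream."""
    return (1 << (window_bits + 2)) + (1 << (mem_level + 9))


def inflate_memory(window_bits=15):
    """Approximate bytes needed by one zlib decompression stream,
    not counting the few extra kilobytes zlib uses for small objects."""
    return 1 << window_bits


print(deflate_memory() // 1024, "KiB to compress")
print(inflate_memory() // 1024, "KiB to decompress")
```

With the default settings that is about 256 KiB for the single write tape and about 32 KiB for each tape being read, so the read side scales with the number of tapes but at a much smaller per-tape cost.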
| |||
On Thu, May 18, 2006 at 04:55:17PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jnasby@pervasive.com> writes:
> > Actually, I guess the amount of memory used for zlib's lookback buffer (or whatever they call it) could be pretty substantial, and I'm not sure if there would be a way to combine that across all tapes.
>
> But there's only one active write tape at a time. My recollection of zlib is that compression is memory-hungry but decompression not so much, so it seems like this shouldn't be a huge deal.

It seems more appropriate to discuss results here, rather than on -patches...

http://jim.nasby.net/misc/compress_sort.txt has preliminary results. I've run into a slight problem in that even at a compression level of -3, zlib is cutting the on-disk size of sorts by 25x. So my pgbench sort test with scale=150 that was producing a 2G on-disk sort is now producing an 80M sort, which obviously fits in memory. And it cuts sort times by more than half.

So, if nothing else, it looks like compression is definitely a win if it means you can now fit the sort within the disk cache. While that doesn't sound like something very worthwhile, it essentially extends work_mem from a fraction of memory to up to ~25x memory.

I'm currently loading up a pgbench database with a scaling factor of 15000; hopefully I'll have results from that testing in the morning.

--
Jim C. Nasby
| |||
On Thu, May 18, 2006 at 10:02:44PM -0500, Jim C. Nasby wrote:
> http://jim.nasby.net/misc/compress_sort.txt is preliminary results.
> I've run into a slight problem in that even at a compression level of -3, zlib is cutting the on-disk size of sorts by 25x. So my pgbench sort test with scale=150 that was producing a 2G on-disk sort is now producing a 80M sort, which obviously fits in memory. And cuts sort times by more than half.

I'm seeing 250,000 blocks being cut down to 9,500 blocks. That's almost unbelievable. What's in the table? It would seem to imply that our tuple format is far more compressible than we expected.

Do you have any stats on CPU usage? Memory usage?

Have a nice day,
--
Martijn van Oosterhout
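The block counts quoted here line up with the "2G down to 80M" figures if one assumes the default 8 KiB block size (the "8K blocks" mentioned earlier in the thread). A quick check, with `BLCKSZ = 8192` as the stated assumption:

```python
BLCKSZ = 8192  # assumed block size in bytes


def blocks_to_bytes(blocks, block_size=BLCKSZ):
    """Convert a block count into a size in bytes."""
    return blocks * block_size


uncompressed = blocks_to_bytes(250_000)
compressed = blocks_to_bytes(9_500)
ratio = 250_000 / 9_500

print(f"{uncompressed / 2**30:.2f} GiB -> {compressed / 2**20:.1f} MiB, ratio {ratio:.1f}x")
```

That works out to roughly 1.9 GiB shrinking to roughly 74 MiB, a ratio of about 26x, consistent with the 25x reported.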
| |||
Martijn van Oosterhout <kleptog@svana.org> writes:
> I'm seeing 250,000 blocks being cut down to 9,500 blocks. That's almost unbelievable. What's in the table?

Yeah, I'd tend to question the test data being used. gzip does not do that well on typical text (especially not at the lower settings we'd likely want to use).

regards, tom lane
| |||
On Fri, May 19, 2006 at 09:03:31AM -0400, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > I'm seeing 250,000 blocks being cut down to 9,500 blocks. That's almost unbelievable. What's in the table?
>
> Yeah, I'd tend to question the test data being used. gzip does not do that well on typical text (especially not at the lower settings we'd likely want to use).

However, postgres tables are very highly compressible; 10-to-1 is not that uncommon. pg_proc and pg_index compress by that much, for example. Indexes compress even more (a few on my system compress 25-to-1, but that could just be slack space; the record is 37-to-1, for pg_constraint_conname_nsp_index).

The only table on my test system over 32KB that doesn't reach 2-to-1 compression with gzip -3 is one of the toast tables. So getting 25-to-1 is a lot, but possibly not that extreme. pg_statistic, which is about as close to random data as you're going to get on a postgres system, compresses 5-to-1.

Have a nice day,
--
Martijn van Oosterhout
| ||||
Martijn van Oosterhout <kleptog@svana.org> writes:
> However, postgres tables are very highly compressible, 10-to-1 is not that uncommon. pg_proc and pg_index compress by that for example. Indexes compress even more (a few on my system compress 25-to-1 but that could just be slack space, the record being 37-to-1 (pg_constraint_conname_nsp_index)).

Anything containing a column of type "name" will compress amazingly well because of all the padding spaces. I don't think that's representative of user data though ... except maybe for the occasional novice using "char(255)" ...

regards, tom lane
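The effect of padding on compression ratio is easy to reproduce outside the database. The sketch below is synthetic and says nothing about real PostgreSQL tuples: it builds made-up 255-byte, space-padded values (in the spirit of the "char(255)" remark), compresses them with Python's `zlib` at level 3, and compares the ratio with that of random bytes of the same length. The helper names and record contents are assumptions.

```python
import os
import zlib


def compression_ratio(data, level=3):
    """Uncompressed size divided by compressed size."""
    return len(data) / len(zlib.compress(data, level))


def padded_records(count, width=255):
    """Synthetic fixed-width values: a short string padded with spaces."""
    return b"".join(
        f"user{i}".encode("ascii").ljust(width, b" ") for i in range(count)
    )


padded = padded_records(10_000)
random_bytes = os.urandom(len(padded))

print(f"padded: {compression_ratio(padded):.1f}x")
print(f"random: {compression_ratio(random_bytes):.2f}x")
```

The padded data compresses many times over while the random bytes do not shrink at all, which is why test data dominated by padded columns can overstate what compression would do for typical user data.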