vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hi We currently have a 16CPU 32GB box running postgres 8.2. When I do a pg_dump with the following parameters "/usr/bin/pg_dump -E UTF8 -F c -b" I get a file of 14GB in size. But the database is 110GB in size on the disk. Why the big difference in size? Does this have anything to do with performance? -- Adrian Moisey Systems Administrator | CareerJunction | Your Future Starts Here. Web: www.careerjunction.co.za | Email: adrian@careerjunction.co.za Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
| |||
| Hi Adrian, >When I do a pg_dump with the following parameters "/usr/bin/pg_dump -E UTF8 -F c -b" I get a file of 14GB in size. >From the man page of pg_dump " -F format, --format=format Selects the format of the output. format can be one of the following: c output a custom archive suitable for input into pg_restore. This is the most flexible format in that it allows reordering of data load as well as schema elements. This format is also compressed by default. " The output is compressed and it is a dump of the database which contain the SQL commands: >But the database is 110GB in size on the disk. Why the big difference >in size? Does this have anything to do with performance? VACUUM or VACUUM FULL of the entire database will reduce the size of the database by reclaiming any unused space and you can use the filesystem based backup or backup/restore strategy. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
| |||
| Adrian Moisey wrote: > Hi > > We currently have a 16CPU 32GB box running postgres 8.2. > > When I do a pg_dump with the following parameters "/usr/bin/pg_dump -E > UTF8 -F c -b" I get a file of 14GB in size. > > But the database is 110GB in size on the disk. Why the big difference > in size? Does this have anything to do with performance? Reasons: You're using a compact format designed to limit size and provide fast dump/restore. The database, by contrast, is designed for fast access. The database can contain "dead space" that hasn't been reclaimed by a VACUUM. It can also have space allocated that it doesn't need, which you can reclaim with VACUUM FULL. This dead space can really add up, but it's the price of fast updates, inserts and deletes. Your indexes take up disk space in the database, but are not dumped and do not take up space in the dump file. Indexes can get very large especially if you have lots of multi-column indexes. I'm told that under certain loads indexes can grow full of mostly empty pages, and a REINDEX every now and then can be useful to shrink them - see "\h reindex" in psql. That won't affect your dump sizes as indexes aren't dumped, but will affect the database size. You can examine index (and relation) sizes using a query like: select * from pg_class order by relpages desc Data in the database is either not compressed, or (for larger fields) is compressed with an algorithm that's very fast but doesn't achieve high levels of compression. By contrast, the dumps are quite efficiently compressed. One of my database clusters is 571MB on disk at the moment, just after being dropped, recreated, and populated from another data source. The repopulation process is quite complex. I found that running VACUUM FULL followed by REINDEX DATABASE dbname knocked 50MB off the database size, pushing it down to 521MB. That's on a basically brand new DB. Note, however, that 130MB of that space is in pg_xlog, and much of it will be wasted as the DB has been under very light load but uses large xlogs because it needs to perform well under huge load spikes. The size of the `base' directory (the "real data", indexes, etc) is only 392MB. If I dump that database using the same options you dumped yours with, I end up with a hilariously small 29MB dump file. That's less than 10% of the size of the main DB. The difference will be entirely due to compression, a more compact storage layout in the dump files, and to the lack of index data in the dumps. The database has quite a few indexes, some of which are multicolumn indexes on tables with large numbers of tuples, so that bloats the "live" version a lot. -- Craig Ringer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
| |||
| > Hi > > We currently have a 16CPU 32GB box running postgres 8.2. > > When I do a pg_dump with the following parameters "/usr/bin/pg_dump -E > UTF8 -F c -b" I get a file of 14GB in size. > > But the database is 110GB in size on the disk. Why the big difference > in size? Does this have anything to do with performance? I have a 2GB database, which dumps to a 340 MB file... Two reasons : - I have lots of big fat but very necessary indexes (not included in dump) - Dump is compressed with gzip which really works well on database data. If you suspect your tables or indexes are bloated, restore your dump to a test box. Use fsync=off during restore, you don't care about integrity on the test box. This will avoid slowing down your production database. Then look at the size of the restored database. If it is much smaller than your production database, then you have bloat. Time to CLUSTER, or REINDEX, or VACUUM FULL (your choice), on the tables that are bloated, and take note to vacuum those more often (and perhaps tune the autovacuum). Judicious use of CLUSTER on that small, but extremely often updated table can also be a very good option. 8.3 and its new HOT feature are also a good idea. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
| |||
| Hi > If you suspect your tables or indexes are bloated, restore your dump > to a test box. > Use fsync=off during restore, you don't care about integrity on the > test box. > This will avoid slowing down your production database. > Then look at the size of the restored database. > If it is much smaller than your production database, then you have > bloat. I have done that, and I get the following: the live one is 113G the restored one is 78G How should I get rid of the bloat? VACUUM FULL? -- Adrian Moisey Systems Administrator | CareerJunction | Your Future Starts Here. Web: www.careerjunction.co.za | Email: adrian@careerjunction.co.za Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
| |||
| > >> If you suspect your tables or indexes are bloated, restore your >> dump to a test box. >> Use fsync=off during restore, you don't care about integrity on the >> test box. >> This will avoid slowing down your production database. >> Then look at the size of the restored database. >> If it is much smaller than your production database, then you have >> bloat. > > I have done that, and I get the following: > > the live one is 113G > the restored one is 78G Ah. Good news for you is that you know that you can do something Now, is the bloat in the tables (which tables ?) or in the indexes (which indexes ?), or in the toast tables perhaps, or in the system catalogs or all of the above ? Or perhaps there is a long-forgotten process that got zombified while holding a huge temp table ? (not very likely, but who knows). Use pg_relation_size() and its friends to get an idea of the size of stuff. Perhaps you have 1 extremely bloated table or index, or perhaps everything is bloated. The solution to your problem depends on which case you have. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
| |||
| Adrian Moisey wrote: > Hi > >> If you suspect your tables or indexes are bloated, restore your >> dump to a test box. >> Use fsync=off during restore, you don't care about integrity on >> the test box. >> This will avoid slowing down your production database. >> Then look at the size of the restored database. >> If it is much smaller than your production database, then you have >> bloat. > > I have done that, and I get the following: > > the live one is 113G > the restored one is 78G > > How should I get rid of the bloat? > VACUUM FULL? And/or REINDEX if you're not satisfied with the results of a VACUUM FULL. http://www.postgresql.org/docs/8.3/i...ve/vacuum.html http://www.postgresql.org/docs/8.3/i...l-reindex.html Of course, all of these will have performance consequences while they're running, and take out locks that prevent certain other operatons as shown in table 13-2: http://www.postgresql.org/docs/8.3/s...t-locking.html and the explanation following it. Note in particular: ---- ACCESS EXCLUSIVE Conflicts with locks of all modes (ACCESS SHARE, ROW SHARE, ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE). This mode guarantees that the holder is the only transaction accessing the table in any way. Acquired by the ALTER TABLE, DROP TABLE, TRUNCATE, REINDEX, CLUSTER, and VACUUM FULL commands. This is also the default lock mode for LOCK TABLE statements that do not specify a mode explicitly. Tip: Only an ACCESS EXCLUSIVE lock blocks a SELECT (without FOR UPDATE/SHARE) statement. ---- In other words, you won't be doing much with a table/index while a VACUUM FULL or a REINDEX is in progress on it. Given that, you probably want to check your table/index sizes and see if there are particular problem tables or indexes, rather than just using a sledgehammer approach. -- Craig Ringer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
| |||
| Hi >> the live one is 113G >> the restored one is 78G > > Good news for you is that you know that you can do something Will this help with performance ? > Now, is the bloat in the tables (which tables ?) or in the indexes > (which indexes ?), or in the toast tables perhaps, or in the system > catalogs or all of the above ? Or perhaps there is a long-forgotten > process that got zombified while holding a huge temp table ? (not very > likely, but who knows). > Use pg_relation_size() and its friends to get an idea of the size of > stuff. I'll look into that, thanks -- Adrian Moisey Systems Administrator | CareerJunction | Your Future Starts Here. Web: www.careerjunction.co.za | Email: adrian@careerjunction.co.za Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
| |||
| > Will this help with performance ? Depends if the bloat is in part of your working set. If debloating can make the working set fit in RAM, or lower your IOs, you'll get a boost. >> Now, is the bloat in the tables (which tables ?) or in the indexes >> (which indexes ?), or in the toast tables perhaps, or in the system >> catalogs or all of the above ? Or perhaps there is a long-forgotten >> process that got zombified while holding a huge temp table ? (not very >> likely, but who knows). >> Use pg_relation_size() and its friends to get an idea of the size >> of stuff. > > I'll look into that, thanks > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
| ||||
| In response to Adrian Moisey <adrian@careerjunction.co.za>: > > We currently have a 16CPU 32GB box running postgres 8.2. > > When I do a pg_dump with the following parameters "/usr/bin/pg_dump -E > UTF8 -F c -b" I get a file of 14GB in size. > > But the database is 110GB in size on the disk. Why the big difference > in size? Does this have anything to do with performance? In a dump, indexes are a single command. In the actual database, the indexes actually contain all the data the indexes require, which can be substantially more in size than the command to create the index. Additionally, a running database has a certain amount of wasted space. If you're running vacuum on a proper schedule, this won't get out of hand. Read this page to understand better: http://www.postgresql.org/docs/8.1/s...intenance.html And lastly, I expect that the pg_dump format is able to do more aggressive compression than the running database. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ wmoran@collaborativefusion.com Phone: 412-422-3463x4023 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |