This is a discussion on Index bloat of 4x within the Pgsql General forums, part of the PostgreSQL category; --> We just did a bunch of maintenance on one of our production databases that involved a lot of alter ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| We just did a bunch of maintenance on one of our production databases that involved a lot of alter tables and moving records about and the like. Afterwards, I did a vacuum full and analyze to get the database back on track -- autovac maintains it under normal operations. Today I decided to run reindex during a slow period, and was shocked to find the database size drop from 165M to 30M. Keep in mind that the 165M is after vacuum full. So, apparently, there was 135M of index bloat? That seems a little excessive to me, especially when the docs claim that reindexing is usually not necessary. This is PostgreSQL 8.1.4. We've got upgrades to 8.2 planned, but it's going to be a few months before we can squeak that into a maintenance window. Additionally, I thought all the big index improvements were added in 7.4. I guess my question is whether or not this is expected. It's obviously not a good thing -- I've noticed that shared buffer usage has dropped dramatically as well (from 28,000 to 7000). I hadn't expected index bloat of this magnitude, and I'm concerned about when the database hits 2 or 3 G in size and has 12G just in indexes that take hours to rebuild. -- Bill Moran Collaborative Fusion Inc. ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Bill Moran wrote: > > We just did a bunch of maintenance on one of our production databases that > involved a lot of alter tables and moving records about and the like. > > Afterwards, I did a vacuum full and analyze to get the database back on > track -- autovac maintains it under normal operations. > > Today I decided to run reindex during a slow period, and was shocked to > find the database size drop from 165M to 30M. Keep in mind that the > 165M is after vacuum full. So, apparently, there was 135M of index bloat? > That seems a little excessive to me, especially when the docs claim that > reindexing is usually not necessary. It's been said that vacuum full does not fix index bloat -- in fact, it's a problem it worsens. However, I very much doubt that it would be this serious. I guess the question is, how large was the index *before* all the alter tables? I'd expect that it was the ALTER TABLEs that caused this much index growth, which VACUUM FULL was subsequently unable to fix. I don't expect you kept a log of index sizes throughout the operation however :-( -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| In response to Alvaro Herrera <alvherre@commandprompt.com>: > Bill Moran wrote: > > > > We just did a bunch of maintenance on one of our production databases that > > involved a lot of alter tables and moving records about and the like. > > > > Afterwards, I did a vacuum full and analyze to get the database back on > > track -- autovac maintains it under normal operations. > > > > Today I decided to run reindex during a slow period, and was shocked to > > find the database size drop from 165M to 30M. Keep in mind that the > > 165M is after vacuum full. So, apparently, there was 135M of index bloat? > > That seems a little excessive to me, especially when the docs claim that > > reindexing is usually not necessary. > > It's been said that vacuum full does not fix index bloat -- in fact, > it's a problem it worsens. However, I very much doubt that it would be > this serious. I guess the question is, how large was the index *before* > all the alter tables? I don't have details on the various indexes. I do keep an mrtg graph of pg_database_size(), so I can track the overall size of the database and correlate it to events. I'm not tracking individual relations, indexes, etc though. The entire database was around 28M prior to the upgrades, etc. Immediately after the upgrades, it was ~270M. Following a vacuum full, it dropped to 165M. Following a database-wide reindex, it dropped to 30M. > I'd expect that it was the ALTER TABLEs that caused this much index > growth, which VACUUM FULL was subsequently unable to fix. > > I don't expect you kept a log of index sizes throughout the operation > however :-( Not index size, specifically, no. I can probably reproduce the issue, however. I have access to the scripts that were run to update the database, and I can pull a pre-upgrade version from backup. I guess my question is whether or not this behaviour is strange enough to warrant me taking the time to do so. Just because I've never seen it before doesn't mean that it's unheard of. Is this level of index bloat known? Would it be worthwhile for me to investigate it and report any details on what's going on or is this a known factor that folks don't need any additional details on? -- Bill Moran Collaborative Fusion Inc. ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > Today I decided to run reindex during a slow period, and was shocked to > find the database size drop from 165M to 30M. Keep in mind that the > 165M is after vacuum full. So, apparently, there was 135M of index bloat? > That seems a little excessive to me, especially when the docs claim that > reindexing is usually not necessary. For what its worth, I've seen far worse. > I guess my question is whether or not this is expected. It's obviously > not a good thing -- I've noticed that shared buffer usage has dropped > dramatically as well (from 28,000 to 7000). I hadn't expected index > bloat of this magnitude, and I'm concerned about when the database hits > 2 or 3 G in size and has 12G just in indexes that take hours to rebuild. Regular reindexing is so inexpensive compared to vacuum, I recommend adding it in as part of your regular maintenance. At the very least, it's unlikely to ever be that severe again unless you don't reindex for an equally long period of time. Come to think of it, an auto-reindex option might be nice in core someday. TODO item? - -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200701171129 http://biglumber.com/x/web?pk=2529DF...9B906714964AC8 -----BEGIN PGP SIGNATURE----- iD8DBQFFrlDivJuQZxSWSsgRAvNnAJ9fJ+U6cyyO382HiZtp8L E5drcpOgCgwlW5 EbOS7Gbg/DYOgXeG7vUIlhY= =9E8g -----END PGP SIGNATURE----- ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| [snip] > Come to think of it, an auto-reindex option might be nice in core someday. > TODO item? Marry it with autovacuum + online index build, and it will be cool ;-) BTW, having a privileged background thread doing the reindex could be a solution to most of the objections regarding online reindex, as the thread would be privileged enough already to be able to back out if something fails (part of the objections), and it could stay around long enough to only lock tentatively in a loop in order to avoid deadlocks (another part of the objections). If it would be also marked the same as the vacuum threads not to block other vacuums, then it would also not be a problem that it runs 2 days... Cheers, Csaba. ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Bill Moran <wmoran@collaborativefusion.com> writes: > The entire database was around 28M prior to the upgrades, etc. Immediately > after the upgrades, it was ~270M. Following a vacuum full, it dropped to > 165M. Following a database-wide reindex, it dropped to 30M. As Alvaro said, vacuum full doesn't shrink indexes but in fact bloats them. (Worst case, they could double in size, if the vacuum moves every row; there's an intermediate state where there have to be index entries for both old and new copies of each moved row, to ensure things are consistent if the vacuum crashes right there.) So the above doesn't sound too unlikely. Perhaps we should recommend vac full + reindex as standard cleanup procedure. Longer term, maybe teach vac full to do an automatic reindex if it's moved more than X% of the rows. Or forget the current vac full implementation entirely, and go over to something acting more like CLUSTER ... regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| Bill Moran wrote: > The entire database was around 28M prior to the upgrades, etc. Immediately > after the upgrades, it was ~270M. Following a vacuum full, it dropped to > 165M. Following a database-wide reindex, it dropped to 30M. Oh, so it was clearly the upgrade procedure that caused the bloat ... Reindexing seems the expected course. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| In response to Alvaro Herrera <alvherre@commandprompt.com>: > Bill Moran wrote: > > > The entire database was around 28M prior to the upgrades, etc. Immediately > > after the upgrades, it was ~270M. Following a vacuum full, it dropped to > > 165M. Following a database-wide reindex, it dropped to 30M. > > Oh, so it was clearly the upgrade procedure that caused the bloat ... > Reindexing seems the expected course. Right. Sorry if I didn't explain that properly. It wasn't the fact that it bloated that surprised me. It was the _magnitude_ of bloat that I wasn't expecting, as well as the fact that it was _all_ _index_ bloat. -- Bill Moran Collaborative Fusion Inc. ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| In response to Ben <bench@silentmedia.com>: > Hey Bill. How do you monitor your shared buffer usage? My understanding > was that there wasn't a good way to see what was used vs. allocated. echo "select count(*) from pg_buffercache where reldatabase is not null;" | $PSQL_BIN -P tuples_only -U pgsql postgres | head -1 Of course, you have to install the pg_buffercache contrib module first. > On Wed, 17 Jan 2007, Bill Moran wrote: > > > > > We just did a bunch of maintenance on one of our production databases that > > involved a lot of alter tables and moving records about and the like. > > > > Afterwards, I did a vacuum full and analyze to get the database back on > > track -- autovac maintains it under normal operations. > > > > Today I decided to run reindex during a slow period, and was shocked to > > find the database size drop from 165M to 30M. Keep in mind that the > > 165M is after vacuum full. So, apparently, there was 135M of index bloat? > > That seems a little excessive to me, especially when the docs claim that > > reindexing is usually not necessary. > > > > This is PostgreSQL 8.1.4. We've got upgrades to 8.2 planned, but it's > > going to be a few months before we can squeak that into a maintenance > > window. Additionally, I thought all the big index improvements were > > added in 7.4. > > > > I guess my question is whether or not this is expected. It's obviously > > not a good thing -- I've noticed that shared buffer usage has dropped > > dramatically as well (from 28,000 to 7000). I hadn't expected index > > bloat of this magnitude, and I'm concerned about when the database hits > > 2 or 3 G in size and has 12G just in indexes that take hours to rebuild. > > > > -- > > Bill Moran > > Collaborative Fusion Inc. > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 6: explain analyze is your friend > > > > > > > > -- Bill Moran Collaborative Fusion Inc. wmoran@collaborativefusion.com Phone: 412-422-3463x4023 ************************************************** ************** IMPORTANT: This message contains confidential information and is intended only for the individual named. If the reader of this message is not an intended recipient (or the individual responsible for the delivery of this message to an intended recipient), please be advised that any re-use, dissemination, distribution or copying of this message is prohibited. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. ************************************************** ************** ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| ||||
| Tom Lane wrote: > Bill Moran <wmoran@collaborativefusion.com> writes: >> The entire database was around 28M prior to the upgrades, etc. Immediately >> after the upgrades, it was ~270M. Following a vacuum full, it dropped to >> 165M. Following a database-wide reindex, it dropped to 30M. > > As Alvaro said, vacuum full doesn't shrink indexes but in fact bloats them. > (Worst case, they could double in size, if the vacuum moves every row; > there's an intermediate state where there have to be index entries for > both old and new copies of each moved row, to ensure things are > consistent if the vacuum crashes right there.) > > So the above doesn't sound too unlikely. Perhaps we should recommend > vac full + reindex as standard cleanup procedure. Longer term, maybe > teach vac full to do an automatic reindex if it's moved more than X% of > the rows. Or forget the current vac full implementation entirely, and > go over to something acting more like CLUSTER ... we have to recommend the CLUSTER "way" to fix overly bloated databases quite often to people on IRC because vacuum full is unreasonably slow on highly fragmented databases. Doing something like that internally for vacuum full sounds like a reasonable idea except for the additional disk usage during the process which might cause issues for people ... Stefan ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |