Unix Technical Forum

Index bloat of 4x

This is a discussion on Index bloat of 4x within the Pgsql General forums, part of the PostgreSQL category; --> We just did a bunch of maintenance on one of our production databases that involved a lot of alter ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql General

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-09-2008, 01:25 PM
Bill Moran
 
Posts: n/a
Default Index bloat of 4x


We just did a bunch of maintenance on one of our production databases that
involved a lot of alter tables and moving records about and the like.

Afterwards, I did a vacuum full and analyze to get the database back on
track -- autovac maintains it under normal operations.

Today I decided to run reindex during a slow period, and was shocked to
find the database size drop from 165M to 30M. Keep in mind that the
165M is after vacuum full. So, apparently, there was 135M of index bloat?
That seems a little excessive to me, especially when the docs claim that
reindexing is usually not necessary.

This is PostgreSQL 8.1.4. We've got upgrades to 8.2 planned, but it's
going to be a few months before we can squeak that into a maintenance
window. Additionally, I thought all the big index improvements were
added in 7.4.

I guess my question is whether or not this is expected. It's obviously
not a good thing -- I've noticed that shared buffer usage has dropped
dramatically as well (from 28,000 to 7000). I hadn't expected index
bloat of this magnitude, and I'm concerned about when the database hits
2 or 3 G in size and has 12G just in indexes that take hours to rebuild.

--
Bill Moran
Collaborative Fusion Inc.

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-09-2008, 01:26 PM
Alvaro Herrera
 
Posts: n/a
Default Re: Index bloat of 4x

Bill Moran wrote:
>
> We just did a bunch of maintenance on one of our production databases that
> involved a lot of alter tables and moving records about and the like.
>
> Afterwards, I did a vacuum full and analyze to get the database back on
> track -- autovac maintains it under normal operations.
>
> Today I decided to run reindex during a slow period, and was shocked to
> find the database size drop from 165M to 30M. Keep in mind that the
> 165M is after vacuum full. So, apparently, there was 135M of index bloat?
> That seems a little excessive to me, especially when the docs claim that
> reindexing is usually not necessary.


It's been said that vacuum full does not fix index bloat -- in fact,
it's a problem it worsens. However, I very much doubt that it would be
this serious. I guess the question is, how large was the index *before*
all the alter tables?

I'd expect that it was the ALTER TABLEs that caused this much index
growth, which VACUUM FULL was subsequently unable to fix.

I don't expect you kept a log of index sizes throughout the operation
however :-(

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-09-2008, 01:26 PM
Bill Moran
 
Posts: n/a
Default Re: Index bloat of 4x

In response to Alvaro Herrera <alvherre@commandprompt.com>:

> Bill Moran wrote:
> >
> > We just did a bunch of maintenance on one of our production databases that
> > involved a lot of alter tables and moving records about and the like.
> >
> > Afterwards, I did a vacuum full and analyze to get the database back on
> > track -- autovac maintains it under normal operations.
> >
> > Today I decided to run reindex during a slow period, and was shocked to
> > find the database size drop from 165M to 30M. Keep in mind that the
> > 165M is after vacuum full. So, apparently, there was 135M of index bloat?
> > That seems a little excessive to me, especially when the docs claim that
> > reindexing is usually not necessary.

>
> It's been said that vacuum full does not fix index bloat -- in fact,
> it's a problem it worsens. However, I very much doubt that it would be
> this serious. I guess the question is, how large was the index *before*
> all the alter tables?


I don't have details on the various indexes. I do keep an mrtg graph of
pg_database_size(), so I can track the overall size of the database and
correlate it to events. I'm not tracking individual relations, indexes,
etc though.

The entire database was around 28M prior to the upgrades, etc. Immediately
after the upgrades, it was ~270M. Following a vacuum full, it dropped to
165M. Following a database-wide reindex, it dropped to 30M.

> I'd expect that it was the ALTER TABLEs that caused this much index
> growth, which VACUUM FULL was subsequently unable to fix.
>
> I don't expect you kept a log of index sizes throughout the operation
> however :-(


Not index size, specifically, no.

I can probably reproduce the issue, however. I have access to the scripts
that were run to update the database, and I can pull a pre-upgrade version
from backup.

I guess my question is whether or not this behaviour is strange enough to
warrant me taking the time to do so. Just because I've never seen it
before doesn't mean that it's unheard of.

Is this level of index bloat known? Would it be worthwhile for me to
investigate it and report any details on what's going on or is this a known
factor that folks don't need any additional details on?

--
Bill Moran
Collaborative Fusion Inc.

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-09-2008, 01:26 PM
Greg Sabino Mullane
 
Posts: n/a
Default Re: Index bloat of 4x


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> Today I decided to run reindex during a slow period, and was shocked to
> find the database size drop from 165M to 30M. Keep in mind that the
> 165M is after vacuum full. So, apparently, there was 135M of index bloat?
> That seems a little excessive to me, especially when the docs claim that
> reindexing is usually not necessary.


For what its worth, I've seen far worse.

> I guess my question is whether or not this is expected. It's obviously
> not a good thing -- I've noticed that shared buffer usage has dropped
> dramatically as well (from 28,000 to 7000). I hadn't expected index
> bloat of this magnitude, and I'm concerned about when the database hits
> 2 or 3 G in size and has 12G just in indexes that take hours to rebuild.


Regular reindexing is so inexpensive compared to vacuum, I recommend
adding it in as part of your regular maintenance. At the very least, it's
unlikely to ever be that severe again unless you don't reindex for an
equally long period of time.

Come to think of it, an auto-reindex option might be nice in core someday.
TODO item?

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200701171129
http://biglumber.com/x/web?pk=2529DF...9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iD8DBQFFrlDivJuQZxSWSsgRAvNnAJ9fJ+U6cyyO382HiZtp8L E5drcpOgCgwlW5
EbOS7Gbg/DYOgXeG7vUIlhY=
=9E8g
-----END PGP SIGNATURE-----



---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-09-2008, 01:26 PM
Csaba Nagy
 
Posts: n/a
Default Re: Index bloat of 4x

[snip]
> Come to think of it, an auto-reindex option might be nice in core someday.
> TODO item?


Marry it with autovacuum + online index build, and it will be cool ;-)

BTW, having a privileged background thread doing the reindex could be a
solution to most of the objections regarding online reindex, as the
thread would be privileged enough already to be able to back out if
something fails (part of the objections), and it could stay around long
enough to only lock tentatively in a loop in order to avoid deadlocks
(another part of the objections).

If it would be also marked the same as the vacuum threads not to block
other vacuums, then it would also not be a problem that it runs 2
days...

Cheers,
Csaba.




---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-09-2008, 01:26 PM
Tom Lane
 
Posts: n/a
Default Re: Index bloat of 4x

Bill Moran <wmoran@collaborativefusion.com> writes:
> The entire database was around 28M prior to the upgrades, etc. Immediately
> after the upgrades, it was ~270M. Following a vacuum full, it dropped to
> 165M. Following a database-wide reindex, it dropped to 30M.


As Alvaro said, vacuum full doesn't shrink indexes but in fact bloats them.
(Worst case, they could double in size, if the vacuum moves every row;
there's an intermediate state where there have to be index entries for
both old and new copies of each moved row, to ensure things are
consistent if the vacuum crashes right there.)

So the above doesn't sound too unlikely. Perhaps we should recommend
vac full + reindex as standard cleanup procedure. Longer term, maybe
teach vac full to do an automatic reindex if it's moved more than X% of
the rows. Or forget the current vac full implementation entirely, and
go over to something acting more like CLUSTER ...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-09-2008, 01:26 PM
Alvaro Herrera
 
Posts: n/a
Default Re: Index bloat of 4x

Bill Moran wrote:

> The entire database was around 28M prior to the upgrades, etc. Immediately
> after the upgrades, it was ~270M. Following a vacuum full, it dropped to
> 165M. Following a database-wide reindex, it dropped to 30M.


Oh, so it was clearly the upgrade procedure that caused the bloat ...
Reindexing seems the expected course.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-09-2008, 01:26 PM
Bill Moran
 
Posts: n/a
Default Re: Index bloat of 4x

In response to Alvaro Herrera <alvherre@commandprompt.com>:

> Bill Moran wrote:
>
> > The entire database was around 28M prior to the upgrades, etc. Immediately
> > after the upgrades, it was ~270M. Following a vacuum full, it dropped to
> > 165M. Following a database-wide reindex, it dropped to 30M.

>
> Oh, so it was clearly the upgrade procedure that caused the bloat ...
> Reindexing seems the expected course.


Right. Sorry if I didn't explain that properly.

It wasn't the fact that it bloated that surprised me. It was the
_magnitude_ of bloat that I wasn't expecting, as well as the fact that
it was _all_ _index_ bloat.

--
Bill Moran
Collaborative Fusion Inc.

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-09-2008, 01:26 PM
Bill Moran
 
Posts: n/a
Default Re: Index bloat of 4x

In response to Ben <bench@silentmedia.com>:

> Hey Bill. How do you monitor your shared buffer usage? My understanding
> was that there wasn't a good way to see what was used vs. allocated.


echo "select count(*) from pg_buffercache where reldatabase is not null;" | $PSQL_BIN -P tuples_only -U pgsql postgres | head -1

Of course, you have to install the pg_buffercache contrib module first.

> On Wed, 17 Jan 2007, Bill Moran wrote:
>
> >
> > We just did a bunch of maintenance on one of our production databases that
> > involved a lot of alter tables and moving records about and the like.
> >
> > Afterwards, I did a vacuum full and analyze to get the database back on
> > track -- autovac maintains it under normal operations.
> >
> > Today I decided to run reindex during a slow period, and was shocked to
> > find the database size drop from 165M to 30M. Keep in mind that the
> > 165M is after vacuum full. So, apparently, there was 135M of index bloat?
> > That seems a little excessive to me, especially when the docs claim that
> > reindexing is usually not necessary.
> >
> > This is PostgreSQL 8.1.4. We've got upgrades to 8.2 planned, but it's
> > going to be a few months before we can squeak that into a maintenance
> > window. Additionally, I thought all the big index improvements were
> > added in 7.4.
> >
> > I guess my question is whether or not this is expected. It's obviously
> > not a good thing -- I've noticed that shared buffer usage has dropped
> > dramatically as well (from 28,000 to 7000). I hadn't expected index
> > bloat of this magnitude, and I'm concerned about when the database hits
> > 2 or 3 G in size and has 12G just in indexes that take hours to rebuild.
> >
> > --
> > Bill Moran
> > Collaborative Fusion Inc.
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 6: explain analyze is your friend
> >

>
>
>
>
>
>



--
Bill Moran
Collaborative Fusion Inc.

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

************************************************** **************
IMPORTANT: This message contains confidential information and is
intended only for the individual named. If the reader of this
message is not an intended recipient (or the individual
responsible for the delivery of this message to an intended
recipient), please be advised that any re-use, dissemination,
distribution or copying of this message is prohibited. Please
notify the sender immediately by e-mail if you have received
this e-mail by mistake and delete this e-mail from your system.
E-mail transmission cannot be guaranteed to be secure or
error-free as information could be intercepted, corrupted, lost,
destroyed, arrive late or incomplete, or contain viruses. The
sender therefore does not accept liability for any errors or
omissions in the contents of this message, which arise as a
result of e-mail transmission.
************************************************** **************

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-09-2008, 01:26 PM
Stefan Kaltenbrunner
 
Posts: n/a
Default Re: Index bloat of 4x

Tom Lane wrote:
> Bill Moran <wmoran@collaborativefusion.com> writes:
>> The entire database was around 28M prior to the upgrades, etc. Immediately
>> after the upgrades, it was ~270M. Following a vacuum full, it dropped to
>> 165M. Following a database-wide reindex, it dropped to 30M.

>
> As Alvaro said, vacuum full doesn't shrink indexes but in fact bloats them.
> (Worst case, they could double in size, if the vacuum moves every row;
> there's an intermediate state where there have to be index entries for
> both old and new copies of each moved row, to ensure things are
> consistent if the vacuum crashes right there.)
>
> So the above doesn't sound too unlikely. Perhaps we should recommend
> vac full + reindex as standard cleanup procedure. Longer term, maybe
> teach vac full to do an automatic reindex if it's moved more than X% of
> the rows. Or forget the current vac full implementation entirely, and
> go over to something acting more like CLUSTER ...


we have to recommend the CLUSTER "way" to fix overly bloated databases
quite often to people on IRC because vacuum full is unreasonably slow on
highly fragmented databases.
Doing something like that internally for vacuum full sounds like a
reasonable idea except for the additional disk usage during the process
which might cause issues for people ...



Stefan

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 05:37 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com