Unix Technical Forum

Maintaining cluster order on insert

This is a discussion on Maintaining cluster order on insert within the Pgsql Patches forums, part of the PostgreSQL category; --> While thinking about index-organized-tables and similar ideas, it occurred to me that there's some low-hanging-fruit: maintaining cluster order on ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Patches

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-18-2008, 01:55 AM
Heikki Linnakangas
 
Posts: n/a
Default Maintaining cluster order on insert

While thinking about index-organized-tables and similar ideas, it
occurred to me that there's some low-hanging-fruit: maintaining cluster
order on inserts by trying to place new heap tuples close to other
similar tuples. That involves asking the index am where on the heap the
new tuple should go, and trying to insert it there before using the FSM.
Using the new fillfactor parameter makes it more likely that there's
room on the page. We don't worry about the order within the page.

The API I'm thinking of introduces a new optional index am function,
amsuggestblock (suggestions for a better name are welcome). It gets the
same parameters as aminsert, and returns the heap block number that
would be optimal place to put the new tuple. It's be called from
ExecInsert before inserting the heap tuple, and the suggestion is passed
on to heap_insert and RelationGetBufferForTuple.

I wrote a little patch to implement this for btree, attached.

This could be optimized by changing the existing aminsert API, because
as it is, an insert will have to descend the btree twice. Once in
amsuggestblock and then in aminsert. amsuggestblock could keep the right
index page pinned so aminsert could locate it quicker. But I wanted to
keep this simple for now. Another improvement might be to allow
amsuggestblock to return a list of suggestions, but that makes it more
expensive to insert if there isn't room in the suggested pages, since
heap_insert will have to try them all before giving up.

Comments regarding the general idea or the patch? There should probably
be a index option to turn the feature on and off. You'll want to turn it
off when you first load a table, and turn it on after CLUSTER to keep it
clustered.

Since there's been discussion on keeping the TODO list more up-to-date,
I hereby officially claim the "Automatically maintain clustering on a
table" TODO item . Feel free to bombard me with requests for status
reports. And just to be clear, I'm not trying to sneak this into 8.2
anymore, this is 8.3 stuff.

I won't be implementing a background daemon described on the TODO item,
since that would essentially be an online version of CLUSTER. Which sure
would be nice, but that's a different story.

- Heikki



---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-18-2008, 01:55 AM
Tom Lane
 
Posts: n/a
Default Re: Maintaining cluster order on insert

Heikki Linnakangas <heikki@enterprisedb.com> writes:
> While thinking about index-organized-tables and similar ideas, it
> occurred to me that there's some low-hanging-fruit: maintaining cluster
> order on inserts by trying to place new heap tuples close to other
> similar tuples.


Doesn't this happen for free if there's enough space? UPDATE tries to
place the new tuple on the same page it's already on. In practice
people are only likely to cluster by primary keys (or other things that
seldom change) so I don't particularly agree with inventing a large wart
to support "optimally" placing things somewhere else...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-18-2008, 01:55 AM
Jonah H. Harris
 
Posts: n/a
Default Re: Maintaining cluster order on insert

On 8/9/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> UPDATE tries to place the new tuple on the same page it's already
> on.


I think he meant for INSERT.

--
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation | fax: 732.331.1301
33 Wood Ave S, 2nd Floor | jharris@enterprisedb.com
Iselin, New Jersey 08830 | http://www.enterprisedb.com/

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-18-2008, 01:55 AM
Gene
 
Posts: n/a
Default Re: [HACKERS] Maintaining cluster order on insert

I have a table that inserts lots of rows (million+ per day) int8 as primary
key, and I cluster by a timestamp which is approximately the timestamp of
the insert beforehand and is therefore in increasing order and doesn't
change. Most of the rows are updated about 3 times over time roughly within
the next 30 minutes. Should I assume that that all of these updates will be
on separate pages unless I perform a cluster (which takes a long time) and
performance will suffer due to this? Is it possible to preallocate space on
the same page for future updates (whatever the average number of updates may
be per row) decreasing the number of page loads for querying?

Gene

On 8/9/06, Jonah H. Harris <jonah.harris@gmail.com> wrote:
>
> On 8/9/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > UPDATE tries to place the new tuple on the same page it's already
> > on.

>
> I think he meant for INSERT.
>
> --
> Jonah H. Harris, Software Architect | phone: 732.331.1300
> EnterpriseDB Corporation | fax: 732.331.1301
> 33 Wood Ave S, 2nd Floor | jharris@enterprisedb.com
> Iselin, New Jersey 08830 | http://www.enterprisedb.com/
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>




--
Eugene Hart

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-18-2008, 01:55 AM
Tom Lane
 
Posts: n/a
Default Re: [HACKERS] Maintaining cluster order on insert

Gene <genekhart@gmail.com> writes:
> I have a table that inserts lots of rows (million+ per day) int8 as primary
> key, and I cluster by a timestamp which is approximately the timestamp of
> the insert beforehand and is therefore in increasing order and doesn't
> change. Most of the rows are updated about 3 times over time roughly within
> the next 30 minutes.


ISTM you should hardly need to worry about clustering that --- the data
will be in timestamp order pretty naturally.

The main problem you're going to have is the update-3-times bit. You
could keep updated rows on the same page as the original if you ran the
table at fillfactor 25% (which you'll be able to do in 8.2) ... but
while this might be sane for the leading edge of the table, you hardly
want such low storage density in the stable part.

You could reduce the fillfactor requirement if you could vacuum the
table constantly (every 10 minutes or so) but I assume the table is
large enough to make that unattractive. (Eventually we should have
a version of vacuum that understands where the dirty stuff is, which
might make this approach tenable ... but not in 8.2.)

Your best bet might be to partition the table into two subtables, one
with "stable" data and one with the fresh data, and transfer rows from
one to the other once they get stable. Storage density in the "fresh"
part would be poor, but it should be small enough you don't care.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-18-2008, 01:55 AM
Gene
 
Posts: n/a
Default Re: [HACKERS] Maintaining cluster order on insert

You are correct the main part I'm worried about is the updates, being so far
from the originals. fyi I am partitioning the tables by the timestamp
column,vacuum analyzing once per hour, creating one child partition per day
in a cron job. Because I'm using hibernate for database abstraction
(stateless sessions), I can only have one RULE since having more than one
insert rule will not return the correct number of updated rows which
confuses hibernate. The one rule just directs inserts to the latest child
partition which seems to work well.

The reason I'm doing the clustering is I was hoping that with the "stable"
non-updating partitions I could execute a CLUSTER at night (slow...) and it
would compact the tables into their most efficient state for querying which
always involves a date range. bad idea? In this fillfactor feature, will you
be able to set it to 100% once you know that no more updates will occur? Or
will doing a cluster effectively do this? Will the fill factor only apply
for inserts?

"Your best bet might be to partition the table into two subtables, one
with "stable" data and one with the fresh data, and transfer rows from
one to the other once they get stable. Storage density in the "fresh"
part would be poor, but it should be small enough you don't care."

This sounds interesting, I could create a RULE/INSERT on the unstable table,
I will know during the update if it is ready to be put in the stable table.
What would be an efficient way to do the transfer? Since the updates occur
somewhat randomly, wouldnt the tuples in the stable table then be out of
natural timestamp order?

thanks for all of your help and comments! it is greatly appreciated!
Gene Hart

On 8/9/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Gene <genekhart@gmail.com> writes:
> > I have a table that inserts lots of rows (million+ per day) int8 as

> primary
> > key, and I cluster by a timestamp which is approximately the timestamp

> of
> > the insert beforehand and is therefore in increasing order and doesn't
> > change. Most of the rows are updated about 3 times over time roughly

> within
> > the next 30 minutes.

>
> ISTM you should hardly need to worry about clustering that --- the data
> will be in timestamp order pretty naturally.
>
> The main problem you're going to have is the update-3-times bit. You
> could keep updated rows on the same page as the original if you ran the
> table at fillfactor 25% (which you'll be able to do in 8.2) ... but
> while this might be sane for the leading edge of the table, you hardly
> want such low storage density in the stable part.
>
> You could reduce the fillfactor requirement if you could vacuum the
> table constantly (every 10 minutes or so) but I assume the table is
> large enough to make that unattractive. (Eventually we should have
> a version of vacuum that understands where the dirty stuff is, which
> might make this approach tenable ... but not in 8.2.)
>
> Your best bet might be to partition the table into two subtables, one
> with "stable" data and one with the fresh data, and transfer rows from
> one to the other once they get stable. Storage density in the "fresh"
> part would be poor, but it should be small enough you don't care.
>
> regards, tom lane
>




--
Eugene Hart

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-18-2008, 01:55 AM
Heikki Linnakangas
 
Posts: n/a
Default Re: Maintaining cluster order on insert

Jonah H. Harris wrote:
> On 8/9/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> UPDATE tries to place the new tuple on the same page it's already
>> on.

>
> I think he meant for INSERT.
>


Right. Update is indeed taken care of already.

One example where this would help would be a customer_history table that
stores all transactions of a customer. Primary key is (customer_id,
transaction_id). You would want to keep the table clustered by
customer_id to make it quick to retrieve all transactions of a customer.
In general, any table with more or less random insert/delete activity
that you want to keep in order would benefit.

- Heikki


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-18-2008, 01:55 AM
Ron Mayer
 
Posts: n/a
Default Re: [HACKERS] Maintaining cluster order on insert

Tom Lane wrote:
> Gene <genekhart@gmail.com> writes:
>> I have a table that inserts lots of rows (million+ per day) int8 as primary
>> key, and I cluster by a timestamp which is approximately the timestamp of
>> the insert...

>
> ISTM you should hardly need to worry about clustering that --- the data
> will be in timestamp order pretty naturally.


In my case my biggest/slowest tables are clustered by zip-code (which
does a reasonable job at keeping counties/cities/etc on the
same pages too). Data comes in constantly (many records per minute, as
we ramp up), pretty uniformly across the country; but most queries
are geographically bounded. The data's pretty much insert-only.

If I understand Heikki's patch, it would help for this use case.


> Your best bet might be to partition the table into two subtables, one
> with "stable" data and one with the fresh data, and transfer rows from
> one to the other once they get stable. Storage density in the "fresh"
> part would be poor, but it should be small enough you don't care.


Hmm... that should work well for me too. Not sure if the use-case
I mentioned above is still compelling anymore; since this seems like
it'd give me much of the benefit; and I don't need an excessive
fillfactor on the stable part of the table.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-18-2008, 01:55 AM
Heikki Linnakangas
 
Posts: n/a
Default Re: [HACKERS] Maintaining cluster order on insert

Ron Mayer wrote:
> In my case my biggest/slowest tables are clustered by zip-code (which
> does a reasonable job at keeping counties/cities/etc on the
> same pages too). Data comes in constantly (many records per minute, as
> we ramp up), pretty uniformly across the country; but most queries
> are geographically bounded. The data's pretty much insert-only.


No deletes? If the tables grow over time, you probably would need to run
CLUSTER every now and then to get the best performance, though the patch
would alleviate that quite a lot.

Do you have a development environment where you could test what effect
the patch would have? It would be interesting to have a real-world use
case, since I don't have one myself at the moment.

> If I understand Heikki's patch, it would help for this use case.


Yes, it would.

> > Your best bet might be to partition the table into two subtables, one
> > with "stable" data and one with the fresh data, and transfer rows from
> > one to the other once they get stable. Storage density in the "fresh"
> > part would be poor, but it should be small enough you don't care.

>
> Hmm... that should work well for me too. Not sure if the use-case
> I mentioned above is still compelling anymore; since this seems like
> it'd give me much of the benefit; and I don't need an excessive
> fillfactor on the stable part of the table.


Umm, if your inserts are uniformly distributed across the country, you
wouldn't have a stable part, right?

- Heikki

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-18-2008, 11:04 AM
Bruce Momjian
 
Posts: n/a
Default Re: Maintaining cluster order on insert


[ Sorry I found this one only found recently.]

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------


Heikki Linnakangas wrote:
> While thinking about index-organized-tables and similar ideas, it
> occurred to me that there's some low-hanging-fruit: maintaining cluster
> order on inserts by trying to place new heap tuples close to other
> similar tuples. That involves asking the index am where on the heap the
> new tuple should go, and trying to insert it there before using the FSM.
> Using the new fillfactor parameter makes it more likely that there's
> room on the page. We don't worry about the order within the page.
>
> The API I'm thinking of introduces a new optional index am function,
> amsuggestblock (suggestions for a better name are welcome). It gets the
> same parameters as aminsert, and returns the heap block number that
> would be optimal place to put the new tuple. It's be called from
> ExecInsert before inserting the heap tuple, and the suggestion is passed
> on to heap_insert and RelationGetBufferForTuple.
>
> I wrote a little patch to implement this for btree, attached.
>
> This could be optimized by changing the existing aminsert API, because
> as it is, an insert will have to descend the btree twice. Once in
> amsuggestblock and then in aminsert. amsuggestblock could keep the right
> index page pinned so aminsert could locate it quicker. But I wanted to
> keep this simple for now. Another improvement might be to allow
> amsuggestblock to return a list of suggestions, but that makes it more
> expensive to insert if there isn't room in the suggested pages, since
> heap_insert will have to try them all before giving up.
>
> Comments regarding the general idea or the patch? There should probably
> be a index option to turn the feature on and off. You'll want to turn it
> off when you first load a table, and turn it on after CLUSTER to keep it
> clustered.
>
> Since there's been discussion on keeping the TODO list more up-to-date,
> I hereby officially claim the "Automatically maintain clustering on a
> table" TODO item . Feel free to bombard me with requests for status
> reports. And just to be clear, I'm not trying to sneak this into 8.2
> anymore, this is 8.3 stuff.
>
> I won't be implementing a background daemon described on the TODO item,
> since that would essentially be an online version of CLUSTER. Which sure
> would be nice, but that's a different story.
>
> - Heikki
>



--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 04:28 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com