This is a discussion on Maintaining cluster order on insert within the Pgsql Patches forums, part of the PostgreSQL category; --> While thinking about index-organized-tables and similar ideas, it occurred to me that there's some low-hanging-fruit: maintaining cluster order on ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| While thinking about index-organized-tables and similar ideas, it occurred to me that there's some low-hanging-fruit: maintaining cluster order on inserts by trying to place new heap tuples close to other similar tuples. That involves asking the index am where on the heap the new tuple should go, and trying to insert it there before using the FSM. Using the new fillfactor parameter makes it more likely that there's room on the page. We don't worry about the order within the page. The API I'm thinking of introduces a new optional index am function, amsuggestblock (suggestions for a better name are welcome). It gets the same parameters as aminsert, and returns the heap block number that would be optimal place to put the new tuple. It's be called from ExecInsert before inserting the heap tuple, and the suggestion is passed on to heap_insert and RelationGetBufferForTuple. I wrote a little patch to implement this for btree, attached. This could be optimized by changing the existing aminsert API, because as it is, an insert will have to descend the btree twice. Once in amsuggestblock and then in aminsert. amsuggestblock could keep the right index page pinned so aminsert could locate it quicker. But I wanted to keep this simple for now. Another improvement might be to allow amsuggestblock to return a list of suggestions, but that makes it more expensive to insert if there isn't room in the suggested pages, since heap_insert will have to try them all before giving up. Comments regarding the general idea or the patch? There should probably be a index option to turn the feature on and off. You'll want to turn it off when you first load a table, and turn it on after CLUSTER to keep it clustered. Since there's been discussion on keeping the TODO list more up-to-date, I hereby officially claim the "Automatically maintain clustering on a table" TODO item reports. And just to be clear, I'm not trying to sneak this into 8.2 anymore, this is 8.3 stuff. I won't be implementing a background daemon described on the TODO item, since that would essentially be an online version of CLUSTER. Which sure would be nice, but that's a different story. - Heikki ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Heikki Linnakangas <heikki@enterprisedb.com> writes: > While thinking about index-organized-tables and similar ideas, it > occurred to me that there's some low-hanging-fruit: maintaining cluster > order on inserts by trying to place new heap tuples close to other > similar tuples. Doesn't this happen for free if there's enough space? UPDATE tries to place the new tuple on the same page it's already on. In practice people are only likely to cluster by primary keys (or other things that seldom change) so I don't particularly agree with inventing a large wart to support "optimally" placing things somewhere else... regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| On 8/9/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > UPDATE tries to place the new tuple on the same page it's already > on. I think he meant for INSERT. -- Jonah H. Harris, Software Architect | phone: 732.331.1300 EnterpriseDB Corporation | fax: 732.331.1301 33 Wood Ave S, 2nd Floor | jharris@enterprisedb.com Iselin, New Jersey 08830 | http://www.enterprisedb.com/ ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| I have a table that inserts lots of rows (million+ per day) int8 as primary key, and I cluster by a timestamp which is approximately the timestamp of the insert beforehand and is therefore in increasing order and doesn't change. Most of the rows are updated about 3 times over time roughly within the next 30 minutes. Should I assume that that all of these updates will be on separate pages unless I perform a cluster (which takes a long time) and performance will suffer due to this? Is it possible to preallocate space on the same page for future updates (whatever the average number of updates may be per row) decreasing the number of page loads for querying? Gene On 8/9/06, Jonah H. Harris <jonah.harris@gmail.com> wrote: > > On 8/9/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > UPDATE tries to place the new tuple on the same page it's already > > on. > > I think he meant for INSERT. > > -- > Jonah H. Harris, Software Architect | phone: 732.331.1300 > EnterpriseDB Corporation | fax: 732.331.1301 > 33 Wood Ave S, 2nd Floor | jharris@enterprisedb.com > Iselin, New Jersey 08830 | http://www.enterprisedb.com/ > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match > -- Eugene Hart |
| |||
| Gene <genekhart@gmail.com> writes: > I have a table that inserts lots of rows (million+ per day) int8 as primary > key, and I cluster by a timestamp which is approximately the timestamp of > the insert beforehand and is therefore in increasing order and doesn't > change. Most of the rows are updated about 3 times over time roughly within > the next 30 minutes. ISTM you should hardly need to worry about clustering that --- the data will be in timestamp order pretty naturally. The main problem you're going to have is the update-3-times bit. You could keep updated rows on the same page as the original if you ran the table at fillfactor 25% (which you'll be able to do in 8.2) ... but while this might be sane for the leading edge of the table, you hardly want such low storage density in the stable part. You could reduce the fillfactor requirement if you could vacuum the table constantly (every 10 minutes or so) but I assume the table is large enough to make that unattractive. (Eventually we should have a version of vacuum that understands where the dirty stuff is, which might make this approach tenable ... but not in 8.2.) Your best bet might be to partition the table into two subtables, one with "stable" data and one with the fresh data, and transfer rows from one to the other once they get stable. Storage density in the "fresh" part would be poor, but it should be small enough you don't care. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| You are correct the main part I'm worried about is the updates, being so far from the originals. fyi I am partitioning the tables by the timestamp column,vacuum analyzing once per hour, creating one child partition per day in a cron job. Because I'm using hibernate for database abstraction (stateless sessions), I can only have one RULE since having more than one insert rule will not return the correct number of updated rows which confuses hibernate. The one rule just directs inserts to the latest child partition which seems to work well. The reason I'm doing the clustering is I was hoping that with the "stable" non-updating partitions I could execute a CLUSTER at night (slow...) and it would compact the tables into their most efficient state for querying which always involves a date range. bad idea? In this fillfactor feature, will you be able to set it to 100% once you know that no more updates will occur? Or will doing a cluster effectively do this? Will the fill factor only apply for inserts? "Your best bet might be to partition the table into two subtables, one with "stable" data and one with the fresh data, and transfer rows from one to the other once they get stable. Storage density in the "fresh" part would be poor, but it should be small enough you don't care." This sounds interesting, I could create a RULE/INSERT on the unstable table, I will know during the update if it is ready to be put in the stable table. What would be an efficient way to do the transfer? Since the updates occur somewhat randomly, wouldnt the tuples in the stable table then be out of natural timestamp order? thanks for all of your help and comments! it is greatly appreciated! Gene Hart On 8/9/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Gene <genekhart@gmail.com> writes: > > I have a table that inserts lots of rows (million+ per day) int8 as > primary > > key, and I cluster by a timestamp which is approximately the timestamp > of > > the insert beforehand and is therefore in increasing order and doesn't > > change. Most of the rows are updated about 3 times over time roughly > within > > the next 30 minutes. > > ISTM you should hardly need to worry about clustering that --- the data > will be in timestamp order pretty naturally. > > The main problem you're going to have is the update-3-times bit. You > could keep updated rows on the same page as the original if you ran the > table at fillfactor 25% (which you'll be able to do in 8.2) ... but > while this might be sane for the leading edge of the table, you hardly > want such low storage density in the stable part. > > You could reduce the fillfactor requirement if you could vacuum the > table constantly (every 10 minutes or so) but I assume the table is > large enough to make that unattractive. (Eventually we should have > a version of vacuum that understands where the dirty stuff is, which > might make this approach tenable ... but not in 8.2.) > > Your best bet might be to partition the table into two subtables, one > with "stable" data and one with the fresh data, and transfer rows from > one to the other once they get stable. Storage density in the "fresh" > part would be poor, but it should be small enough you don't care. > > regards, tom lane > -- Eugene Hart |
| |||
| Jonah H. Harris wrote: > On 8/9/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> UPDATE tries to place the new tuple on the same page it's already >> on. > > I think he meant for INSERT. > Right. Update is indeed taken care of already. One example where this would help would be a customer_history table that stores all transactions of a customer. Primary key is (customer_id, transaction_id). You would want to keep the table clustered by customer_id to make it quick to retrieve all transactions of a customer. In general, any table with more or less random insert/delete activity that you want to keep in order would benefit. - Heikki ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Tom Lane wrote: > Gene <genekhart@gmail.com> writes: >> I have a table that inserts lots of rows (million+ per day) int8 as primary >> key, and I cluster by a timestamp which is approximately the timestamp of >> the insert... > > ISTM you should hardly need to worry about clustering that --- the data > will be in timestamp order pretty naturally. In my case my biggest/slowest tables are clustered by zip-code (which does a reasonable job at keeping counties/cities/etc on the same pages too). Data comes in constantly (many records per minute, as we ramp up), pretty uniformly across the country; but most queries are geographically bounded. The data's pretty much insert-only. If I understand Heikki's patch, it would help for this use case. > Your best bet might be to partition the table into two subtables, one > with "stable" data and one with the fresh data, and transfer rows from > one to the other once they get stable. Storage density in the "fresh" > part would be poor, but it should be small enough you don't care. Hmm... that should work well for me too. Not sure if the use-case I mentioned above is still compelling anymore; since this seems like it'd give me much of the benefit; and I don't need an excessive fillfactor on the stable part of the table. |
| |||
| Ron Mayer wrote: > In my case my biggest/slowest tables are clustered by zip-code (which > does a reasonable job at keeping counties/cities/etc on the > same pages too). Data comes in constantly (many records per minute, as > we ramp up), pretty uniformly across the country; but most queries > are geographically bounded. The data's pretty much insert-only. No deletes? If the tables grow over time, you probably would need to run CLUSTER every now and then to get the best performance, though the patch would alleviate that quite a lot. Do you have a development environment where you could test what effect the patch would have? It would be interesting to have a real-world use case, since I don't have one myself at the moment. > If I understand Heikki's patch, it would help for this use case. Yes, it would. > > Your best bet might be to partition the table into two subtables, one > > with "stable" data and one with the fresh data, and transfer rows from > > one to the other once they get stable. Storage density in the "fresh" > > part would be poor, but it should be small enough you don't care. > > Hmm... that should work well for me too. Not sure if the use-case > I mentioned above is still compelling anymore; since this seems like > it'd give me much of the benefit; and I don't need an excessive > fillfactor on the stable part of the table. Umm, if your inserts are uniformly distributed across the country, you wouldn't have a stable part, right? - Heikki ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| ||||
| [ Sorry I found this one only found recently.] Your patch has been added to the PostgreSQL unapplied patches list at: http://momjian.postgresql.org/cgi-bin/pgpatches It will be applied as soon as one of the PostgreSQL committers reviews and approves it. --------------------------------------------------------------------------- Heikki Linnakangas wrote: > While thinking about index-organized-tables and similar ideas, it > occurred to me that there's some low-hanging-fruit: maintaining cluster > order on inserts by trying to place new heap tuples close to other > similar tuples. That involves asking the index am where on the heap the > new tuple should go, and trying to insert it there before using the FSM. > Using the new fillfactor parameter makes it more likely that there's > room on the page. We don't worry about the order within the page. > > The API I'm thinking of introduces a new optional index am function, > amsuggestblock (suggestions for a better name are welcome). It gets the > same parameters as aminsert, and returns the heap block number that > would be optimal place to put the new tuple. It's be called from > ExecInsert before inserting the heap tuple, and the suggestion is passed > on to heap_insert and RelationGetBufferForTuple. > > I wrote a little patch to implement this for btree, attached. > > This could be optimized by changing the existing aminsert API, because > as it is, an insert will have to descend the btree twice. Once in > amsuggestblock and then in aminsert. amsuggestblock could keep the right > index page pinned so aminsert could locate it quicker. But I wanted to > keep this simple for now. Another improvement might be to allow > amsuggestblock to return a list of suggestions, but that makes it more > expensive to insert if there isn't room in the suggested pages, since > heap_insert will have to try them all before giving up. > > Comments regarding the general idea or the patch? There should probably > be a index option to turn the feature on and off. You'll want to turn it > off when you first load a table, and turn it on after CLUSTER to keep it > clustered. > > Since there's been discussion on keeping the TODO list more up-to-date, > I hereby officially claim the "Automatically maintain clustering on a > table" TODO item > reports. And just to be clear, I'm not trying to sneak this into 8.2 > anymore, this is 8.3 stuff. > > I won't be implementing a background daemon described on the TODO item, > since that would essentially be an online version of CLUSTER. Which sure > would be nice, but that's a different story. > > - Heikki > -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |