vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Heikki Linnakangas <heikki@enterprisedb.com> writes: > The patch also adds support for candidate matches. An index scan can > indicate that the tuples it's returning are candidates, and the executor > will recheck the original scan quals of any candidate matches when the > tuple is fetched from heap. This will not work, unless we change the planner --- the original quals aren't necessarily there in some corner cases (partial indexes, if memory serves). > The motivation for adding the support for candidate matches is that GIT > / clustered indexes need it. You need more than a vague reference to an unapplied patch to convince me we ought to do this. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| Tom Lane wrote: > Heikki Linnakangas <heikki@enterprisedb.com> writes: >> The patch also adds support for candidate matches. An index scan can >> indicate that the tuples it's returning are candidates, and the executor >> will recheck the original scan quals of any candidate matches when the >> tuple is fetched from heap. > > This will not work, unless we change the planner --- the original quals > aren't necessarily there in some corner cases (partial indexes, if > memory serves). This is only for bitmap scans, which *do* always have the original quals available in the executor (BitmapHeapScanState.bitmapqualorig). That's because we have to recheck the original conditions when the bitmap goes lossy. To support candidate matches with the amgettuple API, that'll need to be changed as well. And that will indeed involve more executor changes. >> The motivation for adding the support for candidate matches is that GIT >> / clustered indexes need it. > > You need more than a vague reference to an unapplied patch to convince > me we ought to do this. With the unapplied GIT patch, the index doesn't store the index key of every tuple. That has the consequence that when scanning, we get a bunch of tids to a heap page, we know that some of the might match, but we don't know which ones until the tuples are fetched from heap. In a more distant future, range-encoded bitmap indexes will also produce candidate matches. And as I mentioned, this is immediately useful when doing bitmap ANDs large enough to go lossy. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Heikki Linnakangas <heikki@enterprisedb.com> writes: > Tom Lane wrote: >> This will not work, unless we change the planner --- the original quals >> aren't necessarily there in some corner cases (partial indexes, if >> memory serves). > This is only for bitmap scans, which *do* always have the original quals > available in the executor (BitmapHeapScanState.bitmapqualorig). > That's because we have to recheck the original conditions when the > bitmap goes lossy. Yeah, but the index AM has to support regular indexscans too, and those are not prepared for runtime lossiness determination; nor am I particularly willing to add that. > With the unapplied GIT patch, the index doesn't store the index key of > every tuple. I thought the design was to eliminate *duplicate* keys from the index. Not to lose data. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Tom Lane wrote: > Heikki Linnakangas <heikki@enterprisedb.com> writes: >> Tom Lane wrote: >>> This will not work, unless we change the planner --- the original quals >>> aren't necessarily there in some corner cases (partial indexes, if >>> memory serves). > >> This is only for bitmap scans, which *do* always have the original quals >> available in the executor (BitmapHeapScanState.bitmapqualorig). >> That's because we have to recheck the original conditions when the >> bitmap goes lossy. > > Yeah, but the index AM has to support regular indexscans too, and those > are not prepared for runtime lossiness determination; nor am I > particularly willing to add that. Well, do you have an alternative suggestion? >> With the unapplied GIT patch, the index doesn't store the index key of >> every tuple. > > I thought the design was to eliminate *duplicate* keys from the index. > Not to lose data. The idea *isn't* to deal efficiently with duplicate keys. The bitmap indexam is better suited for that. The idea really is to lose information from the leaf index pages, in favor of a drastically smaller index. On a completely clustered table, the heap effectively is the leaf level of the index. I'm glad we're having this conversation now. I'd really appreciate review of the design. I've been posting updates every now and then, asking for comments, but never got any. If you have suggestions, I'm all ears and I still have some time left before feature freeze to make changes. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Heikki Linnakangas <heikki@enterprisedb.com> writes: >>> With the unapplied GIT patch, the index doesn't store the index key of >>> every tuple. >> >> I thought the design was to eliminate *duplicate* keys from the index. >> Not to lose data. > The idea really is to lose information from the leaf index pages, in > favor of a drastically smaller index. On a completely clustered table, > the heap effectively is the leaf level of the index. I'm really dubious that this is an intelligent way to go. In the first place, how will you keep the index sorted if you can't determine the values of all the keys? It certainly seems that this would break the ability to have a simple indexscan return sorted data, even if the index itself doesn't get corrupted. In the second place, this seems to forever kill the idea of indexscans that don't visit the heap --- not that we have any near-term prospect of doing that, but I know a lot of people remain interested in the idea. The reason this catches me by surprise is that you've said several times that you intended GIT to be something that could just be enabled universally. If it's lossy then there's a much larger argument that not everyone would want it. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| Tom Lane wrote: > I'm really dubious that this is an intelligent way to go. In the first > place, how will you keep the index sorted if you can't determine the > values of all the keys? It certainly seems that this would break the > ability to have a simple indexscan return sorted data, even if the index > itself doesn't get corrupted. That's indeed a very fundamental thing with the current design. The index doesn't retain the complete order within heap pages. That information is lost, again in favor of a smaller index size. It incurs a significant CPU overhead, but on an I/O bound system that's a tradeoff you want to make. At the moment, I'm storing the offsets within the heap in a bitmap attached to the index tuple. btgettuple fetches all the heap tuples represented by the grouped index tuple, checks their visibility, sorts them into index order, and returns them to the caller one at a time. Thats ugly, API-wise, because it makes the indexam to actually go look at the heap, which it shouldn't have to deal with. Another approach I've been thinking of is to store a list of offsets, in index order. That would avoid the problem of returning sorted data, and reduce the CPU overhead incurred by sorting and scanning, at the cost of much larger (but still much smaller than what we have now) index. > In the second place, this seems to > forever kill the idea of indexscans that don't visit the heap --- not > that we have any near-term prospect of doing that, but I know a lot of > people remain interested in the idea. I'm certainly interested in that. It's not really needed for clustered indexes, though. A well-clustered index is roughly one level shallower, and the heap effectively is the leaf-level, therefore the amount of I/O you need to fetch the index tuple + heap tuple, is roughly the same that as fetching just the index tuple from a normal b-tree index. On non-clustered indexes, index-only scans would of course still be useful. > The reason this catches me by surprise is that you've said several times > that you intended GIT to be something that could just be enabled > universally. If it's lossy then there's a much larger argument that not > everyone would want it. Yeah, we can't just always enable it by default. While a clustered index would degrade to a normal b-tree when the heap isn't clustered, you would still not want to always enable the index clustering because of the extra CPU overhead. That has become clear in the CPU bound tests I've run. I think we could still come up with some safe condiitions when we could enable it by default, though. In particular, I've been thinking that if you run CLUSTER on a table, you'd definitely want to use a clustered index as well. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| Heikki Linnakangas <heikki@enterprisedb.com> writes: > Tom Lane wrote: >> In the second place, this seems to >> forever kill the idea of indexscans that don't visit the heap --- not >> that we have any near-term prospect of doing that, but I know a lot of >> people remain interested in the idea. > I'm certainly interested in that. It's not really needed for clustered > indexes, though. A well-clustered index is roughly one level shallower, > and the heap effectively is the leaf-level, therefore the amount of I/O > you need to fetch the index tuple + heap tuple, is roughly the same that > as fetching just the index tuple from a normal b-tree index. That argument ignores the fact that the heap entries are likely to be much wider than the index entries, due to having other columns in them. > I think we could still come up with some safe condiitions when we could > enable it by default, though. At this point I'm feeling unconvinced that we want it at all. It's sounding like a large increase in complexity (both implementation-wise and in terms of API ugliness) for a fairly narrow use-case --- just how much territory is going to be left for this between HOT and bitmap indexes? I particularly dislike the idea of having the index AM reaching directly into the heap --- we should be trying to get rid of that, not add more cases. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Tom Lane wrote: > Heikki Linnakangas <heikki@enterprisedb.com> writes: >> Tom Lane wrote: >>> In the second place, this seems to >>> forever kill the idea of indexscans that don't visit the heap --- not >>> that we have any near-term prospect of doing that, but I know a lot of >>> people remain interested in the idea. > >> I'm certainly interested in that. It's not really needed for clustered >> indexes, though. A well-clustered index is roughly one level shallower, >> and the heap effectively is the leaf-level, therefore the amount of I/O >> you need to fetch the index tuple + heap tuple, is roughly the same that >> as fetching just the index tuple from a normal b-tree index. > > That argument ignores the fact that the heap entries are likely to be > much wider than the index entries, due to having other columns in them. True, that's the "roughly" part. It does indeed depend on your schema. As a data point, here's the index sizes (in pages) of a 140 warehouse TPC-C database: index name normal grouped % of normal size -------------------------------------- i_customer 31984 29250 91.5% i_orders 11519 11386 98.8% pk_customer 11519 1346 11.6% pk_district 6 2 pk_item 276 10 3.6% pk_new_order 3458 42 1.2% pk_order_line 153632 2993 1.9% pk_orders 11519 191 1.7% pk_stock 38389 2815 7.3% pk_warehouse 8 2 The customer table is an example of pretty wide table, there's only ~12 tuples per page. pk_customer is still benefiting a lot. i_customer and i_orders are not benefiting because the tables are not in the index order. The orders-related indexes are seeing the most benefit, they don't have many columns. >> I think we could still come up with some safe condiitions when we could >> enable it by default, though. > > At this point I'm feeling unconvinced that we want it at all. It's > sounding like a large increase in complexity (both implementation-wise > and in terms of API ugliness) for a fairly narrow use-case --- just how > much territory is going to be left for this between HOT and bitmap indexes? I don't see how HOT is overlapping with clustered indexes. On the contrary, it makes clustered indexes work better, because it reduces the amount of index inserts needed and helps to keep a table clustered. The use cases for bitmap indexes and clustered indexes do overlap somewhat. But clustered indexes have an edge because: - there's no requirement of having only a small number of distinct values - they support uniqueness checks - you can efficiently have a mixture of grouped and non-grouped tuples, if your table is only partly clustered In general, clustered indexes are more suited for OLTP work than bitmap indexes. > I particularly dislike the idea of having the index AM reaching directly > into the heap --- we should be trying to get rid of that, not add more > cases. I agree. The right way would be to add support for partial ordering and candidate matches to the indexam API, and move all the sorting etc. ugliness out of the indexam. That's been on my TODO since the beginning. If you're still not convinced that we want this at all, how would you feel about the another approach I described? The one where the in-heap-page order is stored in the index tuples, so there's no need for sorting, at the cost of losing part of the I/O benefit. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| Heikki Linnakangas wrote: > Tom Lane wrote: >> Heikki Linnakangas <heikki@enterprisedb.com> writes: >>> Tom Lane wrote: >>>> In the second place, this seems to >>>> forever kill the idea of indexscans that don't visit the heap --- not >>>> that we have any near-term prospect of doing that, but I know a lot of >>>> people remain interested in the idea. >> >>> I'm certainly interested in that. It's not really needed for >>> clustered indexes, though. A well-clustered index is roughly one >>> level shallower, and the heap effectively is the leaf-level, >>> therefore the amount of I/O you need to fetch the index tuple + heap >>> tuple, is roughly the same that as fetching just the index tuple from >>> a normal b-tree index. >> >> That argument ignores the fact that the heap entries are likely to be >> much wider than the index entries, due to having other columns in them. > > True, that's the "roughly" part. It does indeed depend on your schema. > As a data point, here's the index sizes (in pages) of a 140 warehouse > TPC-C database: Ah, I see now that you didn't (necessarily) mean that the clustering becomes inefficient at reducing the index size on wider tables, but that there's much more heap pages than leaf pages in a normal index. That's true, you might not want to use clustered index in that case, to allow index-only scans. If we had that feature, that is. Often, though, when using index-only scans, columns are added to the index to allow them to be returned in an index-only scans. That narrows the gap a bit. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| ||||
| On Mon, 2007-03-12 at 13:56 -0400, Tom Lane wrote: > At this point I'm feeling unconvinced that we want it at all. It's > sounding like a large increase in complexity (both implementation-wise > and in terms of API ugliness) for a fairly narrow use-case --- just > how much territory is going to be left for this between HOT and bitmap > indexes? HOT and clustered indexes have considerable synergy. In many tests we've got +20% performance with them acting together. Neither one achieves this performance on their own, but together they work very well. There is an overlap between clustered and bitmap indexes, but they come at the problem from different ends of the scale. Bitmap indexes are designed to cope well with highly non-unique data, while clustered indexes optimise for unique or somewhat unique keys. The difference is really bitmap for DW and clustered indexes for OLTP. The ideas for bitmap indexes come from research and other RDBMS implementations. Clustered indexes have also got external analogs - the concepts are very similar to SQLServer Clustered Indexes and Teradata Primary Indexes (Block Index structure), as well as being reasonably close to Oracle's Index Organised Tables. Clustered indexes offer a way to reduce index size to 1-5% of normal b-tree sizes, yet still maintaining uniqueness checking capability. For VLDB, that is a win for either OLTP or DW - think about a 1 TB index coming down to 10-50 GB in size. The benefit is significant for most tables over a ~1 GB in size through I/O reduction on leaf pages. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |