This is a discussion on Re: On-disk bitmap index patch within the pgsql Hackers forums, part of the PostgreSQL category; --> I think we do know, have you reviewed the results in the briefing? - Luke Sent from my GoodLink ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I think we do know, have you reviewed the results in the briefing? - Luke Sent from my GoodLink synchronized handheld (www.good.com) -----Original Message----- From: mark@mark.mielke.cc [mailto:mark@mark.mielke.cc] Sent: Tuesday, July 25, 2006 01:09 AM Eastern Standard Time To: Tom Lane Cc: Bruce Momjian; Jie Zhang; Hannu Krosing; Gavin Sherry; pgsql-hackers@postgresql.org; Luke Lonergan Subject: Re: [HACKERS] On-disk bitmap index patch On Tue, Jul 25, 2006 at 12:36:42AM -0400, Tom Lane wrote: > mark@mark.mielke.cc writes: > > Reading 1/4, for a larger table, has a good chance of being faster than > > reading 4/4 of the table. :-) > Really? > > If you have to hit one tuple out of four, it's pretty much guaranteed > that you will need to fetch every heap page. So using an index provides > zero I/O savings on the heap side, and any fetches needed to read the > index are pure cost. Now you have to demonstrate that the CPU costs > involved in processing the index are significantly cheaper than the cost > of just testing the WHERE qual at every heap tuple --- not a bet that's > likely to win at a one-in-four ratio. Haha. Of course - but that's assuming uniform spread of the values. Next I would try clustering the table on the bitmap index... :-) My databases aren't as large as many of yours. Most or all of them will fit in 1 Gbytes of RAM. The I/O cost isn't substantial for these, but the WHERE clause might be. But yeah - we don't know. Waste of code or performance boost. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ .. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| "Luke Lonergan" <LLonergan@greenplum.com> writes: > I think we do know, have you reviewed the results in the briefing? I find those results moderately unconvincing, primarily because they are based on choosing the least efficient possible data representation (viz char(n)), and thus the btree indexes suffer severe and quite artificial bloat. A database schema chosen with even minimal attention to PG-specific tuning would probably have btree indexes half the size. That wouldn't completely eliminate the performance differential shown, but it would bring it down into the ballpark where you have to question whether it makes sense to support another index AM. The reason I have such high sales resistance is that we've carried the hash and rtree AMs for years, hoping that someone would do the work to make them actually worth using, with little result. I don't want any more second-class-citizen index AMs, and that's why I'm questioning what the scope of applicability of bitmap indexes really is. They need to be popular enough that people will be motivated to work on them, or they shouldn't be in core. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| Ühel kenal päeval, T, 2006-07-25 kell 13:06, kirjutas Tom Lane: > "Luke Lonergan" <LLonergan@greenplum.com> writes: > > I think we do know, have you reviewed the results in the briefing? > > I find those results moderately unconvincing, primarily because they > are based on choosing the least efficient possible data representation > (viz char(n)), and thus the btree indexes suffer severe and quite > artificial bloat. hmm, maybe this should be fixed in btree then ? do we really need to store padding blanks in btree ? > A database schema chosen with even minimal attention > to PG-specific tuning would probably have btree indexes half the size. > That wouldn't completely eliminate the performance differential shown, > but it would bring it down into the ballpark where you have to question > whether it makes sense to support another index AM. It still depends on your data volumes. if you spend lots and lots of $ on hardware just to store surplus index bloat, it may be worth it. Remember, that the bizgres folks develop these things for real-world datawarehousing, where saving a few (tens or hundreds of) terabytes of storage and corresponging amount of RAM is a real win. > The reason I have such high sales resistance is that we've carried the > hash and rtree AMs for years, hoping that someone would do the work to > make them actually worth using, with little result. What would be the use-case for hash indexes ? And what should be done to make them faster than btree ? I know that they are not wal-logged, but this is beside the point for DWH apps. and was'nt the rtree index recently deprecated in favour of GIST implementation of the same ? > I don't want any > more second-class-citizen index AMs, and that's why I'm questioning > what the scope of applicability of bitmap indexes really is. They need > to be popular enough that people will be motivated to work on them, > or they shouldn't be in core. Is there an easy way to put them into contrib/ for some test period so that they can become popular among mainstream postgresql users ? -- ---------------- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Hannu Krosing <hannu@skype.net> writes: > Ühel kenal päeval, T, 2006-07-25 kell 13:06, kirjutas Tom Lane: >> The reason I have such high sales resistance is that we've carried the >> hash and rtree AMs for years, hoping that someone would do the work to >> make them actually worth using, with little result. > What would be the use-case for hash indexes ? And what should be done to > make them faster than btree ? If we knew, we'd do it ;-) But no one's put enough effort into it to find out. > and was'nt the rtree index recently deprecated in favour of GIST > implementation of the same ? Yeah, we finally gave up on rtree entirely. I don't want to see any other index AMs languishing in the closet like that. If bitmap can hold its own to the extent that people are interested in working on it/improving it, then great, but I'm worried that it's not going to have a wide enough use-case to attract maintainers. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| Tom, On 7/25/06 1:31 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > Yeah, we finally gave up on rtree entirely. I don't want to see any > other index AMs languishing in the closet like that. If bitmap can > hold its own to the extent that people are interested in working on > it/improving it, then great, but I'm worried that it's not going to > have a wide enough use-case to attract maintainers. How do we close the gap? I think Jie is interested in maintaining it, and we're looking to extend the range of applications for both the AM and extensions that use the raw bitmap comparators made available to the executor. This should be just the start of some really great work on speedy access using bitmaps. Even as it sits, the on-disk bitmap is over 100x faster in cases where it's suited and the other commercial DBMS have had this popular feature for years. - Luke ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| > I find those results moderately unconvincing, primarily because they > are based on choosing the least efficient possible data representation > (viz char(n)), and thus the btree indexes suffer severe and quite > artificial bloat. A database schema chosen with even minimal attention The specific data was chosen in presentation, because it comes from TPC-H definition (data that everybody understands) and they were the only columns where bitmap index will be more beneficial (i.e. the columns were low cardinality ones). > to PG-specific tuning would probably have btree indexes half the size. > That wouldn't completely eliminate the performance differential shown, > but it would bring it down into the ballpark where you have to question > whether it makes sense to support another index AM. Comparison of index creation time and index size for similar cardinalities for *integer* and *numeric* data-types for b-tree and bitmap index. Benefits are seen in all the cases, index creation time, space taken by index as well as querying: Total rows in the table: 2 million Size of table in kbytes: 431976 Table definition and column cardinalities: Column | Type | Cardinality --------+-------------------+----------- c1 | integer | i10 | integer | 10 i50 | integer | 50 n10 | numeric | 10 n50 | numeric | 50 ctext | character varying | Note: Time in seconds and size in kbytes (as seen from select pg_relation_size()): ------------------------------------------------------------------------- Column B-tree size Bitmap size B-tree time Bitmap time ------------------------------------------------------------------------- i10 35056 2392 11.8 6.0 i50 35056 4328 10.8 6.4 n10 52264 2656 34.8 9.6 n50 52616 4272 34.3 11.7 ------------------------------------------------------------------------- Query timings (in seconds): ------------------------------------------------------------------------- Query B-tree Bitmap ------------------------------------------------------------------------- select count(*) from btree_test where 4.08 1.61 i10=5 and i50=2 and n10=0.1 and n50=0.05; ------------------------------------------------------------------------- This test case fits in memory, the benefits seen will be more if we take bigger test case. I will like to re-iterate the benefits of bitmap index: 1. Size and time to create bitmap index is less than b-tree index for low cardinality column of any data-type. 2. The gain in query performance can be attributed to ŒBitmap And¹ and ŒBitmap Or¹ operations being more efficient for bitmap indexes as compared to b-trees. Note that both bitmap and b-tree indexes use the bitmap index scan access method, however the behavior is different for each. With a b-tree index, the b-tree indexes are scanned to create a temporary in-memory bitmap index. With the Bizgres on-disk bitmap index, the bitmap scan retrieves several on-disk bitmap pages at once and provides them to the ŒBitmap And¹ and ŒBitmap Or¹ operators. The smaller size of the bitmap indexes combined with the efficiency of the And and Or operators are the reasons for the increase in query speed. Ayush ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| On Jul 25, 2006, at 3:31 PM, Tom Lane wrote: > Hannu Krosing <hannu@skype.net> writes: >> Ühel kenal päeval, T, 2006-07-25 kell 13:06, kirjutas Tom Lane: >>> The reason I have such high sales resistance is that we've >>> carried the >>> hash and rtree AMs for years, hoping that someone would do the >>> work to >>> make them actually worth using, with little result. > >> What would be the use-case for hash indexes ? And what should be >> done to >> make them faster than btree ? > > If we knew, we'd do it ;-) But no one's put enough effort into it > to find out. Do they use the same hash algorithm as hash joins/aggregation? If so, wouldn't hash indexes be faster for those operations than regular indexes? -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| Jim Nasby wrote: > On Jul 25, 2006, at 3:31 PM, Tom Lane wrote: > >Hannu Krosing <hannu@skype.net> writes: > >>What would be the use-case for hash indexes ? And what should be > >>done to make them faster than btree ? > > > >If we knew, we'd do it ;-) But no one's put enough effort into it > >to find out. > > Do they use the same hash algorithm as hash joins/aggregation? If so, > wouldn't hash indexes be faster for those operations than regular > indexes? The main problem doesn't seem to be in the hash algorithm (which I understand to mean the hashing function), but in the protocol for concurrent access of index pages, and the distribution of keys in pages of a single hash key. This is described in a README file or a code comment somewhere in the hash AM code. Someone needs to do some profiling to find out what the bottleneck really is, and ideally find a way to fix it. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| On Thu, Jul 27, 2006 at 01:46:01PM -0400, Alvaro Herrera wrote: > Jim Nasby wrote: > > On Jul 25, 2006, at 3:31 PM, Tom Lane wrote: > > >Hannu Krosing <hannu@skype.net> writes: > > > >>What would be the use-case for hash indexes ? And what should be > > >>done to make them faster than btree ? > > > > > >If we knew, we'd do it ;-) But no one's put enough effort into it > > >to find out. > > > > Do they use the same hash algorithm as hash joins/aggregation? If so, > > wouldn't hash indexes be faster for those operations than regular > > indexes? > > The main problem doesn't seem to be in the hash algorithm (which I > understand to mean the hashing function), but in the protocol for > concurrent access of index pages, and the distribution of keys in pages > of a single hash key. > > This is described in a README file or a code comment somewhere in the > hash AM code. Someone needs to do some profiling to find out what the > bottleneck really is, and ideally find a way to fix it. What I'm getting at is that I've never seen any explanation for the theoretical use cases where a hash index would outperform a btree. If we knew what kind of problems hash indexes were supposed to solve, we could try and interest people who are solving those kinds of problems in fixing hash indexes. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| ||||
| Jim C. Nasby wrote: > What I'm getting at is that I've never seen any explanation for the > theoretical use cases where a hash index would outperform a btree. If we > knew what kind of problems hash indexes were supposed to solve, we could > try and interest people who are solving those kinds of problems in > fixing hash indexes. The btree index needs to descend potentially many pages before getting to the leaf page, where the actual index is stored. The hash index can get at the "leaf" node in --supposedly-- one fetch. Btree is O(logN) to get a single key, while hash is O(1). Our problem lies in the constants; for btree they are smaller than for hash, so in practice that O(logN) is always smaller than O(1). I've heard other database systems manage to have hash indexes that are actually faster than btree, so either (1) our btree absolutely rocks, or (2) their hash implementations are better (probably both). -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| Thread Tools | |
| Display Modes | |
|
|