Unix Technical Forum

Re: On-disk bitmap index patch

This is a discussion on Re: On-disk bitmap index patch within the pgsql Hackers forums, part of the PostgreSQL category; --> I think we do know, have you reviewed the results in the briefing? - Luke Sent from my GoodLink ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-12-2008, 04:39 AM
Luke Lonergan
 
Posts: n/a
Default Re: On-disk bitmap index patch

I think we do know, have you reviewed the results in the briefing?


- Luke

Sent from my GoodLink synchronized handheld (www.good.com)


-----Original Message-----
From: mark@mark.mielke.cc [mailto:mark@mark.mielke.cc]
Sent: Tuesday, July 25, 2006 01:09 AM Eastern Standard Time
To: Tom Lane
Cc: Bruce Momjian; Jie Zhang; Hannu Krosing; Gavin Sherry; pgsql-hackers@postgresql.org; Luke Lonergan
Subject: Re: [HACKERS] On-disk bitmap index patch

On Tue, Jul 25, 2006 at 12:36:42AM -0400, Tom Lane wrote:
> mark@mark.mielke.cc writes:
> > Reading 1/4, for a larger table, has a good chance of being faster than
> > reading 4/4 of the table. :-)

> Really?
>
> If you have to hit one tuple out of four, it's pretty much guaranteed
> that you will need to fetch every heap page. So using an index provides
> zero I/O savings on the heap side, and any fetches needed to read the
> index are pure cost. Now you have to demonstrate that the CPU costs
> involved in processing the index are significantly cheaper than the cost
> of just testing the WHERE qual at every heap tuple --- not a bet that's
> likely to win at a one-in-four ratio.


Haha. Of course - but that's assuming uniform spread of the values.
Next I would try clustering the table on the bitmap index... :-)

My databases aren't as large as many of yours. Most or all of them
will fit in 1 Gbytes of RAM. The I/O cost isn't substantial for these,
but the WHERE clause might be.

But yeah - we don't know. Waste of code or performance boost.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
.. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/




---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-12-2008, 04:41 AM
Tom Lane
 
Posts: n/a
Default Re: On-disk bitmap index patch

"Luke Lonergan" <LLonergan@greenplum.com> writes:
> I think we do know, have you reviewed the results in the briefing?


I find those results moderately unconvincing, primarily because they
are based on choosing the least efficient possible data representation
(viz char(n)), and thus the btree indexes suffer severe and quite
artificial bloat. A database schema chosen with even minimal attention
to PG-specific tuning would probably have btree indexes half the size.
That wouldn't completely eliminate the performance differential shown,
but it would bring it down into the ballpark where you have to question
whether it makes sense to support another index AM.

The reason I have such high sales resistance is that we've carried the
hash and rtree AMs for years, hoping that someone would do the work to
make them actually worth using, with little result. I don't want any
more second-class-citizen index AMs, and that's why I'm questioning
what the scope of applicability of bitmap indexes really is. They need
to be popular enough that people will be motivated to work on them,
or they shouldn't be in core.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-12-2008, 04:41 AM
Hannu Krosing
 
Posts: n/a
Default Re: On-disk bitmap index patch

Ühel kenal päeval, T, 2006-07-25 kell 13:06, kirjutas Tom Lane:
> "Luke Lonergan" <LLonergan@greenplum.com> writes:
> > I think we do know, have you reviewed the results in the briefing?

>
> I find those results moderately unconvincing, primarily because they
> are based on choosing the least efficient possible data representation
> (viz char(n)), and thus the btree indexes suffer severe and quite
> artificial bloat.


hmm, maybe this should be fixed in btree then ?

do we really need to store padding blanks in btree ?

> A database schema chosen with even minimal attention
> to PG-specific tuning would probably have btree indexes half the size.
> That wouldn't completely eliminate the performance differential shown,
> but it would bring it down into the ballpark where you have to question
> whether it makes sense to support another index AM.


It still depends on your data volumes. if you spend lots and lots of $
on hardware just to store surplus index bloat, it may be worth it.

Remember, that the bizgres folks develop these things for real-world
datawarehousing, where saving a few (tens or hundreds of) terabytes of
storage and corresponging amount of RAM is a real win.

> The reason I have such high sales resistance is that we've carried the
> hash and rtree AMs for years, hoping that someone would do the work to
> make them actually worth using, with little result.


What would be the use-case for hash indexes ? And what should be done to
make them faster than btree ? I know that they are not wal-logged, but
this is beside the point for DWH apps.

and was'nt the rtree index recently deprecated in favour of GIST
implementation of the same ?

> I don't want any
> more second-class-citizen index AMs, and that's why I'm questioning
> what the scope of applicability of bitmap indexes really is. They need
> to be popular enough that people will be motivated to work on them,
> or they shouldn't be in core.


Is there an easy way to put them into contrib/ for some test period so
that they can become popular among mainstream postgresql users ?

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-12-2008, 04:41 AM
Tom Lane
 
Posts: n/a
Default Re: On-disk bitmap index patch

Hannu Krosing <hannu@skype.net> writes:
> Ühel kenal päeval, T, 2006-07-25 kell 13:06, kirjutas Tom Lane:
>> The reason I have such high sales resistance is that we've carried the
>> hash and rtree AMs for years, hoping that someone would do the work to
>> make them actually worth using, with little result.


> What would be the use-case for hash indexes ? And what should be done to
> make them faster than btree ?


If we knew, we'd do it ;-) But no one's put enough effort into it
to find out.

> and was'nt the rtree index recently deprecated in favour of GIST
> implementation of the same ?


Yeah, we finally gave up on rtree entirely. I don't want to see any
other index AMs languishing in the closet like that. If bitmap can
hold its own to the extent that people are interested in working on
it/improving it, then great, but I'm worried that it's not going to
have a wide enough use-case to attract maintainers.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-12-2008, 04:41 AM
Luke Lonergan
 
Posts: n/a
Default Re: On-disk bitmap index patch

Tom,

On 7/25/06 1:31 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Yeah, we finally gave up on rtree entirely. I don't want to see any
> other index AMs languishing in the closet like that. If bitmap can
> hold its own to the extent that people are interested in working on
> it/improving it, then great, but I'm worried that it's not going to
> have a wide enough use-case to attract maintainers.


How do we close the gap?

I think Jie is interested in maintaining it, and we're looking to extend the
range of applications for both the AM and extensions that use the raw bitmap
comparators made available to the executor. This should be just the start
of some really great work on speedy access using bitmaps.

Even as it sits, the on-disk bitmap is over 100x faster in cases where it's
suited and the other commercial DBMS have had this popular feature for
years.

- Luke



---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-12-2008, 04:42 AM
Ayush Parashar
 
Posts: n/a
Default Re: On-disk bitmap index patch

> I find those results moderately unconvincing, primarily because they
> are based on choosing the least efficient possible data representation
> (viz char(n)), and thus the btree indexes suffer severe and quite
> artificial bloat. A database schema chosen with even minimal attention

The specific data was chosen in presentation, because it comes from TPC-H
definition (data that everybody understands) and they were the only columns
where bitmap index will be more beneficial (i.e. the columns were low
cardinality ones).

> to PG-specific tuning would probably have btree indexes half the size.
> That wouldn't completely eliminate the performance differential shown,
> but it would bring it down into the ballpark where you have to question
> whether it makes sense to support another index AM.

Comparison of index creation time and index size for similar cardinalities
for *integer* and *numeric* data-types for b-tree and bitmap index. Benefits
are seen in all the cases, index creation time, space taken by index as well
as querying:

Total rows in the table: 2 million
Size of table in kbytes: 431976
Table definition and column cardinalities:
Column | Type | Cardinality
--------+-------------------+-----------
c1 | integer |
i10 | integer | 10
i50 | integer | 50
n10 | numeric | 10
n50 | numeric | 50
ctext | character varying |


Note: Time in seconds and size in kbytes (as seen from select
pg_relation_size()):

-------------------------------------------------------------------------
Column B-tree size Bitmap size B-tree time Bitmap time
-------------------------------------------------------------------------
i10 35056 2392 11.8 6.0
i50 35056 4328 10.8 6.4
n10 52264 2656 34.8 9.6
n50 52616 4272 34.3 11.7
-------------------------------------------------------------------------

Query timings (in seconds):
-------------------------------------------------------------------------
Query B-tree Bitmap
-------------------------------------------------------------------------
select count(*) from btree_test where 4.08 1.61
i10=5 and i50=2 and n10=0.1 and n50=0.05;
-------------------------------------------------------------------------

This test case fits in memory, the benefits seen will be more if we take
bigger test case.

I will like to re-iterate the benefits of bitmap index:
1. Size and time to create bitmap index is less than b-tree index for low
cardinality column of any data-type.
2. The gain in query performance can be attributed to ŒBitmap And¹ and
ŒBitmap Or¹ operations being more efficient for bitmap indexes as compared
to b-trees. Note that both bitmap and b-tree indexes use the bitmap index
scan access method, however the behavior is different for each. With a
b-tree index, the b-tree indexes are scanned to create a temporary in-memory
bitmap index. With the Bizgres on-disk bitmap index, the bitmap scan
retrieves several on-disk bitmap pages at once and provides them to the
ŒBitmap And¹ and ŒBitmap Or¹ operators. The smaller size of the bitmap
indexes combined with the efficiency of the And and Or operators are the
reasons for the increase in query speed.

Ayush



---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-12-2008, 04:43 AM
Jim Nasby
 
Posts: n/a
Default Hash indexes (was: On-disk bitmap index patch)

On Jul 25, 2006, at 3:31 PM, Tom Lane wrote:
> Hannu Krosing <hannu@skype.net> writes:
>> Ühel kenal päeval, T, 2006-07-25 kell 13:06, kirjutas Tom Lane:
>>> The reason I have such high sales resistance is that we've
>>> carried the
>>> hash and rtree AMs for years, hoping that someone would do the
>>> work to
>>> make them actually worth using, with little result.

>
>> What would be the use-case for hash indexes ? And what should be
>> done to
>> make them faster than btree ?

>
> If we knew, we'd do it ;-) But no one's put enough effort into it
> to find out.


Do they use the same hash algorithm as hash joins/aggregation? If so,
wouldn't hash indexes be faster for those operations than regular
indexes?
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461



---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-12-2008, 04:43 AM
Alvaro Herrera
 
Posts: n/a
Default Re: Hash indexes (was: On-disk bitmap index patch)

Jim Nasby wrote:
> On Jul 25, 2006, at 3:31 PM, Tom Lane wrote:
> >Hannu Krosing <hannu@skype.net> writes:


> >>What would be the use-case for hash indexes ? And what should be
> >>done to make them faster than btree ?

> >
> >If we knew, we'd do it ;-) But no one's put enough effort into it
> >to find out.

>
> Do they use the same hash algorithm as hash joins/aggregation? If so,
> wouldn't hash indexes be faster for those operations than regular
> indexes?


The main problem doesn't seem to be in the hash algorithm (which I
understand to mean the hashing function), but in the protocol for
concurrent access of index pages, and the distribution of keys in pages
of a single hash key.

This is described in a README file or a code comment somewhere in the
hash AM code. Someone needs to do some profiling to find out what the
bottleneck really is, and ideally find a way to fix it.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-12-2008, 04:45 AM
Jim C. Nasby
 
Posts: n/a
Default Re: Hash indexes (was: On-disk bitmap index patch)

On Thu, Jul 27, 2006 at 01:46:01PM -0400, Alvaro Herrera wrote:
> Jim Nasby wrote:
> > On Jul 25, 2006, at 3:31 PM, Tom Lane wrote:
> > >Hannu Krosing <hannu@skype.net> writes:

>
> > >>What would be the use-case for hash indexes ? And what should be
> > >>done to make them faster than btree ?
> > >
> > >If we knew, we'd do it ;-) But no one's put enough effort into it
> > >to find out.

> >
> > Do they use the same hash algorithm as hash joins/aggregation? If so,
> > wouldn't hash indexes be faster for those operations than regular
> > indexes?

>
> The main problem doesn't seem to be in the hash algorithm (which I
> understand to mean the hashing function), but in the protocol for
> concurrent access of index pages, and the distribution of keys in pages
> of a single hash key.
>
> This is described in a README file or a code comment somewhere in the
> hash AM code. Someone needs to do some profiling to find out what the
> bottleneck really is, and ideally find a way to fix it.


What I'm getting at is that I've never seen any explanation for the
theoretical use cases where a hash index would outperform a btree. If we
knew what kind of problems hash indexes were supposed to solve, we could
try and interest people who are solving those kinds of problems in
fixing hash indexes.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-12-2008, 04:45 AM
Alvaro Herrera
 
Posts: n/a
Default Re: Hash indexes (was: On-disk bitmap index patch)

Jim C. Nasby wrote:

> What I'm getting at is that I've never seen any explanation for the
> theoretical use cases where a hash index would outperform a btree. If we
> knew what kind of problems hash indexes were supposed to solve, we could
> try and interest people who are solving those kinds of problems in
> fixing hash indexes.


The btree index needs to descend potentially many pages before getting
to the leaf page, where the actual index is stored. The hash index can
get at the "leaf" node in --supposedly-- one fetch. Btree is O(logN) to
get a single key, while hash is O(1). Our problem lies in the
constants; for btree they are smaller than for hash, so in practice
that O(logN) is always smaller than O(1).

I've heard other database systems manage to have hash indexes that are
actually faster than btree, so either (1) our btree absolutely rocks, or
(2) their hash implementations are better (probably both).

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 01:56 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com