Unix Technical Forum

Re: [PATCHES] WIP: bitmap indexes

This is a discussion on Re: [PATCHES] WIP: bitmap indexes within the pgsql Hackers forums, part of the PostgreSQL category; --> Gavin Sherry <swm@linuxworld.com.au> writes: > Attached is an update to the patch implementing bitmap indexes Jie sent > last ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-12-2008, 05:05 AM
Tom Lane
 
Posts: n/a
Default Re: [PATCHES] WIP: bitmap indexes

Gavin Sherry <swm@linuxworld.com.au> writes:
> Attached is an update to the patch implementing bitmap indexes Jie sent
> last week.


What's the current status of this patch ... has any work been done since
the first of the month?

I suppose the patch as given here no longer applies to HEAD, because of
recent changes in rmgr for Simon's restartable-recovery patch (should be
easy enough to fix) and consumption of OIDs by other patches (also easy
to fix, in principle, but doubtless tedious).

If you have to renumber OIDs I'd suggest starting at 3000 for your added
OIDs, to leave some daylight for any other patches that go in during the
next couple weeks.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-12-2008, 05:05 AM
Gavin Sherry
 
Posts: n/a
Default Re: [PATCHES] WIP: bitmap indexes

On Mon, 14 Aug 2006, Tom Lane wrote:

> Gavin Sherry <swm@linuxworld.com.au> writes:
> > Attached is an update to the patch implementing bitmap indexes Jie sent
> > last week.

>
> What's the current status of this patch ... has any work been done since
> the first of the month?


Yes. I am tidying up the executor code at the moment so that the on
disk bitmap support is more or less hidden from the executor itself. and
Jie has been fixing multicolumn support and working on the planner. I've
also brought the code into line with PostgreSQL style, etc.

>
> I suppose the patch as given here no longer applies to HEAD, because of
> recent changes in rmgr for Simon's restartable-recovery patch (should be
> easy enough to fix) and consumption of OIDs by other patches (also easy
> to fix, in principle, but doubtless tedious).


Right.

> If you have to renumber OIDs I'd suggest starting at 3000 for your added
> OIDs, to leave some daylight for any other patches that go in during the
> next couple weeks.


Thanks.

I will post an updated patch in a few days time.

Gavin

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-12-2008, 05:05 AM
Tom Lane
 
Posts: n/a
Default Re: [PATCHES] WIP: bitmap indexes

Gavin Sherry <swm@linuxworld.com.au> writes:
> I will post an updated patch in a few days time.


OK. Do you want me to work on the discussed amgetmulti change, or would
that just be joggling your elbow at the moment?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-12-2008, 05:05 AM
Gavin Sherry
 
Posts: n/a
Default Re: [PATCHES] WIP: bitmap indexes

On Mon, 14 Aug 2006, Tom Lane wrote:

> Gavin Sherry <swm@linuxworld.com.au> writes:
> > I will post an updated patch in a few days time.

>
> OK. Do you want me to work on the discussed amgetmulti change, or would
> that just be joggling your elbow at the moment?


Yes, that would be joggling .

The approach I am taking is more or less the one you proposed a few weeks
ago. Namely, to pass the bitmap object as an argument to amgetmulti and
have the routine fill it in as it sees fit.

There is some trickiness, however.

One of the main reasons for the uglification of the executor in Jie's
original patch was that she wanted to avoid the inefficiency of
translating the on disk bitmap representation to the TID bitmap
representation. If the plan calls for a straight on disk
bitmap read or AND/ORing together a few on-disk bitmaps this is justified.
If we're AND/ORing together an ondisk bitmap read and a TID bitmap scan of
a btree, we're in trouble. In this case, the existing code implements the
current semantics of amgetmulti.

What I am doing is treating the bitmap object as opaque, storing the data
in the format the AM wants, and providing a 'translate to TID bitmap'
routine (trivial for btree).

Do you have any thoughts on this?

Thanks,

Gavin

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-12-2008, 05:05 AM
Tom Lane
 
Posts: n/a
Default Re: [PATCHES] WIP: bitmap indexes

Gavin Sherry <swm@linuxworld.com.au> writes:
> One of the main reasons for the uglification of the executor in Jie's
> original patch was that she wanted to avoid the inefficiency of
> translating the on disk bitmap representation to the TID bitmap
> representation.


Offhand that seems like micro-optimization, at least as stated -- you
want to think about number of page fetches rather than exactly how a
bitmap is represented. I've still not got round to reading the patch
in detail, but after thinking about it in the shower on a few mornings,
it seems to me that the key concept we'd like to implement is that a
bitmap index can provide its data in a page-by-page fashion without the
overhead of fabricating the bitmap in-memory as btree forces us to do.
That is, the current implementation involves doing the whole index scan
and building a bitmap in memory, which we then read out page-wise for
the heap scan. The weak spot of this approach is that the in-memory
bitmap can get unreasonably large. With a bitmap index, you can pass
data back in a page-by-page fashion: here are the TID indexes to hit on
this page, then here are the TID indexes to hit on the next page, etc.
Correct me if I'm wrong, but isn't the patch's present hacking on the
executor intended to make it happen like this?

The main architectural idea I have at the moment is that this should all
be private between tidbitmap.c and the bitmap index AM. I think we
could perhaps have a flavor of tidbitmap that is "serial access" as
opposed to the present random-access, ie, instead of keeping the whole
bitmap in memory, you have a function pointer that you can call to
demand the next bitmap page in sequence. tidbitmap.c will need to
manage both kinds of bitmaps and be prepared to do ANDs and ORs across
both types, but just at the moment I see no showstopper reason why this
can't work.

Or is that exactly what you said already? It's late here and I'm a
bit tired...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-12-2008, 05:05 AM
Gavin Sherry
 
Posts: n/a
Default Re: [PATCHES] WIP: bitmap indexes

On Mon, 14 Aug 2006, Tom Lane wrote:

> Gavin Sherry <swm@linuxworld.com.au> writes:
> > One of the main reasons for the uglification of the executor in Jie's
> > original patch was that she wanted to avoid the inefficiency of
> > translating the on disk bitmap representation to the TID bitmap
> > representation.

>
> Offhand that seems like micro-optimization, at least as stated -- you
> want to think about number of page fetches rather than exactly how a
> bitmap is represented. I've still not got round to reading the patch
> in detail, but after thinking about it in the shower on a few mornings,
> it seems to me that the key concept we'd like to implement is that a
> bitmap index can provide its data in a page-by-page fashion without the
> overhead of fabricating the bitmap in-memory as btree forces us to do.
> That is, the current implementation involves doing the whole index scan
> and building a bitmap in memory, which we then read out page-wise for
> the heap scan. The weak spot of this approach is that the in-memory
> bitmap can get unreasonably large. With a bitmap index, you can pass
> data back in a page-by-page fashion: here are the TID indexes to hit on
> this page, then here are the TID indexes to hit on the next page, etc.
> Correct me if I'm wrong, but isn't the patch's present hacking on the
> executor intended to make it happen like this?


Not really. It reads ahead on the bitmap index and passes back the bitmap
words. The other executor routines are hacked up to process the data in
this format.

If I understand your idea correctly, we could change this to read, say, a
page of the index at a time, store this internally in the state object we
pass around, and we can then read out of this the TIDs on a given heap
page which match the query. Once we process all the bitmap data, we just
pull more.

>
> The main architectural idea I have at the moment is that this should all
> be private between tidbitmap.c and the bitmap index AM. I think we
> could perhaps have a flavor of tidbitmap that is "serial access" as
> opposed to the present random-access, ie, instead of keeping the whole
> bitmap in memory, you have a function pointer that you can call to
> demand the next bitmap page in sequence. tidbitmap.c will need to
> manage both kinds of bitmaps and be prepared to do ANDs and ORs across
> both types, but just at the moment I see no showstopper reason why this
> can't work.
>
> Or is that exactly what you said already? It's late here and I'm a
> bit tired...


This is better than what I had in mind. It seems to me that part of this
which will be a litle ugly is dealing with "key in (1,2,3,...)"
transformation. Let me think about it...

Thanks,

Gavin

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-12-2008, 05:05 AM
Tom Lane
 
Posts: n/a
Default Re: [PATCHES] WIP: bitmap indexes

Gavin Sherry <swm@linuxworld.com.au> writes:
> On Mon, 14 Aug 2006, Tom Lane wrote:
>> Correct me if I'm wrong, but isn't the patch's present hacking on the
>> executor intended to make it happen like this?


> Not really. It reads ahead on the bitmap index and passes back the bitmap
> words. The other executor routines are hacked up to process the data in
> this format.


Well, as I said, I don't think there's justification for exposing a
bitmap index's internal data formats to the rest of the system like
that: it's not very future-proof and I don't see that it's buying any
significant performance gain. At some point you have to convert to TIDs
anyway, at least in the sense of knowing what page and line number you
are at, so passing the data as an array of TIDs really isn't going to
hurt much. So my advice is to rip out all those changes and go back to
the existing tidbitmap.c readout API. There's nothing wrong with
the TBMIterateResult data structure.

What I do find interesting to think about is whether, strictly within
tidbitmap.c, there could be an alternate kind of bitmap object that
doesn't have to materialize the whole bitmap for an indexscan in memory
because it knows it can fetch the data on-demand, ie, build the next
page TBMIterateResult data structure on-the-fly from the index when it's
requested. Call it a "stream bitmap" in contrast to the present
"materialized bitmaps". The trick here is to be able to AND and OR a
stream bitmap with another stream bitmap or a materialized bitmap.
I don't see any reason in principle why that couldn't be done: the
output of the AND/OR would be a stream of TBMIterateResults just like
the inputs. That is, it's another stream bitmap, but with a different
generating function and some internal state that includes its source
bitmaps. You'd have to sort a materialized bitmap into order before
starting to AND/OR it with a stream bitmap, but that code is there
already.

You'd probably need to break the abstraction enough for nodeBitmapAnd
and nodeBitmapOr to know which kinds of bitmaps they are dealing with
and do things a little differently in different cases --- in particular,
the optimization nodeBitmapOr understands about how to "OR" things
together probably doesn't work for stream bitmaps. But I see no reason
to be changing nodeBitmapHeapscan.

> ... It seems to me that part of this
> which will be a litle ugly is dealing with "key in (1,2,3,...)"
> transformation. Let me think about it...


You're worrying about the wrong problem if you focus solely on
ScalarArrayOpExpr. The general problem you need to be solving is
"how do I AND and OR these bitmaps efficiently"? Once you've fixed
that, the ScalarArrayOpExpr code is just one user of that behavior.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-12-2008, 05:06 AM
Jie Zhang
 
Posts: n/a
Default Re: [PATCHES] WIP: bitmap indexes



On 8/15/06 6:18 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Gavin Sherry <swm@linuxworld.com.au> writes:
>> On Mon, 14 Aug 2006, Tom Lane wrote:
>>> Correct me if I'm wrong, but isn't the patch's present hacking on the
>>> executor intended to make it happen like this?

>
>> Not really. It reads ahead on the bitmap index and passes back the bitmap
>> words. The other executor routines are hacked up to process the data in
>> this format.

>
> Well, as I said, I don't think there's justification for exposing a
> bitmap index's internal data formats to the rest of the system like
> that: it's not very future-proof and I don't see that it's buying any
> significant performance gain. At some point you have to convert to TIDs
> anyway, at least in the sense of knowing what page and line number you
> are at, so passing the data as an array of TIDs really isn't going to
> hurt much. So my advice is to rip out all those changes and go back to
> the existing tidbitmap.c readout API. There's nothing wrong with
> the TBMIterateResult data structure.
>


The bitmap words in the bitmap index are very simple and can be very
generic. You can think about them as one bit per tuple along with some
padding bits between heap pages. The problem I have is that I do not know a
good way to construct an in-memory version of this for other index
structures, like b-tree. To be able to handle both cases nicely, you are
right -- TBMIterateResult is better. Or, PagetableEntry may be better since
it will make AND/OR easier.

> What I do find interesting to think about is whether, strictly within
> tidbitmap.c, there could be an alternate kind of bitmap object that
> doesn't have to materialize the whole bitmap for an indexscan in memory
> because it knows it can fetch the data on-demand, ie, build the next
> page TBMIterateResult data structure on-the-fly from the index when it's
> requested. Call it a "stream bitmap" in contrast to the present
> "materialized bitmaps". The trick here is to be able to AND and OR a
> stream bitmap with another stream bitmap or a materialized bitmap.
> I don't see any reason in principle why that couldn't be done: the
> output of the AND/OR would be a stream of TBMIterateResults just like
> the inputs. That is, it's another stream bitmap, but with a different
> generating function and some internal state that includes its source
> bitmaps. You'd have to sort a materialized bitmap into order before
> starting to AND/OR it with a stream bitmap, but that code is there
> already.


I like this idea. I think that we can define a new TBMStatus to be
TBM_STREAM in TIDBitmap. *getmulti functions will remain the same, except
that we add a new returning bool argument, stating if this is a stream
bitmap. If this is a stream bitmap, nodeBitmapIndexScan simply fills spages,
and passes it upstream. When nodeBitmapAnd or nodeBitmapOr ANDs/ORs several
bitmaps, the result bitmap is a stream bitmap if there is at least one
bitmap is a stream bitmap. Then we add another loop in nodeBitmapHeapscan to
be able to pull more data from its subnode.

Thanks,
Jie



---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-12-2008, 05:06 AM
Tom Lane
 
Posts: n/a
Default Re: [PATCHES] WIP: bitmap indexes

"Jie Zhang" <jzhang@greenplum.com> writes:
> On 8/15/06 6:18 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>> Well, as I said, I don't think there's justification for exposing a
>> bitmap index's internal data formats to the rest of the system like
>> that: it's not very future-proof and I don't see that it's buying any
>> significant performance gain.


> The bitmap words in the bitmap index are very simple and can be very
> generic.


They're not generic in the least: there's a compression scheme involved
that you might want to whack around at any time. So I disagree with the
idea that it's OK to expose the format outside the access/bitmap/ module.

> I like this idea. I think that we can define a new TBMStatus to be
> TBM_STREAM in TIDBitmap.


It occurs to me that what tbm_begin_iterate really is is a constructor
for a stream bitmap object that reads out the contents of a tbm bitmap
(we need a decent name for the non-stream data structure ... maybe
hash bitmap?). If we think of it like that then we can unify the
ideas some more.

My proposal at this point would be to invent two different Node types,
one for stream bitmaps and one for hash bitmaps. The initial input to
nodeBitmapHeapscan can be either, but if it's given a hash bitmap then
it stream-ifies it for use. amgetmulti can return either kind, and
nodeBitmapAnd and nodeBitmapOr can use IsA tests to decide what to do.
Preserving the existing optimization for ORing hash bitmaps is a bit
tricky but I think it's doable. Consider this API for amgetmulti:

amgetmulti takes an argument which can be either a hash bitmap or NULL.
It returns an object that must be either a hash or stream bitmap.
If it wants to return a stream bitmap, it simply disregards the argument
and returns a constructed stream-bitmap object. If it wants to return
a hash bitmap, then if the argument is not NULL, OR the additional bits
into the argument object and return it; if the argument is NULL,
construct a fresh hash-bitmap object, set bits in it, and return it.

Assume that we have the existing hash-bitmap AND/OR functions as well as
constructors for AND and OR stream bitmaps that take lists of input
stream objects. Then the algorithm for nodeBitmapOr looks like this:

HashBitmap *hashBitmap = NULL;
List *streamBitmaps = NIL;

foreach(input plan)
{
Node *newBitmap = amgetmulti(hashBitmap);

if (IsA(newBitmap, HashBitmap))
{
// any OR-ing required was done implicitly
hashBitmap = newBitmap;
}
else
{
Assert(IsA(newBitmap, StreamBitmap));
streamBitmaps = lappend(streamBitmaps, newBitmap);
}
}

if (streamBitmaps == NIL)
{
// all inputs returned hash, so we're done
return hashBitmap;
}
else
{
// need a stream OR operation atop the inputs
if (hashBitmap)
streamBitmaps = lappend(streamBitmaps,
HashToStreamBitmap(hashBitmap));
return ConstructStreamOr(streamBitmaps);
}

nodeBitmapAnd is a bit different but not any harder.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-12-2008, 05:27 AM
Bruce Momjian
 
Posts: n/a
Default Re: [PATCHES] WIP: bitmap indexes


This has been saved for the 8.3 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------
Jie Zhang wrote:
>
>
> On 8/15/06 6:18 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>
> > Gavin Sherry <swm@linuxworld.com.au> writes:
> >> On Mon, 14 Aug 2006, Tom Lane wrote:
> >>> Correct me if I'm wrong, but isn't the patch's present hacking on the
> >>> executor intended to make it happen like this?

> >
> >> Not really. It reads ahead on the bitmap index and passes back the bitmap
> >> words. The other executor routines are hacked up to process the data in
> >> this format.

> >
> > Well, as I said, I don't think there's justification for exposing a
> > bitmap index's internal data formats to the rest of the system like
> > that: it's not very future-proof and I don't see that it's buying any
> > significant performance gain. At some point you have to convert to TIDs
> > anyway, at least in the sense of knowing what page and line number you
> > are at, so passing the data as an array of TIDs really isn't going to
> > hurt much. So my advice is to rip out all those changes and go back to
> > the existing tidbitmap.c readout API. There's nothing wrong with
> > the TBMIterateResult data structure.
> >

>
> The bitmap words in the bitmap index are very simple and can be very
> generic. You can think about them as one bit per tuple along with some
> padding bits between heap pages. The problem I have is that I do not know a
> good way to construct an in-memory version of this for other index
> structures, like b-tree. To be able to handle both cases nicely, you are
> right -- TBMIterateResult is better. Or, PagetableEntry may be better since
> it will make AND/OR easier.
>
> > What I do find interesting to think about is whether, strictly within
> > tidbitmap.c, there could be an alternate kind of bitmap object that
> > doesn't have to materialize the whole bitmap for an indexscan in memory
> > because it knows it can fetch the data on-demand, ie, build the next
> > page TBMIterateResult data structure on-the-fly from the index when it's
> > requested. Call it a "stream bitmap" in contrast to the present
> > "materialized bitmaps". The trick here is to be able to AND and OR a
> > stream bitmap with another stream bitmap or a materialized bitmap.
> > I don't see any reason in principle why that couldn't be done: the
> > output of the AND/OR would be a stream of TBMIterateResults just like
> > the inputs. That is, it's another stream bitmap, but with a different
> > generating function and some internal state that includes its source
> > bitmaps. You'd have to sort a materialized bitmap into order before
> > starting to AND/OR it with a stream bitmap, but that code is there
> > already.

>
> I like this idea. I think that we can define a new TBMStatus to be
> TBM_STREAM in TIDBitmap. *getmulti functions will remain the same, except
> that we add a new returning bool argument, stating if this is a stream
> bitmap. If this is a stream bitmap, nodeBitmapIndexScan simply fills spages,
> and passes it upstream. When nodeBitmapAnd or nodeBitmapOr ANDs/ORs several
> bitmaps, the result bitmap is a stream bitmap if there is at least one
> bitmap is a stream bitmap. Then we add another loop in nodeBitmapHeapscan to
> be able to pull more data from its subnode.
>
> Thanks,
> Jie
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly


--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 07:59 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com