Unix Technical Forum

Re: Hash index todo list item

This is a discussion on Re: Hash index todo list item within the pgsql Hackers forums, part of the PostgreSQL category; --> Tom, That is great. I am looking forward to your patch. After the issues that you needed to address, ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-15-2008, 11:28 PM
Kenneth Marshall
 
Posts: n/a
Default Re: Hash index todo list item

Tom,

That is great. I am looking forward to your patch. After the
issues that you needed to address, I think that it would be
reasonable to add a few more user settings for the hash index.
Fill-factor is too course a knob. The others that I have been
considering are:

maxtuples - Not really the maximum, but a target value to use
for setting up the initial buckets. This would allow you to
set it for data loads and avoid the "split-n-copy" trauma
that you are trying to avoid with your new hash build process.

multiplicity - Try to capture use cases that would require many
overflow pages. In particular, if we discard the normal index
page layout we can skip the space overhead of the page pointer
and generate a more compact index. Then you could use a few
more hash bits to lookup the index entry in the page. How many
bits would be determined by this factor. 8-bits would give
you 256 sub-pieces that could each hold about 3 entries using
the current 4-byte hash, or 2 entries using an 8-byte hash.

What do you think?

Cheers,
Ken

On Wed, Oct 17, 2007 at 03:31:58PM -0700, Tom Raney wrote:
> Kenneth,
>
> Great!
>
> Yes, we did update the code to use the estimate. I will post the patch
> with this update. We only saw a very small difference in index build time,
> but you may when you add many columns to the base relation.
> With 1 billion tuples, you should start to see the hash index outperform
> the btree for some equality probes, I would imagine. With a 90% fill
> factor, the btree would require 4 levels to index that many tuples. If the
> top two were in memory, there would be 3 IOs needed. I don't think PG
> supports index only scans, so it will take the extra IO to probe the base
> relation. The hash may take up to 2 IOs and maybe even less (or maybe more
> depending on how many overflow buckets there are). It might be interesting
> to fiddle with the fill factors of each index - hash pages (buckets)
> default to 75% full.
> -Tom
>> Tom,
>>
>> I am getting ready to stitch in an updated, simplified version
>> of Neil Conway's hash-only hash index patch. Did you have a
>> chance to update your sizing function to use the planner-like
>> estimate and not a full table scan? I would like to be able
>> to use that when my test table start to have 10^9 entries.
>> If you have not had a chance, I will try and add it myself.
>>
>> Regards,
>> Ken
>>
>>

>
>


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-15-2008, 11:28 PM
Tom Raney
 
Posts: n/a
Default Re: Hash index todo list item

Kenneth, I just pushed the revised patch (v2!). The revised approach
samples the parent relation to estimate the number of tuples rather than
performing a complete scan. In my tests, the estimate appears to be
accurate, erring on the larger side, which is fine.

> Tom,
>
> That is great. I am looking forward to your patch. After the
> issues that you needed to address, I think that it would be
> reasonable to add a few more user settings for the hash index.
> Fill-factor is too course a knob. The others that I have been
> considering are:
>


> maxtuples - Not really the maximum, but a target value to use
> for setting up the initial buckets. This would allow you to
> set it for data loads and avoid the "split-n-copy" trauma
> that you are trying to avoid with your new hash build process.
>

If I understand you correctly, I believe we already do this with our
current build process, there should not be any splits of the index if we
estimated the tuple count correctly. However, what gets you is
collisions where lots of overflow pages occur when distinct keys map to
the same bucket, or if you have lots of duplicate keys. Because your
overall tuple count doesn't exceed the fill factor, no splits occur, but
lengthy bucket chains lead to lots of IOs. You touch on this below.
> multiplicity - Try to capture use cases that would require many
> overflow pages. In particular, if we discard the normal index
> page layout we can skip the space overhead of the page pointer
> and generate a more compact index. Then you could use a few
> more hash bits to lookup the index entry in the page. How many
> bits would be determined by this factor. 8-bits would give
> you 256 sub-pieces that could each hold about 3 entries using
> the current 4-byte hash, or 2 entries using an 8-byte hash.
>
> What do you think?
>

Yes, this is a good direction. If we can increase the number of buckets
and reduce the bucket size (either physically or virtually) to allow
more direct access without creating a huge index on disk, that would be
ideal. But, then if you do have collisions, overflows occur more
frequently. I spoke with Neil Conway yesterday at the PG conference
here in Portland and he piqued my interest in examining his hash code
more closely to see what he has already done in this area.

-Tom
> Cheers,
> Ken
>
> On Wed, Oct 17, 2007 at 03:31:58PM -0700, Tom Raney wrote:
>
>> Kenneth,
>>
>> Great!
>>
>> Yes, we did update the code to use the estimate. I will post the patch
>> with this update. We only saw a very small difference in index build time,
>> but you may when you add many columns to the base relation.
>> With 1 billion tuples, you should start to see the hash index outperform
>> the btree for some equality probes, I would imagine. With a 90% fill
>> factor, the btree would require 4 levels to index that many tuples. If the
>> top two were in memory, there would be 3 IOs needed. I don't think PG
>> supports index only scans, so it will take the extra IO to probe the base
>> relation. The hash may take up to 2 IOs and maybe even less (or maybe more
>> depending on how many overflow buckets there are). It might be interesting
>> to fiddle with the fill factors of each index - hash pages (buckets)
>> default to 75% full.
>> -Tom
>>
>>> Tom,
>>>
>>> I am getting ready to stitch in an updated, simplified version
>>> of Neil Conway's hash-only hash index patch. Did you have a
>>> chance to update your sizing function to use the planner-like
>>> estimate and not a full table scan? I would like to be able
>>> to use that when my test table start to have 10^9 entries.
>>> If you have not had a chance, I will try and add it myself.
>>>
>>> Regards,
>>> Ken
>>>
>>>
>>>

>>

>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>
>



---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-15-2008, 11:28 PM
Kenneth Marshall
 
Posts: n/a
Default Re: Hash index todo list item

Tom,

Thank you for the update. I am currently working on updating the
patch Neil Conway sent in against 8.0-ish that stores only the hash
in the index and locates the entries within the page using a binary
search. Then I will fold in your recent update.

On Sun, Oct 21, 2007 at 01:13:48PM -0700, Tom Raney wrote:
> Kenneth, I just pushed the revised patch (v2!). The revised approach
> samples the parent relation to estimate the number of tuples rather than
> performing a complete scan. In my tests, the estimate appears to be
> accurate, erring on the larger side, which is fine.
>


Yes, larger is great and what we need to avoid all the index tuple
suffling between pages.

>> Tom,
>>
>> That is great. I am looking forward to your patch. After the
>> issues that you needed to address, I think that it would be
>> reasonable to add a few more user settings for the hash index.
>> Fill-factor is too course a knob. The others that I have been
>> considering are:
>>

>
>> maxtuples - Not really the maximum, but a target value to use
>> for setting up the initial buckets. This would allow you to
>> set it for data loads and avoid the "split-n-copy" trauma
>> that you are trying to avoid with your new hash build process.
>>

> If I understand you correctly, I believe we already do this with our
> current build process, there should not be any splits of the index if we
> estimated the tuple count correctly. However, what gets you is collisions
> where lots of overflow pages occur when distinct keys map to the same
> bucket, or if you have lots of duplicate keys. Because your overall tuple
> count doesn't exceed the fill factor, no splits occur, but lengthy bucket
> chains lead to lots of IOs. You touch on this below.


Yes, you do address this in your patches and it works well for an
existing heap. My idea was to minimize the shuffling problem when we
are doing a data load and do not have a heap to get a count from
because it has not been loaded yet.
>> multiplicity - Try to capture use cases that would require many
>> overflow pages. In particular, if we discard the normal index
>> page layout we can skip the space overhead of the page pointer
>> and generate a more compact index. Then you could use a few
>> more hash bits to lookup the index entry in the page. How many
>> bits would be determined by this factor. 8-bits would give
>> you 256 sub-pieces that could each hold about 3 entries using
>> the current 4-byte hash, or 2 entries using an 8-byte hash.
>>
>> What do you think?
>>

> Yes, this is a good direction. If we can increase the number of buckets
> and reduce the bucket size (either physically or virtually) to allow more
> direct access without creating a huge index on disk, that would be ideal.
> But, then if you do have collisions, overflows occur more frequently. I
> spoke with Neil Conway yesterday at the PG conference here in Portland and
> he piqued my interest in examining his hash code more closely to see what
> he has already done in this area.


Right, overflows would occur more frequently and any overflow would
allocate a full page. It may be possible to estimate the multiplicity
and minimize the use of overflow pages. If we know that on average that
there are no more than 10 items in a bucket, we can size the virtual
buckets on the first page to support 10 items and minimize the rollover
to an overflow page.

Other ideas, once we hit the overflow page, back-off to the current
fullpage use to maximize the fill-factor, possibly using Neil's
binary search to speed lookups within the overflow pages. Also, with
the smaller virtual buckets use the remaining space on the page as
the first overflow page to hopefully prevent needing to allocate a
full new overflow page. This would be most useful for the smaller
virtual bucket sizes and improve the overall packing efficiency of
the index. If we intend to take advantage of the O(1) behavior, it
will be most useful for large numbers of tuples, more than 10^9. A
32-bit hash is not good enough and we will need to use a 64-bit hash
to reduce collisions in the hash table and consequent need for overflow
pages. I already have the hash function updated to support 64-bit and
will look at getting that functional once I have the hash-only version
working with the 32-bit hashes. Also, without the need to store the
value in the index, we can boost the size of indexable fields to the
page size, and larger. I was tentatively targeting 32k as the implementation
max, because then hashing time becomes the issue. Of course, all of these
changes and options will need to be benchmarked and discussed.

Thank you again for posting the updated patch.

Regards,
Ken

>
> -Tom
>> Cheers,
>> Ken
>>
>> On Wed, Oct 17, 2007 at 03:31:58PM -0700, Tom Raney wrote:
>>
>>> Kenneth,
>>>
>>> Great!
>>>
>>> Yes, we did update the code to use the estimate. I will post the patch
>>> with this update. We only saw a very small difference in index build
>>> time, but you may when you add many columns to the base relation. With 1
>>> billion tuples, you should start to see the hash index outperform the
>>> btree for some equality probes, I would imagine. With a 90% fill factor,
>>> the btree would require 4 levels to index that many tuples. If the top
>>> two were in memory, there would be 3 IOs needed. I don't think PG
>>> supports index only scans, so it will take the extra IO to probe the base
>>> relation. The hash may take up to 2 IOs and maybe even less (or maybe
>>> more depending on how many overflow buckets there are). It might be
>>> interesting to fiddle with the fill factors of each index - hash pages
>>> (buckets) default to 75% full. -Tom
>>>
>>>> Tom,
>>>>
>>>> I am getting ready to stitch in an updated, simplified version
>>>> of Neil Conway's hash-only hash index patch. Did you have a
>>>> chance to update your sizing function to use the planner-like
>>>> estimate and not a full table scan? I would like to be able
>>>> to use that when my test table start to have 10^9 entries.
>>>> If you have not had a chance, I will try and add it myself.
>>>>
>>>> Regards,
>>>> Ken
>>>>
>>>>
>>>

>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 2: Don't 'kill -9' the postmaster
>>
>>

>
>


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 06:55 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com