Unix Technical Forum

Re: [PATCH]-hash index improving

This is a discussion on Re: [PATCH]-hash index improving within the pgsql Hackers forums, part of the PostgreSQL category; --> Xiao Meng escribió: > The patch store hash code only in the index tuple. > It based on Neil ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 07-18-2008, 10:50 AM
Alvaro Herrera
 
Posts: n/a
Default Re: [PATCH]-hash index improving

Xiao Meng escribió:
> The patch store hash code only in the index tuple.
> It based on Neil Conway's patch with an old version of PostgreSQL.
> It passes the regression test but I didn't test the performance yet.
> Anyone interested can make a performance test;-)
> You can undefine the macro HASHVALUE_ONLY in hash.h to get the
> original implementation.


I think having the HASHVALUE_ONLY define is not a good idea -- it just
makes the patch harder to read. I suggest just removing the old code
and putting the new code in place. (That's why we have revision
control.)


--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 07-18-2008, 10:50 AM
Jonah H. Harris
 
Posts: n/a
Default Re: [PATCH]-hash index improving

On Thu, Jul 17, 2008 at 12:42 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> I think having the HASHVALUE_ONLY define is not a good idea -- it just
> makes the patch harder to read. I suggest just removing the old code
> and putting the new code in place. (That's why we have revision
> control.)


Agreed. I'm also getting a crash on it with large data sets, so we'll
make sure to get those fixed in the next version of the patch.

-Jonah

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 07-18-2008, 10:50 AM
Kenneth Marshall
 
Posts: n/a
Default Re: [PATCH]-hash index improving

On Thu, Jul 17, 2008 at 12:42:39PM -0400, Alvaro Herrera wrote:
> Xiao Meng escribi?:
> > The patch store hash code only in the index tuple.
> > It based on Neil Conway's patch with an old version of PostgreSQL.
> > It passes the regression test but I didn't test the performance yet.
> > Anyone interested can make a performance test;-)
> > You can undefine the macro HASHVALUE_ONLY in hash.h to get the
> > original implementation.

>
> I think having the HASHVALUE_ONLY define is not a good idea -- it just
> makes the patch harder to read. I suggest just removing the old code
> and putting the new code in place. (That's why we have revision
> control.)
>

One thing it helps is building an old version and a new version
for comparative testing. Otherwise, you could end up with an apples-to-
oranges comparison. I certainly think that the final patch should not
have it, but it is useful now for testing and comparisons.

My two cents,
Ken

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 07-18-2008, 10:50 AM
Kenneth Marshall
 
Posts: n/a
Default Re: [PATCH]-hash index improving

On Thu, Jul 17, 2008 at 02:00:07PM -0400, Jonah H. Harris wrote:
> On Thu, Jul 17, 2008 at 1:54 PM, Kenneth Marshall <ktm@rice.edu> wrote:
> >> I think having the HASHVALUE_ONLY define is not a good idea -- it just
> >> makes the patch harder to read. I suggest just removing the old code
> >> and putting the new code in place. (That's why we have revision
> >> control.)
> >>

> > One thing it helps is building an old version and a new version
> > for comparative testing. Otherwise, you could end up with an apples-to-
> > oranges comparison. I certainly think that the final patch should not
> > have it, but it is useful now for testing and comparisons.

>
> Yes, that's why Xiao did it that way. However, we traditionally just
> submit a patch with only the changes and it's up to the person testing
> to have an identical build-tree without the patch for testing.
> Another reason for it is that even if you build without the define,
> the patch author may have mistakenly added something outside the ifdef
> which could impact testing.
>
> I agree with Alvaro that we should submit it as a standard change patch.


Okay, that makes sense.

Ken

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 07-18-2008, 10:50 AM
Jonah H. Harris
 
Posts: n/a
Default Re: [PATCH]-hash index improving

On Thu, Jul 17, 2008 at 1:54 PM, Kenneth Marshall <ktm@rice.edu> wrote:
>> I think having the HASHVALUE_ONLY define is not a good idea -- it just
>> makes the patch harder to read. I suggest just removing the old code
>> and putting the new code in place. (That's why we have revision
>> control.)
>>

> One thing it helps is building an old version and a new version
> for comparative testing. Otherwise, you could end up with an apples-to-
> oranges comparison. I certainly think that the final patch should not
> have it, but it is useful now for testing and comparisons.


Yes, that's why Xiao did it that way. However, we traditionally just
submit a patch with only the changes and it's up to the person testing
to have an identical build-tree without the patch for testing.
Another reason for it is that even if you build without the define,
the patch author may have mistakenly added something outside the ifdef
which could impact testing.

I agree with Alvaro that we should submit it as a standard change patch.

--
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com
Edison, NJ 08837 | http://www.enterprisedb.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 07-18-2008, 10:50 AM
Alvaro Herrera
 
Posts: n/a
Default Re: [PATCH]-hash index improving

Kenneth Marshall escribió:
> On Thu, Jul 17, 2008 at 12:42:39PM -0400, Alvaro Herrera wrote:


> > I think having the HASHVALUE_ONLY define is not a good idea -- it just
> > makes the patch harder to read. I suggest just removing the old code
> > and putting the new code in place. (That's why we have revision
> > control.)
> >

> One thing it helps is building an old version and a new version
> for comparative testing. Otherwise, you could end up with an apples-to-
> oranges comparison. I certainly think that the final patch should not
> have it, but it is useful now for testing and comparisons.


For this purpose I think it would be easier to have a separate tree with
the patch, and one without it.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 07-18-2008, 10:50 AM
David Fetter
 
Posts: n/a
Default Re: [PATCH]-hash index improving

On Thu, Jul 17, 2008 at 02:11:20PM -0400, Alvaro Herrera wrote:
> Kenneth Marshall escribió:
> > On Thu, Jul 17, 2008 at 12:42:39PM -0400, Alvaro Herrera wrote:

>
> > > I think having the HASHVALUE_ONLY define is not a good idea --
> > > it just makes the patch harder to read. I suggest just removing
> > > the old code and putting the new code in place. (That's why we
> > > have revision control.)
> > >

> > One thing it helps is building an old version and a new version
> > for comparative testing. Otherwise, you could end up with an
> > apples-to- oranges comparison. I certainly think that the final
> > patch should not have it, but it is useful now for testing and
> > comparisons.

>
> For this purpose I think it would be easier to have a separate tree
> with the patch, and one without it.


Here's one tree. Anyone can get an initial copy as follows:

git clone http://git.postgresql.org/git/~davidfetter/hash/.git

Xiao Meng, if you let me know where your git repo is, say by cloning
onto a machine I can see from the internet and applying your patches
to it, I can pull and let others see it

Yes, I know it's a little cumbersome, but we'll get something slicker
as we figure out what "slicker" really should mean.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 07-18-2008, 10:51 AM
Tom Lane
 
Posts: n/a
Default Re: [PATCH]-hash index improving

Alvaro Herrera <alvherre@commandprompt.com> writes:
> Xiao Meng escribió:
>> You can undefine the macro HASHVALUE_ONLY in hash.h to get the
>> original implementation.


> I think having the HASHVALUE_ONLY define is not a good idea -- it just
> makes the patch harder to read.


While we are griping about readability: failure to update the comments
to match the code is NOT, NOT, NOT acceptable. I had barely started
to read the patch before encountering this insult to the reader:

/* Hash indexes are never lossy (at the moment anyway) */
- scan->xs_recheck = false;
+#ifdef HASHVALUE_ONLY
+ scan->xs_recheck = true;
+#else
+ scan->xs_recheck = false;
+#endif

The fact that the patch doesn't touch backend/access/hash/README is
already grounds for rejection, but can't you be bothered to fix a
comment that is literally one line away from where you are making
it wrong?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 07-18-2008, 10:51 AM
Jonah H. Harris
 
Posts: n/a
Default Re: [PATCH]-hash index improving

On Fri, Jul 18, 2008 at 1:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> While we are griping about readability: failure to update the comments
> to match the code is NOT, NOT, NOT acceptable. I had barely started
> to read the patch before encountering this insult to the reader:
> ...


As this is Xiao's first patch to the community, I'd appreciate it if
you guys would chill a little. I'm fully aware of what needs to be
done with it clean-up wise, but given that we're under some time
constraints, I asked that he submit his preliminary patch for those
who wanted to preview/test it.

Again, this patch is nowhere near acceptance quality, nor was it
intended to be. I'll make sure he gets the next one cleaned up prior
to submitting it.

The real question now has been listed above; why are hash index
queries, including this patch, no better than b-tree even for straight
equality comparisons? Because hash is lossy, we could easily be
performing multiple I/Os for recheck, and that may be causing some of
the performance problems. I haven't checked I/O for that yet, but was
wondering if you (Tom) knew of any major design/implementation
deficiency we should be taking into consideration regarding PG's hash
index implementation?

--
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com
Edison, NJ 08837 | http://www.enterprisedb.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 07-18-2008, 10:51 AM
Tom Lane
 
Posts: n/a
Default Re: [PATCH]-hash index improving

"Jonah H. Harris" <jonah.harris@gmail.com> writes:
> The real question now has been listed above; why are hash index
> queries, including this patch, no better than b-tree even for straight
> equality comparisons?


Well, the theoretical advantage is that a hash probe is O(1) while a
btree probe is O(log N) (ignoring a lot of possible nonoptimalities
on each side of course). So you'd need a test case large enough for
log N to be noticeably more than 1, which I think means in practice that
the upper levels of the btree don't fit in memory. So small test cases
aren't going to show any improvement.

I think the historical problem with our hash implementation has been
that it was terribly space-inefficient, because of the fixed (and large)
bucket size. A dense btree index can probably beat a sparse hash up to
exceedingly large values of N. It's not clear to me how much the patch
at hand does to fix that.

The patch might introduce a new problem, which is extra heap visits
caused by the lossiness of the hash value. Again, that doesn't hurt
much ideally, but maybe you chanced to use a test case where it does
hurt. It would be worth sticking in a bit of debug instrumentation to
see whether the number of failed heap visits is significant.

In any case, the reported test seems to be dealing with index sizes in
the tens-of-megabytes range, and it doesn't surprise me a whole lot that
an O(log N) penalty isn't showing up there. Try a test case where the
index size is significantly larger than your RAM.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 07:26 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com