This is a discussion on Re: [PATCH]-hash index improving within the pgsql Hackers forums, part of the PostgreSQL category; --> Xiao Meng escribió: > The patch store hash code only in the index tuple. > It based on Neil ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Xiao Meng escribió: > The patch store hash code only in the index tuple. > It based on Neil Conway's patch with an old version of PostgreSQL. > It passes the regression test but I didn't test the performance yet. > Anyone interested can make a performance test;-) > You can undefine the macro HASHVALUE_ONLY in hash.h to get the > original implementation. I think having the HASHVALUE_ONLY define is not a good idea -- it just makes the patch harder to read. I suggest just removing the old code and putting the new code in place. (That's why we have revision control.) -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| On Thu, Jul 17, 2008 at 12:42 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > I think having the HASHVALUE_ONLY define is not a good idea -- it just > makes the patch harder to read. I suggest just removing the old code > and putting the new code in place. (That's why we have revision > control.) Agreed. I'm also getting a crash on it with large data sets, so we'll make sure to get those fixed in the next version of the patch. -Jonah -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| On Thu, Jul 17, 2008 at 12:42:39PM -0400, Alvaro Herrera wrote: > Xiao Meng escribi?: > > The patch store hash code only in the index tuple. > > It based on Neil Conway's patch with an old version of PostgreSQL. > > It passes the regression test but I didn't test the performance yet. > > Anyone interested can make a performance test;-) > > You can undefine the macro HASHVALUE_ONLY in hash.h to get the > > original implementation. > > I think having the HASHVALUE_ONLY define is not a good idea -- it just > makes the patch harder to read. I suggest just removing the old code > and putting the new code in place. (That's why we have revision > control.) > One thing it helps is building an old version and a new version for comparative testing. Otherwise, you could end up with an apples-to- oranges comparison. I certainly think that the final patch should not have it, but it is useful now for testing and comparisons. My two cents, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| On Thu, Jul 17, 2008 at 02:00:07PM -0400, Jonah H. Harris wrote: > On Thu, Jul 17, 2008 at 1:54 PM, Kenneth Marshall <ktm@rice.edu> wrote: > >> I think having the HASHVALUE_ONLY define is not a good idea -- it just > >> makes the patch harder to read. I suggest just removing the old code > >> and putting the new code in place. (That's why we have revision > >> control.) > >> > > One thing it helps is building an old version and a new version > > for comparative testing. Otherwise, you could end up with an apples-to- > > oranges comparison. I certainly think that the final patch should not > > have it, but it is useful now for testing and comparisons. > > Yes, that's why Xiao did it that way. However, we traditionally just > submit a patch with only the changes and it's up to the person testing > to have an identical build-tree without the patch for testing. > Another reason for it is that even if you build without the define, > the patch author may have mistakenly added something outside the ifdef > which could impact testing. > > I agree with Alvaro that we should submit it as a standard change patch. Okay, that makes sense. Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| On Thu, Jul 17, 2008 at 1:54 PM, Kenneth Marshall <ktm@rice.edu> wrote: >> I think having the HASHVALUE_ONLY define is not a good idea -- it just >> makes the patch harder to read. I suggest just removing the old code >> and putting the new code in place. (That's why we have revision >> control.) >> > One thing it helps is building an old version and a new version > for comparative testing. Otherwise, you could end up with an apples-to- > oranges comparison. I certainly think that the final patch should not > have it, but it is useful now for testing and comparisons. Yes, that's why Xiao did it that way. However, we traditionally just submit a patch with only the changes and it's up to the person testing to have an identical build-tree without the patch for testing. Another reason for it is that even if you build without the define, the patch author may have mistakenly added something outside the ifdef which could impact testing. I agree with Alvaro that we should submit it as a standard change patch. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com Edison, NJ 08837 | http://www.enterprisedb.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| Kenneth Marshall escribió: > On Thu, Jul 17, 2008 at 12:42:39PM -0400, Alvaro Herrera wrote: > > I think having the HASHVALUE_ONLY define is not a good idea -- it just > > makes the patch harder to read. I suggest just removing the old code > > and putting the new code in place. (That's why we have revision > > control.) > > > One thing it helps is building an old version and a new version > for comparative testing. Otherwise, you could end up with an apples-to- > oranges comparison. I certainly think that the final patch should not > have it, but it is useful now for testing and comparisons. For this purpose I think it would be easier to have a separate tree with the patch, and one without it. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| On Thu, Jul 17, 2008 at 02:11:20PM -0400, Alvaro Herrera wrote: > Kenneth Marshall escribió: > > On Thu, Jul 17, 2008 at 12:42:39PM -0400, Alvaro Herrera wrote: > > > > I think having the HASHVALUE_ONLY define is not a good idea -- > > > it just makes the patch harder to read. I suggest just removing > > > the old code and putting the new code in place. (That's why we > > > have revision control.) > > > > > One thing it helps is building an old version and a new version > > for comparative testing. Otherwise, you could end up with an > > apples-to- oranges comparison. I certainly think that the final > > patch should not have it, but it is useful now for testing and > > comparisons. > > For this purpose I think it would be easier to have a separate tree > with the patch, and one without it. Here's one tree. Anyone can get an initial copy as follows: git clone http://git.postgresql.org/git/~davidfetter/hash/.git Xiao Meng, if you let me know where your git repo is, say by cloning onto a machine I can see from the internet and applying your patches to it, I can pull and let others see it Yes, I know it's a little cumbersome, but we'll get something slicker as we figure out what "slicker" really should mean. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| Alvaro Herrera <alvherre@commandprompt.com> writes: > Xiao Meng escribió: >> You can undefine the macro HASHVALUE_ONLY in hash.h to get the >> original implementation. > I think having the HASHVALUE_ONLY define is not a good idea -- it just > makes the patch harder to read. While we are griping about readability: failure to update the comments to match the code is NOT, NOT, NOT acceptable. I had barely started to read the patch before encountering this insult to the reader: /* Hash indexes are never lossy (at the moment anyway) */ - scan->xs_recheck = false; +#ifdef HASHVALUE_ONLY + scan->xs_recheck = true; +#else + scan->xs_recheck = false; +#endif The fact that the patch doesn't touch backend/access/hash/README is already grounds for rejection, but can't you be bothered to fix a comment that is literally one line away from where you are making it wrong? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| On Fri, Jul 18, 2008 at 1:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > While we are griping about readability: failure to update the comments > to match the code is NOT, NOT, NOT acceptable. I had barely started > to read the patch before encountering this insult to the reader: > ... As this is Xiao's first patch to the community, I'd appreciate it if you guys would chill a little. I'm fully aware of what needs to be done with it clean-up wise, but given that we're under some time constraints, I asked that he submit his preliminary patch for those who wanted to preview/test it. Again, this patch is nowhere near acceptance quality, nor was it intended to be. I'll make sure he gets the next one cleaned up prior to submitting it. The real question now has been listed above; why are hash index queries, including this patch, no better than b-tree even for straight equality comparisons? Because hash is lossy, we could easily be performing multiple I/Os for recheck, and that may be causing some of the performance problems. I haven't checked I/O for that yet, but was wondering if you (Tom) knew of any major design/implementation deficiency we should be taking into consideration regarding PG's hash index implementation? -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com Edison, NJ 08837 | http://www.enterprisedb.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| ||||
| "Jonah H. Harris" <jonah.harris@gmail.com> writes: > The real question now has been listed above; why are hash index > queries, including this patch, no better than b-tree even for straight > equality comparisons? Well, the theoretical advantage is that a hash probe is O(1) while a btree probe is O(log N) (ignoring a lot of possible nonoptimalities on each side of course). So you'd need a test case large enough for log N to be noticeably more than 1, which I think means in practice that the upper levels of the btree don't fit in memory. So small test cases aren't going to show any improvement. I think the historical problem with our hash implementation has been that it was terribly space-inefficient, because of the fixed (and large) bucket size. A dense btree index can probably beat a sparse hash up to exceedingly large values of N. It's not clear to me how much the patch at hand does to fix that. The patch might introduce a new problem, which is extra heap visits caused by the lossiness of the hash value. Again, that doesn't hurt much ideally, but maybe you chanced to use a test case where it does hurt. It would be worth sticking in a bit of debug instrumentation to see whether the number of failed heap visits is significant. In any case, the reported test seems to be dealing with index sizes in the tens-of-megabytes range, and it doesn't surprise me a whole lot that an O(log N) penalty isn't showing up there. Try a test case where the index size is significantly larger than your RAM. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| Thread Tools | |
| Display Modes | |
|
|