This is a discussion on Hash joins vs small-integer join values within the pgsql Hackers forums, part of the PostgreSQL category; --> I was idly thinking about Joseph Shraibman's problem here: http://archives.postgresql.org/pgsql...5/msg01011.php in which a large hash join seemed to be ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I was idly thinking about Joseph Shraibman's problem here: http://archives.postgresql.org/pgsql...5/msg01011.php in which a large hash join seemed to be blowing out memory. By chance I tried the following test case: js=# create table ml (jid int); CREATE TABLE js=# insert into ml select random()*1000 from generate_series(1,185391404); INSERT 0 185391404 js=# create table tempjr (id int); CREATE TABLE js=# insert into tempjr select random()*1000 from generate_series(1,60000); INSERT 0 60000 js=# analyze ml; ANALYZE js=# select count(*) from tempjr join ml on (jid=id) group by jid; Since I hadn't remembered to increase work_mem beyond the default, this set up a hash join with 4111 buckets in each of 8192 batches, which didn't seem too awfully unreasonable, so I let it go. Imagine my horror as I watched it stuff all 185 million ml rows into batch 4365. Naturally, when it got to trying to process that batch, the in-memory hashtable blew out real good. I'm not certain this is what happened to Joseph, since I don't know the stats of his jid column, but in any case it's got to be fixed. Hash join is a probabilistic algorithm, so there will always be some input distributions for which it sucks, but I don't think we can tolerate "uniformly distributed on the integers 0-N" as being one of them. The problem comes from the rather simplistic assignment of bucket and batch numbers in ExecHashGetBucketAndBatch(): * Note: on-the-fly increases of nbatch must not change the bucket number * for a given hash code (since we don't move tuples to different hash * chains), and must only cause the batch number to remain the same or * increase. Our algorithm is * bucketno = hashvalue MOD nbuckets * batchno = (hashvalue DIV nbuckets) MOD nbatch * where nbuckets should preferably be prime so that all bits of the * hash value can affect both bucketno and batchno. * nbuckets doesn't change over the course of the join. This would be fine if the hashvalues were reasonably randomly distributed over all uint32 values, but take a look at hashint4 --- it's just a one's-complement: Datum hashint4(PG_FUNCTION_ARGS) { PG_RETURN_UINT32(~PG_GETARG_UINT32(0)); } Two inputs that differ by 1 will have hash values also differing by 1. Therefore, in my test case with 4111 buckets, consecutive ranges of 4111 input values map to the same batch --- different buckets in the batch, but the same batch. My example with inputs 0..999 would have mapped to either 1 or 2 batches depending on luck. With a more realistic work_mem, nbuckets would have been larger, making this problem worse not better. 8.1 and up are broken this way; in 8.0 and before we were calculating the batch number in a different way that doesn't seem vulnerable to this particular failure mode. Arguably, the problem here is a chintzy hash function, and we should fix it by making the integer hash functions use hash_any(). I'm inclined to do that for 8.3. The problem is that this is not a back-patchable answer, because changing the hash functions would corrupt existing hash indexes. The best idea I can come up with for the back branches is to make ExecHashGetBucketAndBatch do hash_any internally, say if (nbatch > 1) { *bucketno = hashvalue % nbuckets; /* since nbatch is a power of 2, can do MOD by masking */ - *batchno = (hashvalue / nbuckets) & (nbatch - 1); + *batchno = hash_any(&hashvalue, sizeof(int32)) & (nbatch - 1); } else { *bucketno = hashvalue % nbuckets; *batchno = 0; } Comments, better ideas? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| Tom, > The problem is that this is not a back-patchable > answer, because changing the hash functions would corrupt existing hash > indexes. Does anyone *use* hash indexes? > Comments, better ideas? I was just talking to Luke today and he said they had a considerable amount of cleanup on hash join they were planning to contribute for 8.4. Luke? --Josh ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| "Tom Lane" <tgl@sss.pgh.pa.us> writes: > The best idea I can come up with for the back branches is > to make ExecHashGetBucketAndBatch do hash_any internally, say hashany of a 4-byte value degenerates to pretty much just a call to mix(). Perhaps we should just expose a hash12() that takes three integers and calls mix() on them like hash_any does. The reason I'm thinking that is that we'll want to do the same thing for bigint, float4, float8, etc. And that fix you committed a while back to improve the catcache hash function made a huge difference. Now I'm wondering if it shouldn't just be invoking hash_any() or mix() too. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Josh Berkus <josh@agliodbs.com> writes: > Tom, >> The problem is that this is not a back-patchable >> answer, because changing the hash functions would corrupt existing hash >> indexes. > Does anyone *use* hash indexes? We get bug reports on 'em, so yes ... regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| ||||
| Gregory Stark <stark@enterprisedb.com> writes: > hashany of a 4-byte value degenerates to pretty much just a call to mix(). > Perhaps we should just expose a hash12() that takes three integers and calls > mix() on them like hash_any does. I don't see any use for that, but probably there is a use in hash_uint32(x) that has the same result as hash_any(&x, sizeof(uint32)). > And that fix you committed a while back to improve the catcache hash function > made a huge difference. Now I'm wondering if it shouldn't just be invoking > hash_any() or mix() too. No, if the hash values are reasonably random it should be fine as-is. There should not be any need for multiple layers of hash_any calls. The big problem with the old version of that code (IMHO) was that it threw away hash bits, which it doesn't anymore. What I'm wondering is whether we can't actually simplify the callers, if we put all the burden on the hash functions to produce all-bits-random results. In particular hashjoin probably ought to switch to power-of-2 nbuckets to avoid integer divisions ... regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |