Thread: Star Joins
View Single Post

   
  #3 (permalink)  
Old 02-27-2008, 08:15 AM
Serge Rielau
 
Posts: n/a
Default Re: Star Joins

gwise@pressganey.com wrote:
> I'm following up my own message here. I've learned more. In regards
> to the problem where the optimizer selects the dimension with only one
> matching pair in the fact table, I think I understand what's going on
> there. Although there's only one value for this particular dimension
> in the fact table, there are actually 49 other records in the dimension
> for which no facts exist. It seems that that the optimizer is assuming
> 2% selectivity since I'm selecting one of 50 records in the dimension
> (even though the fact only has data for the one single value of that
> dimension that I'm specifying in the where clause of the query).
>
> To prove this is true to myself, I removed the other 49 records from
> the dimension, reran stats and re-explained the query and indeed it no
> longer used this index to drive the NLJOIN.
>
> It seems strange that the optimizer would assume a totally even
> distribution of a dimension within a fact like that. Is that what it's
> really doing or am I missing something? It seems pretty odd,
> especially since I've run stats with distrubution I'd think the
> optimizer would be able to tell that the dimension wasn't evenly
> distributed in the fact table.
>
> Any information would be helpful.
>
> Thanks,
> Greig Wise
>
>
> gwise@pressganey.com wrote:
>> I've got a database with a star schema and I'm kind of confused about
>> what the optimizer is doing with my queries. I've got a fact table
>> (about 24 M rows), which I'm joining to 3 dimensions. Each dimension
>> ID in the fact has it's own index.
>>
>> No matter what I do, I've been unable to get the optimizer to choose a
>> star join (IXAND or the like). It always just picks one of the
>> dimensions and does a nested loop join and then HSJOINs in the rest.
>> To be clear the queries I've tried are like this:
>>
>> select a.field, b.field, count(*)
>> from fact f, dim1 a, dim2 b, dim 3 c
>> where
>> f.dim1_key = a.dim1_key and
>> f.dim2_key = b.dim2_key and
>> f.dim3_key = c.dim3_key and
>> a.field like 'Data%' and
>> b.field like 'Data%' and
>> c.field like 'Data%'
>> group by a.field, b.field
>>
>> The really odd-ball thing is that I've got another dimension, which has
>> absolutely no selectivity -- i.e. All the items in the fact are in a
>> single dimension. (I know this seems stupid, but it's a test
>> database). If I join in this dimension (in queries similar to the one
>> above), in many of my queries, the optimizer will actually choose this
>> dimension to drive the NLJOIN with the fact. How it could ever use
>> that dimension over anything else is beyond me as the result is
>> essentially a full table scan.
>>
>> So, what I'm wondering is 2 things:
>>
>> 1) Can anyone point me to a good reference about how star joins work
>> or offer any advice regarding gettting the optimizer to use a star
>> join?
>> 2) Does anyone have any idea why the optimizer would choose worst
>> possible index under any circumstances?
>>
>> Oh, by the way, the stats are all up-to-date.

... but which stats did you run?

--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab

IOD Conference
http://www.ibm.com/software/data/ond...ness/conf2006/
Reply With Quote