vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I've got a database with a star schema and I'm kind of confused about what the optimizer is doing with my queries. I've got a fact table (about 24 M rows), which I'm joining to 3 dimensions. Each dimension ID in the fact has it's own index. No matter what I do, I've been unable to get the optimizer to choose a star join (IXAND or the like). It always just picks one of the dimensions and does a nested loop join and then HSJOINs in the rest. To be clear the queries I've tried are like this: select a.field, b.field, count(*) from fact f, dim1 a, dim2 b, dim 3 c where f.dim1_key = a.dim1_key and f.dim2_key = b.dim2_key and f.dim3_key = c.dim3_key and a.field like 'Data%' and b.field like 'Data%' and c.field like 'Data%' group by a.field, b.field The really odd-ball thing is that I've got another dimension, which has absolutely no selectivity -- i.e. All the items in the fact are in a single dimension. (I know this seems stupid, but it's a test database). If I join in this dimension (in queries similar to the one above), in many of my queries, the optimizer will actually choose this dimension to drive the NLJOIN with the fact. How it could ever use that dimension over anything else is beyond me as the result is essentially a full table scan. So, what I'm wondering is 2 things: 1) Can anyone point me to a good reference about how star joins work or offer any advice regarding gettting the optimizer to use a star join? 2) Does anyone have any idea why the optimizer would choose worst possible index under any circumstances? Oh, by the way, the stats are all up-to-date. Thanks, Greig Wise DB2 DBA Press Ganey Associates |
| |||
| I'm following up my own message here. I've learned more. In regards to the problem where the optimizer selects the dimension with only one matching pair in the fact table, I think I understand what's going on there. Although there's only one value for this particular dimension in the fact table, there are actually 49 other records in the dimension for which no facts exist. It seems that that the optimizer is assuming 2% selectivity since I'm selecting one of 50 records in the dimension (even though the fact only has data for the one single value of that dimension that I'm specifying in the where clause of the query). To prove this is true to myself, I removed the other 49 records from the dimension, reran stats and re-explained the query and indeed it no longer used this index to drive the NLJOIN. It seems strange that the optimizer would assume a totally even distribution of a dimension within a fact like that. Is that what it's really doing or am I missing something? It seems pretty odd, especially since I've run stats with distrubution I'd think the optimizer would be able to tell that the dimension wasn't evenly distributed in the fact table. Any information would be helpful. Thanks, Greig Wise gwise@pressganey.com wrote: > I've got a database with a star schema and I'm kind of confused about > what the optimizer is doing with my queries. I've got a fact table > (about 24 M rows), which I'm joining to 3 dimensions. Each dimension > ID in the fact has it's own index. > > No matter what I do, I've been unable to get the optimizer to choose a > star join (IXAND or the like). It always just picks one of the > dimensions and does a nested loop join and then HSJOINs in the rest. > To be clear the queries I've tried are like this: > > select a.field, b.field, count(*) > from fact f, dim1 a, dim2 b, dim 3 c > where > f.dim1_key = a.dim1_key and > f.dim2_key = b.dim2_key and > f.dim3_key = c.dim3_key and > a.field like 'Data%' and > b.field like 'Data%' and > c.field like 'Data%' > group by a.field, b.field > > The really odd-ball thing is that I've got another dimension, which has > absolutely no selectivity -- i.e. All the items in the fact are in a > single dimension. (I know this seems stupid, but it's a test > database). If I join in this dimension (in queries similar to the one > above), in many of my queries, the optimizer will actually choose this > dimension to drive the NLJOIN with the fact. How it could ever use > that dimension over anything else is beyond me as the result is > essentially a full table scan. > > So, what I'm wondering is 2 things: > > 1) Can anyone point me to a good reference about how star joins work > or offer any advice regarding gettting the optimizer to use a star > join? > 2) Does anyone have any idea why the optimizer would choose worst > possible index under any circumstances? > > Oh, by the way, the stats are all up-to-date. > > Thanks, > Greig Wise > DB2 DBA > Press Ganey Associates |
| |||
| gwise@pressganey.com wrote: > I'm following up my own message here. I've learned more. In regards > to the problem where the optimizer selects the dimension with only one > matching pair in the fact table, I think I understand what's going on > there. Although there's only one value for this particular dimension > in the fact table, there are actually 49 other records in the dimension > for which no facts exist. It seems that that the optimizer is assuming > 2% selectivity since I'm selecting one of 50 records in the dimension > (even though the fact only has data for the one single value of that > dimension that I'm specifying in the where clause of the query). > > To prove this is true to myself, I removed the other 49 records from > the dimension, reran stats and re-explained the query and indeed it no > longer used this index to drive the NLJOIN. > > It seems strange that the optimizer would assume a totally even > distribution of a dimension within a fact like that. Is that what it's > really doing or am I missing something? It seems pretty odd, > especially since I've run stats with distrubution I'd think the > optimizer would be able to tell that the dimension wasn't evenly > distributed in the fact table. > > Any information would be helpful. > > Thanks, > Greig Wise > > > gwise@pressganey.com wrote: >> I've got a database with a star schema and I'm kind of confused about >> what the optimizer is doing with my queries. I've got a fact table >> (about 24 M rows), which I'm joining to 3 dimensions. Each dimension >> ID in the fact has it's own index. >> >> No matter what I do, I've been unable to get the optimizer to choose a >> star join (IXAND or the like). It always just picks one of the >> dimensions and does a nested loop join and then HSJOINs in the rest. >> To be clear the queries I've tried are like this: >> >> select a.field, b.field, count(*) >> from fact f, dim1 a, dim2 b, dim 3 c >> where >> f.dim1_key = a.dim1_key and >> f.dim2_key = b.dim2_key and >> f.dim3_key = c.dim3_key and >> a.field like 'Data%' and >> b.field like 'Data%' and >> c.field like 'Data%' >> group by a.field, b.field >> >> The really odd-ball thing is that I've got another dimension, which has >> absolutely no selectivity -- i.e. All the items in the fact are in a >> single dimension. (I know this seems stupid, but it's a test >> database). If I join in this dimension (in queries similar to the one >> above), in many of my queries, the optimizer will actually choose this >> dimension to drive the NLJOIN with the fact. How it could ever use >> that dimension over anything else is beyond me as the result is >> essentially a full table scan. >> >> So, what I'm wondering is 2 things: >> >> 1) Can anyone point me to a good reference about how star joins work >> or offer any advice regarding gettting the optimizer to use a star >> join? >> 2) Does anyone have any idea why the optimizer would choose worst >> possible index under any circumstances? >> >> Oh, by the way, the stats are all up-to-date. ... but which stats did you run? -- Serge Rielau DB2 Solutions Development IBM Toronto Lab IOD Conference http://www.ibm.com/software/data/ond...ness/conf2006/ |
| ||||
| I did runstats on table with distribution and detailed indexes all for every table in the database. Greig Serge Rielau wrote: > >> Oh, by the way, the stats are all up-to-date. > .. but which stats did you run? > > -- > Serge Rielau > DB2 Solutions Development > IBM Toronto Lab > > IOD Conference > http://www.ibm.com/software/data/ond...ness/conf2006/ |