This is a discussion on Auto created statistics and missing statistics within the SQL Server forums, part of the Microsoft SQL Server category; --> Hello group. I have an issue, which has bothered me for a while now: I'm wondering why the column ...
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hello group. I have an issue, which has bothered me for a while now: I'm wondering why the column statistics, which SQL Server wants me to create, if I turn off auto-created statistics, are so important to the optimizer? Example: from Northwind (with auto create stats off), I do the following: SELECT * FROM Customers WHERE Country = 'Sweden' My query plan show a clustered index scan, which is expected - no index exists for Country. BUT, the query plan also shows, that the optimizer is missing a statistic on Country, which tells me, that the optimizer would benefit from knowing this. I cannot see why? (and I've been trying for a while now). If I create the missing statistics, nothing happens in the query plan (and why should it?). I could understand it, if the optimizer suggested an index on Country - this would make sense, but if creating the missing index, query analyzer creates the statistics with an empty index, which seems to me to be less than usable. I've been thinking long and hard about this, but haven't been able to reach a conclusion optimizer to create missing statistics limits my options for designing indexes (e.g. covering) for some rather wide tables, so I'm thinking why not turn it off altogether. But I would like to know the consequences - hope somebody has already delved into this, and knows a good explanation. Rgds Jesper |
| |||
| "Jesper Jensen" <moellemand@post.tdcadsl.dk> wrote in message news:40727f2e$0$237$edfadb0f@dread12.news.tele.dk. .. > Hello group. > > I have an issue, which has bothered me for a while now: > > I'm wondering why the column statistics, which SQL Server wants me to > create, if I turn off auto-created statistics, are so important to the > optimizer? > > Example: from Northwind (with auto create stats off), I do the following: > > SELECT * FROM Customers WHERE Country = 'Sweden' > > My query plan show a clustered index scan, which is expected - no index > exists for Country. BUT, the query plan also shows, that the optimizer is > missing a statistic on Country, which tells me, that the optimizer would > benefit from knowing this. > > I cannot see why? (and I've been trying for a while now). > > If I create the missing statistics, nothing happens in the query plan (and > why should it?). I could understand it, if the optimizer suggested an index > on Country - this would make sense, but if creating the missing index, query > analyzer creates the statistics with an empty index, which seems to me to be > less than usable. > > I've been thinking long and hard about this, but haven't been able to reach > a conclusion > optimizer to create missing statistics limits my options for designing > indexes (e.g. covering) for some rather wide tables, so I'm thinking why not > turn it off altogether. But I would like to know the consequences - hope > somebody has already delved into this, and knows a good explanation. > > Rgds > Jesper > > http://msdn.microsoft.com/library/de.../statquery.asp Simon |
| |||
| Thanks, Simon, informative article, but ... .... it doesn't really explain the stuff, that I wrote. The closest I get to an explanation, when reading this is 'These statistics are created for columns where the optimizer would have to estimate the approximate density or distribution otherwise'. I knew this, but I still do not know, why the optimizer needs to know the density and/or distribution?? I can see no valid reason, and therefore I can see no good reason for enabling auto-creation of stats. What I probably looking for is a good example, where the use of an automatically created stat saves time, cycles and IOs Best Rgds - Jesper "Simon Hayes" <sql@hayes.ch> skrev i en meddelelse news:4072f05a$1_2@news.bluewin.ch... > > "Jesper Jensen" <moellemand@post.tdcadsl.dk> wrote in message > news:40727f2e$0$237$edfadb0f@dread12.news.tele.dk. .. > > Hello group. > > > > I have an issue, which has bothered me for a while now: > > > > I'm wondering why the column statistics, which SQL Server wants me to > > create, if I turn off auto-created statistics, are so important to the > > optimizer? > > > > Example: from Northwind (with auto create stats off), I do the following: > > > > SELECT * FROM Customers WHERE Country = 'Sweden' > > > > My query plan show a clustered index scan, which is expected - no index > > exists for Country. BUT, the query plan also shows, that the optimizer is > > missing a statistic on Country, which tells me, that the optimizer would > > benefit from knowing this. > > > > I cannot see why? (and I've been trying for a while now). > > > > If I create the missing statistics, nothing happens in the query plan (and > > why should it?). I could understand it, if the optimizer suggested an > index > > on Country - this would make sense, but if creating the missing index, > query > > analyzer creates the statistics with an empty index, which seems to me to > be > > less than usable. > > > > I've been thinking long and hard about this, but haven't been able to > reach > > a conclusion > > optimizer to create missing statistics limits my options for designing > > indexes (e.g. covering) for some rather wide tables, so I'm thinking why > not > > turn it off altogether. But I would like to know the consequences - hope > > somebody has already delved into this, and knows a good explanation. > > > > Rgds > > Jesper > > > > > > http://msdn.microsoft.com/library/de.../statquery.asp > > Simon > > > |
| |||
| "Jesper Jensen" <moellemand@post.tdcadsl.dk> wrote in message news:4072f91b$0$300$edfadb0f@dread12.news.tele.dk. .. > Thanks, Simon, informative article, but ... > > ... it doesn't really explain the stuff, that I wrote. The closest I get to > an explanation, when reading this is 'These statistics are created for > columns where the optimizer would have to estimate the approximate density > or distribution otherwise'. > > I knew this, but I still do not know, why the optimizer needs to know the > density and/or distribution?? I can see no valid reason, and therefore I can > see no good reason for enabling auto-creation of stats. > > What I probably looking for is a good example, where the use of an > automatically created stat saves time, cycles and IOs > > Best Rgds - Jesper > OK, here's another informative article :-) http://www.winnetmag.com/SQLServer/A...075/22075.html In summary, index statistics exist only for the first column in an index, but auto-created (or manually created) statistics can exist for any column. This gives the optimizer extra information, which might mean it chooses a different, more efficient index for a query. Check out the example on the second page of the article - on my system, this reduced the logical reads required for the query from 104 to 43. But you're correct to consider that there can be an impact on performance in some situations: http://support.microsoft.com/default...b;en-us;195565 Simon |
| |||
| Thanks, Simon, that one did the trick. One less mystery. On my machine, QA tells me that the two queries (the index scan on ProductID/Quantity vs. the clustered index scan) takes 43.65 and 56.35% cost respectively. I would argue, that this saving is not worth the 'used up index space'. In my professional life, I've seen tables, which are wide enough (200+ columns) to demand, that precious index space is saved. Basically, I think there are too many 'ifs' before an auto-created index saves performance, but I appreciate the optimization idea behind it. Thanks - Jesper "Simon Hayes" <sql@hayes.ch> skrev i en meddelelse news:4073121e_1@news.bluewin.ch... > > "Jesper Jensen" <moellemand@post.tdcadsl.dk> wrote in message > news:4072f91b$0$300$edfadb0f@dread12.news.tele.dk. .. > > Thanks, Simon, informative article, but ... > > > > ... it doesn't really explain the stuff, that I wrote. The closest I get > to > > an explanation, when reading this is 'These statistics are created for > > columns where the optimizer would have to estimate the approximate density > > or distribution otherwise'. > > > > I knew this, but I still do not know, why the optimizer needs to know the > > density and/or distribution?? I can see no valid reason, and therefore I > can > > see no good reason for enabling auto-creation of stats. > > > > What I probably looking for is a good example, where the use of an > > automatically created stat saves time, cycles and IOs > > > > Best Rgds - Jesper > > > > OK, here's another informative article :-) > > http://www.winnetmag.com/SQLServer/A...075/22075.html > > In summary, index statistics exist only for the first column in an index, > but auto-created (or manually created) statistics can exist for any column. > This gives the optimizer extra information, which might mean it chooses a > different, more efficient index for a query. > > Check out the example on the second page of the article - on my system, this > reduced the logical reads required for the query from 104 to 43. > > But you're correct to consider that there can be an impact on performance in > some situations: > > http://support.microsoft.com/default...b;en-us;195565 > > Simon > > |
| ||||
| "Jesper Jensen" <moellemand@post.tdcadsl.dk> wrote in message news:<40732877$0$274$edfadb0f@dread12.news.tele.dk >... > Thanks, Simon, that one did the trick. > > One less mystery. > > On my machine, QA tells me that the two queries (the index scan on > ProductID/Quantity vs. the clustered index scan) takes 43.65 and 56.35% cost > respectively. I would argue, that this saving is not worth the 'used up > index space'. In my professional life, I've seen tables, which are wide > enough (200+ columns) to demand, that precious index space is saved. > > Basically, I think there are too many 'ifs' before an auto-created index > saves performance, but I appreciate the optimization idea behind it. > > Thanks - Jesper > <snip> Well, you have to be careful about reaching conclusions based on simple queries using small data sets. It's possible that a complex join involving millions of rows would give a more significant difference. To get a definite answer for your environment, you would have to do some benchmarking, with and without statistics. Simon |