This is a discussion on select * in views within the SQL Server forums, part of the Microsoft SQL Server category; --> Well, first of all IBM is not a product. I could very well be an IBM SQL Server expert ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Well, first of all IBM is not a product. I could very well be an IBM SQL Server expert (working for IGS or Websphere for example), but I am a DB2 expert, which I take it was what you meant. :-) My comments were generic and I'm confident they apply to any SQL RDBMS out there. Couple of observations: 1. A thumb rule for using TBSCAN over (index scan & page fetch) is when the selectivity is worse than, say 5%. Sure any DBSM will pick it's own secret number, but I don't doubt for a moment that SQL Server, having a cost based optimizer will eventually say: "Crap. You make me go after columns not in my index and I'm going to touch next to at least one row in any page anyway!". Enabling queries to use covering indexes is important. SELECT * is counter productive on that end. 2. If all you did in you experiment was a simple CREATE VIEW V AS SELECT * FROM T then this doesn't say too much. Thsi is way I made my statement relative. If you SQL is more complex then mileage will vary depending e.g. on the version of your RDBMS since you are relying on the optimizer. SQL is about: You tell the RDBMS WHAT you want. The RDBMS takes care of the HOW to get it. As developers it is our responsibility to specify the WHAT correctly. I'm referring back to the view vs. stored procedure thread here. In that thread users didn't trust views at all. I don't go that far, but I wouldn't trust any optimizer ( SQL Server, DB2, you name it) to fix up all my lazyness through divine intuition. Lastly I respectfully disagree with the subsequent note in this thread that over specification of columns in a view has no performance impact. While this may be true in most cases once the statement invoking the view is compiled, you are relying on the compiler to optimize out the unused columns. And that is code path and costs you CPU on first compile. Whether you can actually "feel" that depends on cache friendly your app is. Cheers Serge -- Serge Rielau DB2 Solutions Development IBM Toronto Lab |
| |||
| Serge Rielau (srielau@ca.ibm.com) writes: > Lastly I respectfully disagree with the subsequent note in this thread > that over specification of columns in a view has no performance impact. I think Peter's question actually was whether there was a performance impact of saying "SELECT *" instead of listing all columns in the table explicitly, and for all my dislike for SELECT * in production code, I can't think of any reason why SELECT * should be any more expensive in SQL Server. I completely agree with that best is to include exactly those columns for which there is an actual need. One problem I often fight at work is that I want to drop a column, or drastically change the meaning of it. I suspect that it is not really in use, but still I find a bunch of stored procedures that return this column in a result set. But I can also see that these procedures aim at returning the universe, so I have no idea whether the value is used for something. (Usually, I end up dropping the column anyway.) -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pro...ads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinf...ons/books.mspx |
| |||
| Serge Rielau wrote: > 1. A thumb rule for using TBSCAN over (index scan & page fetch) is when > the selectivity is worse than, say 5%. Sure any DBSM will pick it's own > secret number, Serge, I think that threshold selectivity is not a constant - for SQL Server it also depends on the width of bookmarks. So, if bookmarks are 4 byte integers, the threshold selectivity is lower than if bookmarks are 50-byte character fields. Similarly, besides selectivity DB2 also considers clustering factor, and if an index has a high clustering factor, even low 50% selectivity might be good enough for an access via an index, correct? |
| |||
| > > CREATE TABLE dbo.t1 (x INT NOT NULL PRIMARY KEY, z1 INT NULL); > GO > CREATE VIEW dbo.v1 AS > SELECT * FROM dbo.t1 > GO > ALTER TABLE dbo.t1 DROP COLUMN z1 ; > ALTER TABLE dbo.t1 ADD z2 INT ; > GO > EXEC dbo.sp_recompile 'dbo.v1' ; > /* Notice that the second column still exists as Z1 in the view */ > SELECT x,z1 FROM dbo.v1 ; > > -- O that is bad!!! I gonne test that immidiatly tuesday. Is it different when naming the columns? |
| |||
| "Erland Sommarskog" <esquel@sommarskog.se> wrote in message news:Xns97A7C00123E9EYazorman@127.0.0.1... > Serge Rielau (srielau@ca.ibm.com) writes: >> Lastly I respectfully disagree with the subsequent note in this thread >> that over specification of columns in a view has no performance impact. > > I think Peter's question actually was whether there was a performance > impact of saying "SELECT *" instead of listing all columns in the table > explicitly, and for all my dislike for SELECT * in production code, I > can't think of any reason why SELECT * should be any more expensive in > SQL Server. > > I completely agree with that best is to include exactly those columns > for which there is an actual need. > > One problem I often fight at work is that I want to drop a column, or > drastically change the meaning of it. I suspect that it is not really > in use, but still I find a bunch of stored procedures that return this > column in a result set. But I can also see that these procedures aim > at returning the universe, so I have no idea whether the value is used > for something. (Usually, I end up dropping the column anyway.) > > > -- > Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se > > Books Online for SQL Server 2005 at > http://www.microsoft.com/technet/pro...ads/books.mspx > Books Online for SQL Server 2000 at > http://www.microsoft.com/sql/prodinf...ons/books.mspx Erland, In my case, the views only virtualizes the tables. The views must have the same structure as the table. So if there are colums added to the table, the view needs to be updated to. In that case, you have the same problem as the table without view. You want to delete a column but you dont know if it is used anywere. What I mean is, declaring the view as select * is exactly what I functional want. Give me all columns of the table, no matter what columns there are. So, the statement SELECT * FROM sometable comes closer to what I want and what I mean than SELECT column1, column2, column3, column4 FROM sometable |
| |||
| "Serge Rielau" <srielau@ca.ibm.com> wrote in message news:4af2kbFst43jU1@individual.net... > Well, first of all IBM is not a product. I could very well be an IBM SQL > Server expert (working for IGS or Websphere for example), but I am a DB2 > expert, which I take it was what you meant. :-) > > My comments were generic and I'm confident they apply to any SQL RDBMS out > there. > > Couple of observations: > 1. A thumb rule for using TBSCAN over (index scan & page fetch) is when > the selectivity is worse than, say 5%. Sure any DBSM will pick it's own > secret number, but I don't doubt for a moment that SQL Server, having a > cost based optimizer will eventually say: "Crap. You make me go after > columns not in my index and I'm going to touch next to at least one row in > any page anyway!". > Enabling queries to use covering indexes is important. SELECT * is counter > productive on that end. > I never execute select * from sometable. I only use it in a definition of a view. If I only select a few columns from that view, I see in the query plan the select * is never executed. Even I dont even see the name of the view. So, the query plan shows: select column1 from someviewwithselect* returns less bytes per row compared to select column1, column2 from sameviewwithselect* Also, if the columns I select are in a alternate index, and SQL server uses that alternate index, SQL server doesnt go to the datapages of the table. SQL Server gets is data only form the index blocks. Unless the query plan is lying to me. So to me te code create table x (column1, column2, colum3) select column1 from x has the same execution plan as create table x (column1, column2, column3) create view y as select * from x select column1 from y Serge, what do I miss? I dont know what your story about indexes and scans has to do whith selecting on views. |
| |||
| Alexander Kuznetsov wrote: > Serge Rielau wrote: >> 1. A thumb rule for using TBSCAN over (index scan & page fetch) is when >> the selectivity is worse than, say 5%. Sure any DBSM will pick it's own >> secret number, > > Serge, > > I think that threshold selectivity is not a constant - for SQL Server > it also depends on the width of bookmarks. So, if bookmarks are 4 byte > integers, the threshold selectivity is lower than if bookmarks are > 50-byte character fields. Struggling to research bookmarks I ran into: http://msdn2.microsoft.com/en-US/lib...0(SQL.90).aspx "Bookmark Lookup is not used in SQL Server 2005. Instead, Clustered Index Seek and RID Lookup provide bookmark lookup functionality." Cool, let's bypass that one. > Similarly, besides selectivity DB2 also considers clustering factor, > and if an index has a high clustering factor, even low 50% selectivity > might be good enough for an access via an index, correct? Certainly clustering will affect the decision. But the point of my note was not to get into the gory details (see "thumb rule") as to state that a covering index scan is certainly preferable in the vast majority of cases over a "index seek and RID lookup" (in SQL Server words, aka ISCAN/FETCH in DB2 words) and that "index seek and RID lookup" compete with table scans. So whenever a query for more columns that it consumes it's at risk of getting suboptimal plans. Back to Peter, if you select all rows and all columns in your view, then what good is it? Are you blindly applying some lesson in some book? views are used for access control, as shorthands for complex queries and to hide details of the tables. I don't see how you do any of that... The simplicity of your views also prevents you from seeing my point that the optimizer (any optimizer) is fallible and you don't want to find out about the unused column that wasn't dropped when it's too late. Cheers Serge -- Serge Rielau DB2 Solutions Development IBM Toronto Lab |
| |||
| Peter wrote: > Serge, what do I miss? I think you are using the wrong tool: http://msdn2.microsoft.com/en-us/lib...4(SQL.90).aspx try CREATE SYNONYM Cheers Serge -- Serge Rielau DB2 Solutions Development IBM Toronto Lab |
| |||
| Serge Rielau wrote: > Struggling to research bookmarks I ran into: > http://msdn2.microsoft.com/en-US/lib...0(SQL.90).aspx > "Bookmark Lookup is not used in SQL Server 2005. Instead, Clustered > Index Seek and RID Lookup provide bookmark lookup functionality." > Cool, let's bypass that one. Interesting. My understanding is that SQL Server 2000 does not store clustering factor as part of statistics, and the reason is simple: it makes no sence for bookmark lookups. However, for more efficient RID lookups, it would make perfect sence to both calculate clustering factor as part of statistics gathering and have the optimizer use it. Yet I tried to google up "clustering factor"+"SQL server" and that brought up nothing relevant to SQL Server. Does SQL Server 2005 optimizer use clustering factor, and if yes, what's the proper word for it? |
| ||||
| Alexander Kuznetsov (AK_TIREDOFSPAM@hotmail.COM) writes: > Serge Rielau wrote: >> Struggling to research bookmarks I ran into: >> http://msdn2.microsoft.com/en-US/lib...0(SQL.90).aspx >> "Bookmark Lookup is not used in SQL Server 2005. Instead, Clustered >> Index Seek and RID Lookup provide bookmark lookup functionality." >> Cool, let's bypass that one. I think this is mainly a terminology issue. Instead of talking of bookmark lookups, they present it as a join between the NC index and the clustered index. But there is no real difference between SQL 2000 and SQL 2005, as far as I understand. > Interesting. My understanding is that SQL Server 2000 does not store > clustering factor as part of statistics, and the reason is simple: it > makes no sence for bookmark lookups. However, for more efficient RID > lookups, it would make perfect sence to both calculate clustering > factor as part of statistics gathering and have the optimizer use it. > Yet I tried to google up "clustering factor"+"SQL server" and that > brought up nothing relevant to SQL Server. > > Does SQL Server 2005 optimizer use clustering factor, and if yes, > what's the proper word for it? I'm not really sure what you mean with clustering factor. As for RID lookups, they only occur with heaps, that is tables without clustered indexes. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pro...ads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinf...ons/books.mspx |