vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hello, I have a few rather urgent questions that I hope someone can help with (I need to figure this out prior to a meeting tomorrow.) First, a bit of background: The company I work for is developing a web-based application, one part of which involves allowing the user the ability to page through transaction "history" information. The _summary_ history table will have the following fields: ServiceName, Date, User-Ref1, User-Ref2, User-Ref3, TransactionID. The "User-Ref" fields are supplied by the user to help them identify records of interest. The user will be allowed to choose from several different, pre-defined sort orders. For example, the default sort order might be: Date, ServiceName, User-Ref1, User-Ref2, User-Ref3, TransactionID. The user can click on the headings "Date", "ServiceName", and "User-Ref1", to make those columns, respectively, the primary sort column. So, if they click on "ServiceName", then the pre-defined sort order would be: ServiceName, Date, User-Ref1, User-Ref2, User-Ref3, TransactionID. If they click on "User-Ref1", then the pre-defined sort order would be: User-Ref1, Date, ServiceName, User-Ref2, User-Ref3, TransactionID. (That may not be exactly right, but the main point is that the sort orders the user can choose from are pre-defined, based on the primary sort column - they can't change the order of the other columns.) The other item of importance is that the user needs to be able to page through the data. We're planning on using an identity column on the table in support of this - i.e., to pick up from where the last UID left off. Finally, the user will be allowed to subset the data by date range. Okay, here goes my questions: Regarding VIEWs ------------------ 1. One of my questions relates to the use of VIEWs. I was thinking that, in order to make the stored procedure logic more concise, that we could create a separate VIEW for each sort order. However, someone said that VIEWs involve the use of dynamic SQL that must be prepared each time by DB/2. Is that correct? I would think that, since the query that represents the VIEW doesn't change, it would be static. 2. The same person also said that VIEWs are used to subset data (i.e., with a WHERE clause), not for simple ordering. Is that correct? Again, my thinking was to create a VIEW for each sort order - i.e., a SELECT with a different ORDER BY for each VIEW. Basically, I figured that if we used VIEWs for the sorting, then the actual stored procedure code would be identical for each query (it would just specify a different VIEW name for each.) Regarding Indexes ------------------- 3. It was also stated that any indexes on the table would only be used by the WHERE clause, not by the ORDER BY clause. Is that right? I was thinking that we would need an index for the subsetting (date range) WHERE clause, plus indexes for each pre-defined sort order (or at least the most popular sort orders.) But it was implied that you can't do this. Regarding Limiting the Result Set Size -------------------------------------- 4. The front-end web application has memory constraints, and so can only handle a certain maximum number of rows in a result set. I was thinking that we could use the "FETCH x ROWS ONLY" clause to limit the size of the result set, and also limit the processing that DB/2 has to perform. However, I was told that DB/2 would still need to process the entire result set, and that "further constraints" would be necessary. Is that correct? In particular, it was stated that if a sort (ORDERY BY) is involved, then DB/2 builds the complete result set after processing all qualifying (WHERE) records. So, basically, if I have a SELECT with both WHERE and ORDER BY, will FETCH x ROWS ONLY limit the amount of work that DB/2 has to perform or not? Thank you VERY MUCH for any help with these questions! If possible and convenient, it would be great if you could also e-mail any responses to my address below (please remove NOSPAM from the address.) Regards, Mike mmills1969@NOSPAMyahoo.com |
| |||
| Mike wrote: > Hello, > > I have a few rather urgent questions that I hope someone can help with (I > need to figure this out prior to a meeting tomorrow.) First, a bit of > background: The company I work for is developing a web-based application, > one part of which involves allowing the user the ability to page through > transaction "history" information. You don't say which platform DB2 is running on, so I will assume Linux, UNIX or Windows. > > The other item of importance is that the user needs to be able to page > through the data. We're planning on using an identity column on the table > in support of this - i.e., to pick up from where the last UID left off. You should probably search in this newsgroup (and in the DB2 UDB SQL reference) for OLAP functions like rownumber(). This has been talked about extensively in this group with regard to fetching "pages" of data at a time. > > Regarding VIEWs > ------------------ > > 1. One of my questions relates to the use of VIEWs. I was thinking that, in > order to make the stored procedure logic more concise, that we could create > a separate VIEW for each sort order. However, someone said that VIEWs > involve the use of dynamic SQL that must be prepared each time by DB/2. Is > that correct? I would think that, since the query that represents the VIEW > doesn't change, it would be static. Views don't have anything to do with static/dynamic SQL. Static/Dynamic defines when the access path is chosen -- either at build time (static) or run-time (dynamic). > 2. The same person also said that VIEWs are used to subset data (i.e., with > a WHERE clause), not for simple ordering. Is that correct? Again, my > thinking was to create a VIEW for each sort order - i.e., a SELECT with a > different ORDER BY for each VIEW. Views represent an unordered set of data (i.e., just like a table). You can't specify an order-by in the view definition. > Regarding Indexes > ------------------- > > 3. It was also stated that any indexes on the table would only be used by > the WHERE clause, not by the ORDER BY clause. Is that right? I was > thinking that we would need an index for the subsetting (date range) WHERE > clause, plus indexes for each pre-defined sort order (or at least the most > popular sort orders.) But it was implied that you can't do this. Indexes may or may not be used for predicates (where clause) or sorting. Indexes on columns referenced in the predicate are (in my book) much more important (and more likely to be used). The SQL compiler will make the decision on which index(es) to use based on filter factors on the predicates, cardinality of the table(s), columns in the query, etc. > Regarding Limiting the Result Set Size > -------------------------------------- > > 4. The front-end web application has memory constraints, and so can only > handle a certain maximum number of rows in a result set. I was thinking > that we could use the "FETCH x ROWS ONLY" clause to limit the size of the > result set, and also limit the processing that DB/2 has to perform. > However, I was told that DB/2 would still need to process the entire result > set, and that "further constraints" would be necessary. Is that correct? > In particular, it was stated that if a sort (ORDERY BY) is involved, then > DB/2 builds the complete result set after processing all qualifying (WHERE) > records. So, basically, if I have a SELECT with both WHERE and ORDER BY, > will FETCH x ROWS ONLY limit the amount of work that DB/2 has to perform or > not? Unless your application is doing something strange (like trying to store the entire result set in memory on the client), the memory usage will be primarily on the database server, not on the client, regardless of the size of your result set. There are communication buffers between the server and client that control how much data is transferred / buffered, but this shouldn't be a big deal (the default is 32kb). Good luck, -----= Posted via Newsfeeds.Com, Uncensored Usenet News =----- http://www.newsfeeds.com - The #1 Newsgroup Service in the World! -----== Over 100,000 Newsgroups - 19 Different Servers! =----- |
| |||
| "Mike" <mwmills1969@NOSPAMyahoo.com> wrote in message news:yqmdneu7OaecRW_d4p2dnA@giganews.com... > Hello, > > I have a few rather urgent questions that I hope someone can help with (I > need to figure this out prior to a meeting tomorrow.) It's probably not a good idea to leave urgent questions to the last minute and then insist that they are urgent.... > First, a bit of > background: The company I work for is developing a web-based application, > one part of which involves allowing the user the ability to page through > transaction "history" information. > > The _summary_ history table will have the following fields: ServiceName, > Date, User-Ref1, User-Ref2, User-Ref3, TransactionID. The "User-Ref" fields > are supplied by the user to help them identify records of interest. > > The user will be allowed to choose from several different, pre-defined sort > orders. For example, the default sort order might be: Date, ServiceName, > User-Ref1, User-Ref2, User-Ref3, TransactionID. The user can click on the > headings "Date", "ServiceName", and "User-Ref1", to make those columns, > respectively, the primary sort column. So, if they click on "ServiceName", > then the pre-defined sort order would be: ServiceName, Date, User-Ref1, > User-Ref2, User-Ref3, TransactionID. If they click on "User-Ref1", then the > pre-defined sort order would be: User-Ref1, Date, ServiceName, User-Ref2, > User-Ref3, TransactionID. (That may not be exactly right, but the main > point is that the sort orders the user can choose from are pre-defined, > based on the primary sort column - they can't change the order of the other > columns.) > > The other item of importance is that the user needs to be able to page > through the data. We're planning on using an identity column on the table > in support of this - i.e., to pick up from where the last UID left off. > > Finally, the user will be allowed to subset the data by date range. > You haven't told us what platform you are on and what version of DB2 you are using. This makes it rather difficult to answer your question since each platform and version can be different in some particulars. For example, ORDER BY is not permitted in a view definition in DB2 V7.2 for Windows, Linux, and Unix. ORDER BY may or may not be allowed in view definitions for other platforms and versions. > Okay, here goes my questions: > > Regarding VIEWs > ------------------ > > 1. One of my questions relates to the use of VIEWs. I was thinking that, in > order to make the stored procedure logic more concise, that we could create > a separate VIEW for each sort order. However, someone said that VIEWs > involve the use of dynamic SQL that must be prepared each time by DB/2. Is > that correct? I would think that, since the query that represents the VIEW > doesn't change, it would be static. > NOTE: the product we are discussing on this newsgroup is DB2, not DB/2. DB2 is IBM's flagship relational database. If you are actually asking about DBASE II, which some people abbreviate 'DB/2', you should disregard any answers you get on this newsgroup because they are almost certainly not applicable to you. Your first question is probably moot since I believe that all or at least most of the recent versions of DB2 don't permit ORDER BY in a view definition. As noted above though, some platforms and versions may permit this so you should check CREATE VIEW in the SQL Reference for your particular platform and version to be sure. > 2. The same person also said that VIEWs are used to subset data (i.e., with > a WHERE clause), not for simple ordering. Is that correct? Again, my > thinking was to create a VIEW for each sort order - i.e., a SELECT with a > different ORDER BY for each VIEW. > Views are used for two primary reasons: to provide a subset of rows and columns of a table or a join of tables and to give you more granular control over security. > Basically, I figured that if we used VIEWs for the sorting, then the actual > stored procedure code would be identical for each query (it would just > specify a different VIEW name for each.) > As noted above, views will not help you by providing alternate sort orders for your data unless you happen to be using a DB2 version and platform that supports ORDER BY in a view definition. I'm not sure if there is such a version and platform; I have used DB2 on most platforms at one point or another but in some cases, like DB2 on AS/400 or z/OS, I haven't used the current versions. > > Regarding Indexes > ------------------- > > 3. It was also stated that any indexes on the table would only be used by > the WHERE clause, not by the ORDER BY clause. Is that right? I was > thinking that we would need an index for the subsetting (date range) WHERE > clause, plus indexes for each pre-defined sort order (or at least the most > popular sort orders.) But it was implied that you can't do this. > It is not true that the WHERE clause is the only clause that can use an index. ORDER BY can also use an index. (I think GROUP BY can use indexes too in some cases, although I'm not sure.) You can create indexes for each sort order if you like but bear in mind that DB2 never has to use any given index (or any index at all) for an access path; it is always DB2's choice. It's the old "you can lead a horse to water but you can't make him drink" thing: you can create any number of indexes but DB2 can decline to use these indexes. So don't be angry if you create an index but DB2 doesn't use it. There are two main reasons for DB2 declining to use an index: the query will not benefit from the index (in DB2's estimation) or your SQL is poorly written. > > Regarding Limiting the Result Set Size > -------------------------------------- > > 4. The front-end web application has memory constraints, and so can only > handle a certain maximum number of rows in a result set. I was thinking > that we could use the "FETCH x ROWS ONLY" clause to limit the size of the > result set, and also limit the processing that DB/2 has to perform. > However, I was told that DB/2 would still need to process the entire result > set, and that "further constraints" would be necessary. Is that correct? > In particular, it was stated that if a sort (ORDERY BY) is involved, then > DB/2 builds the complete result set after processing all qualifying (WHERE) > records. So, basically, if I have a SELECT with both WHERE and ORDER BY, > will FETCH x ROWS ONLY limit the amount of work that DB/2 has to perform or > not? > I haven't done this sort of programming for a while so my memory on this point is fuzzy and possibly out-of-date so take what I say with a grain of salt; maybe someone who is current can confirm or correct what I'm saying on this point. Essentially, your friend told you the truth: FETCH x ROWS ONLY does not limit the size of the result set; it is primarily a hint to the optimizer that you want the first 'x' rows of the result as quickly as possible. If possible, DB2 will try to do some "tricks" to get you those rows a little faster than it otherwise would but there is no guarantee it will be successful. Also, even if it is successful, if you then go on to read more rows than the 'x' which you specified, the additional rows may come back slower than they otherwise would have. FETCH x ROWS ONLY is intended primarily for an application that will do paging of large result sets and would mostly be used where the result might be very large while most users would normally only want 'x' of the rows. If you have an application which has to page through a lot of rows and you actually expect users to look at most of those rows most of the time, FETCH x ROWS ONLY probably won't help you. If you need to page both forwards and backwards within a result set that is (potentially) many pages in size, the technique for doing so tends to depend on the language you are using. For instance, if you use a relatively recent version of Java and JDBC, you can use scrollable cursors to move backwards and forwards fairly easily. I just did a very superficial look through the DB2 V7.2 Application Development manual trying to find a detailed explanation of paging techniques for each language and didn't find one but I know the IBM DB2 courses used to explain this technique so there may be another manual somewhere that explains this. If you can't find anything like that, post here again and perhaps someone can give you an example that illustrates the technique for your language and platform. > > Thank you VERY MUCH for any help with these questions! If possible and > convenient, it would be great if you could also e-mail any responses to my > address below (please remove NOSPAM from the address.) > Posted and emailed. Rhino |
| |||
| Just a follow-up to answer some questions others have raised: 1. The DB2 (sorry, not "DB/2") platform in question is v7 (soon v8), running on z/OS. 2. The front-end will be written in Java. The stored procedures will probably be written in COBOL. 3. Regarding the use of cursors for paging: AFAIK, this would require keeping one connection allocated to each user. We have to plan for high concurrency, and so this would not be acceptable. (Yes, we will of course pool the database connections.) 4. Regarding result set size: Two objectives here: 1) Minimize the number of rows that gets sent to the front-end in any single query (so that we don't tax memory constraints on the front-end), and 2) Minimize the number of rows that DB2 has to process (if the front-end can only handle, say, 100 rows, then we would like for DB2 to stop processing the result set after 100 rows - if possible). 5. Regarding Rhino's comments about FETCH FIRST x ROWS ONLY being a 'hint' - actually, that sounds like like you are talking about the OPTIMIZE FOR x ROWS clause. According to the SQL Reference manual, FETCH FIRST x ROWS ONLY can save work for DB2 - but I'm not sure under what circumstances, and if those apply in my case. 6. Regarding "waiting to the last minute" - I haven't done that. I can't help that this was discussed at a meeting just today! My intention is to be well-prepared for tomorrow's meeting. Let me know if there are any other questions. Your comments are appreciated! Regards, Mike |
| |||
| It depends if an index is used or not with an ORDER BY. When there are fields in your select that are not part of the index, DB2 might decide a table scan is better so it can do sequential prefetch. Specially if the index sequence is not the same as the physical row sequence, retrieving rows via the index might be much more expensive than retrieving them sequentially. It also depends on the size of your table of course. Mike wrote: > Regarding Indexes > ------------------- > > 3. It was also stated that any indexes on the table would only be used by > the WHERE clause, not by the ORDER BY clause. Is that right? I was > thinking that we would need an index for the subsetting (date range) WHERE > clause, plus indexes for each pre-defined sort order (or at least the most > popular sort orders.) But it was implied that you can't do this. > > Regards, > > Mike > mmills1969@NOSPAMyahoo.com -- Anton Versteeg IBM Certified DB2 Specialist IBM Netherlands |
| |||
| "Mike" <mwmills1969@NOSPAMyahoo.com> wrote in message news:<c7WdnTEJT8Xkh27dRVn-vA@giganews.com>... > 3. Regarding the use of cursors for paging: AFAIK, this would require > keeping one connection allocated to each user. We have to plan for high > concurrency, and so this would not be acceptable. (Yes, we will of course > pool the database connections.) > > 4. Regarding result set size: Two objectives here: 1) Minimize the number > of rows that gets sent to the front-end in any single query (so that we > don't tax memory constraints on the front-end), and 2) Minimize the number > of rows that DB2 has to process (if the front-end can only handle, say, 100 > rows, then we would like for DB2 to stop processing the result set after 100 > rows - if possible). Your solutions are pretty restricted if you want separate connections for each paging query: either you only have one sorting order, or you cut separate tables each period (day, hour, whatever) that have separate orders & separate ids in each pre-calculated. However, if you do keep the single connection for this activity, you do have other options than a cursor: you can also toss the result set into a temporary table and use an identity or row_number to populate a field used for paging. Whether or not that works better than using row_number() on the original table would depending on factors such as whether or not you can use an index or have to do a page scan of the original table, how many next page queries are likely to occur, etc. |
| |||
| Thanks for the info. Just to add a bit more info, here is what I'm thinking the query would look like - note that my original idea was to use a VIEW for each different ORDERY BY clause, but since that doesn't appear to be possible, this is how I believe it will look: Page *Forward* Through Transaction History Summary - Sort Seq #1 ----------------------------------------------------------------------- SELECT * FROM TRANHIST WHERE UID > :startUID AND DATE >= :startDate AND DATE <= :endDate ORDER BY UID ASC, DATE DESC, SERVICENAME ASC, USERREF1 ASC, USERREF2 ASC, USERREF3 ASC, TRANID ASC FETCH FIRST 200 ROWS ONLY; Page *Backward* Through Transaction History Summary - Sort Seq #1 ------------------------------------------------------------------------- SELECT * FROM TRANHIST WHERE UID < :startUID AND DATE >= :startDate AND DATE <= :endDate ORDER BY UID DESC, DATE DESC, SERVICENAME ASC, USERREF1 ASC, USERREF2 ASC, USERREF3 ASC, TRANID ASC FETCH FIRST 200 ROWS ONLY; There would then be similar queries for paging forward & backward through the history, but with different ORDER BY clauses. There would be about 3 or 4 basic sort orders (such as SERVICENAME first, or USERREF1 first), and for each one of these the user will have the ability to select ASCENDING or DESCENDING sort order on the *primary* sort column (not the subordinate columns; those will be pre-defined.) Note that the WHERE clause only includes the DATE and UID fields. Anyway, as I mentioned before, one key question is whether the FETCH FIRST x ROWS ONLY will save DB2 from doing extraneous work or not - i.e., will it "short-circuit" the query at 200 rows, or will it have to process the entire result set first? And if it won't save any extra work, is there anything else I can do to help it? Again, due to volume & concurrency, we want to save memory utilization on the front-end, as well as the amount of work DB2 has to do on the back-end. If the front-end can only cache 200 rows at a time, I'd like to prevent DB2 from having to process more than that, if possible. Thanks, Mike "Anton Versteeg" <anton_versteeg@nnll.ibm.com> wrote in message news:cd06ej$1270$1@sp15en20.hursley.ibm.com... > It depends if an index is used or not with an ORDER BY. > When there are fields in your select that are not part of the index, > DB2 might decide a table scan is better so it can do sequential > prefetch. Specially if the index sequence is not the same as the > physical row sequence, retrieving rows via the index might be much more > expensive than retrieving them sequentially. It also depends on the size > of your table of course. > > Mike wrote: > > Regarding Indexes > > ------------------- > > > > 3. It was also stated that any indexes on the table would only be used by > > the WHERE clause, not by the ORDER BY clause. Is that right? I was > > thinking that we would need an index for the subsetting (date range) WHERE > > clause, plus indexes for each pre-defined sort order (or at least the most > > popular sort orders.) But it was implied that you can't do this. > > > > > Regards, > > > > Mike > > mmills1969@NOSPAMyahoo.com > > -- > Anton Versteeg > IBM Certified DB2 Specialist > IBM Netherlands > |
| |||
| "Buck Nuggets" <bucknuggets@yahoo.com> wrote in message news:66a61715.0407130749.356ddb81@posting.google.c om... > "Mike" <mwmills1969@NOSPAMyahoo.com> wrote in message news:<c7WdnTEJT8Xkh27dRVn-vA@giganews.com>... > > > 3. Regarding the use of cursors for paging: AFAIK, this would require > > keeping one connection allocated to each user. We have to plan for high > > concurrency, and so this would not be acceptable. (Yes, we will of course > > pool the database connections.) > > > > 4. Regarding result set size: Two objectives here: 1) Minimize the number > > of rows that gets sent to the front-end in any single query (so that we > > don't tax memory constraints on the front-end), and 2) Minimize the number > > of rows that DB2 has to process (if the front-end can only handle, say, 100 > > rows, then we would like for DB2 to stop processing the result set after 100 > > rows - if possible). > > Your solutions are pretty restricted if you want separate connections > for each paging query: either you only have one sorting order, or you > cut separate tables each period (day, hour, whatever) that have > separate orders & separate ids in each pre-calculated. Thanks for the comments. But, do you agree that if high concurrency is a concern, that we can't really allocate a connection to a single user, while the user "thinks" about paging forward/backward? About the temporary table - wouldn't that be on a per-connection basis, though? Or could I access the same temporary table from *any* connection? (Of course I realize that I could optimize the connection pool such that the front-end will *try* to re-use the same connection, if available - but I don't want to count on this.) Thanks, Mike |
| ||||
| "Mike" <mwmills1969@NOSPAMyahoo.com> wrote in message news:<LfidnTjWAt8KiWndRVn-hw@giganews.com>... > Thanks for the comments. But, do you agree that if high concurrency is a > concern, that we can't really allocate a connection to a single user, while > the user "thinks" about paging forward/backward? sure, i can see how this would be a concern. > About the temporary table - wouldn't that be on a per-connection basis, > though? Or could I access the same temporary table from *any* connection? > (Of course I realize that I could optimize the connection pool such that the > front-end will *try* to re-use the same connection, if available - but I > don't want to count on this.) no, you can't share it across connections. however, there are still quite a few other, similar options: create a permanent table instead. You'd have to figure out when to asyncronously drop it, it's slower to create, and you'd have to figure out what to name it, but it could work the same way as the temp table otherwise. Still, perhaps the first query retrieval time would be expensive. Not sure about what kind of insertion rate you're looking at, or what the model looks like (unique key?), but...have you considered using something like multiple MQTs with ids representing various sort orders? Would be impossible to keep perfectly up to date (update hourly?), but would be very fast on retrieval since you could easily rely on an index to grab a small number of rows at a time... |