This is a discussion on Need Efficent Query within the MySQL forums, part of the Database Server Software category; --> I'm hoping some SQL gurus read this and help me optimize my table/ query. The table is shown below: ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I'm hoping some SQL gurus read this and help me optimize my table/ query. The table is shown below: CREATE TABLE IF NOT EXISTS PACKAGE_TABLE ( ID BIGINT UNSIGNED PRIMARY KEY NOT NULL AUTO_INCREMENT, TAG_ID BIGINT UNSIGNED NOT NULL, LAST_UPDATE TIMESTAMP, JOB_NAME MEDIUMTEXT, STATE TEXT, ERROR_CODES INT UNSIGNED, OPERATOR MEDIUMTEXT, TAG_FIRMWARE TINYTEXT, |
| |||
| On Nov 12, 8:52 pm, Dean.Brot...@gmail.com wrote: > I'm hoping some SQL gurus read this and help me optimize my table/ > query. > > The table is shown below: > > CREATE TABLE IF NOT EXISTS PACKAGE_TABLE ( > ID BIGINT UNSIGNED PRIMARY KEY NOT NULL > AUTO_INCREMENT, > TAG_ID BIGINT UNSIGNED NOT NULL, > LAST_UPDATE TIMESTAMP, > JOB_NAME MEDIUMTEXT, > STATE TEXT, > ERROR_CODES INT UNSIGNED, > OPERATOR MEDIUMTEXT, > TAG_FIRMWARE TINYTEXT, > . > . > . > PLUS 5 OTHER TEXT FIELDS NOT IMPORTANT FOR THIS DICSUSSION > ) > > In the table I will be storing multiple records keyed by the same > TAG_ID. Essientlly every iteraction of each tag is stored and thus I > can recover a tag's history. > > For an important report I want to query for the last update for each > TAG_ID. I have come up with the following: > > SELECT * FROM PACKAGE_TABLE t1 WHERE JOB_NAME = 'Some Job' AND > LAST_UPDATE = (SELECT MAX(t2.LAST_UPDATE) FROM PACKAGE_TABLE t2 where > t1.TAG_ID = t2.TAG_ID) > > This query works but on large tables (10000+ tag ids) the query really > slows. Does anyone know of a more efficient way to format the table/ > query for increased speed. No guarantees, but try writing it as a join instead: select * from package_table p1 join (select p2.tag_id, max(p2.last_update) last_update from package_table p2 group by p2.tag_id) x on (p1.tag_id = x.tag_id and p1.last_update = x.last_update) where job_name = 'Some Job' |
| |||
| On 13 Nov, 01:52, Dean.Brot...@gmail.com wrote: > I'm hoping some SQL gurus read this and help me optimize my table/ > query. > > The table is shown below: > > CREATE TABLE IF NOT EXISTS PACKAGE_TABLE ( > ID BIGINT UNSIGNED PRIMARY KEY NOT NULL > AUTO_INCREMENT, > TAG_ID BIGINT UNSIGNED NOT NULL, > LAST_UPDATE TIMESTAMP, > JOB_NAME MEDIUMTEXT, > STATE TEXT, > ERROR_CODES INT UNSIGNED, > OPERATOR MEDIUMTEXT, > TAG_FIRMWARE TINYTEXT, > . > . > . > PLUS 5 OTHER TEXT FIELDS NOT IMPORTANT FOR THIS DICSUSSION > ) > > In the table I will be storing multiple records keyed by the same > TAG_ID. Essientlly every iteraction of each tag is stored and thus I > can recover a tag's history. > > For an important report I want to query for the last update for each > TAG_ID. I have come up with the following: > > SELECT * FROM PACKAGE_TABLE t1 WHERE JOB_NAME = 'Some Job' AND > LAST_UPDATE = (SELECT MAX(t2.LAST_UPDATE) FROM PACKAGE_TABLE t2 where > t1.TAG_ID = t2.TAG_ID) > > This query works but on large tables (10000+ tag ids) the query really > slows. Does anyone know of a more efficient way to format the table/ > query for increased speed. You need the strawberry query which is mentioned many times in this group. This avoids the use of a subquery completely and is the most eficient way of doing this sort of query. Look at the last query on this page (before the user comments): http://dev.mysql.com/doc/refman/5.0/...group-row.html |
| |||
| "Captain Paralytic" <paul_lautman@yahoo.com> wrote in message news:1194947263.585654.202740@v65g2000hsc.googlegr oups.com... > On 13 Nov, 01:52, Dean.Brot...@gmail.com wrote: >> I'm hoping some SQL gurus read this and help me optimize my table/ >> query. >> >> The table is shown below: >> >> CREATE TABLE IF NOT EXISTS PACKAGE_TABLE ( >> ID BIGINT UNSIGNED PRIMARY KEY NOT NULL >> AUTO_INCREMENT, >> TAG_ID BIGINT UNSIGNED NOT NULL, >> LAST_UPDATE TIMESTAMP, >> JOB_NAME MEDIUMTEXT, >> STATE TEXT, >> ERROR_CODES INT UNSIGNED, >> OPERATOR MEDIUMTEXT, >> TAG_FIRMWARE TINYTEXT, >> . >> . >> . >> PLUS 5 OTHER TEXT FIELDS NOT IMPORTANT FOR THIS DICSUSSION >> ) >> >> In the table I will be storing multiple records keyed by the same >> TAG_ID. Essientlly every iteraction of each tag is stored and thus I >> can recover a tag's history. >> >> For an important report I want to query for the last update for each >> TAG_ID. I have come up with the following: >> >> SELECT * FROM PACKAGE_TABLE t1 WHERE JOB_NAME = 'Some Job' AND >> LAST_UPDATE = (SELECT MAX(t2.LAST_UPDATE) FROM PACKAGE_TABLE t2 where >> t1.TAG_ID = t2.TAG_ID) >> >> This query works but on large tables (10000+ tag ids) the query really >> slows. Does anyone know of a more efficient way to format the table/ >> query for increased speed. > > You need the strawberry query which is mentioned many times in this > group. This avoids the use of a subquery completely and is the most > eficient way of doing this sort of query. > > Look at the last query on this page (before the user comments): > http://dev.mysql.com/doc/refman/5.0/...group-row.html I have to take the web page's word for it when it says that's an efficient solution for MySQL, but that is the kind of optimization that I'd expect a reasonable 21st century DBMS to find all by itself based on the original query, which was a perfectly correct specification of the required result. Roy |
| |||
| On 13 Nov, 10:21, "Roy Hann" <specia...@processed.almost.meat> wrote: > "Captain Paralytic" <paul_laut...@yahoo.com> wrote in message > > news:1194947263.585654.202740@v65g2000hsc.googlegr oups.com... > > > > > > > On 13 Nov, 01:52, Dean.Brot...@gmail.com wrote: > >> I'm hoping some SQL gurus read this and help me optimize my table/ > >> query. > > >> The table is shown below: > > >> CREATE TABLE IF NOT EXISTS PACKAGE_TABLE ( > >> ID BIGINT UNSIGNED PRIMARY KEY NOT NULL > >> AUTO_INCREMENT, > >> TAG_ID BIGINT UNSIGNED NOT NULL, > >> LAST_UPDATE TIMESTAMP, > >> JOB_NAME MEDIUMTEXT, > >> STATE TEXT, > >> ERROR_CODES INT UNSIGNED, > >> OPERATOR MEDIUMTEXT, > >> TAG_FIRMWARE TINYTEXT, > >> . > >> . > >> . > >> PLUS 5 OTHER TEXT FIELDS NOT IMPORTANT FOR THIS DICSUSSION > >> ) > > >> In the table I will be storing multiple records keyed by the same > >> TAG_ID. Essientlly every iteraction of each tag is stored and thus I > >> can recover a tag's history. > > >> For an important report I want to query for the last update for each > >> TAG_ID. I have come up with the following: > > >> SELECT * FROM PACKAGE_TABLE t1 WHERE JOB_NAME = 'Some Job' AND > >> LAST_UPDATE = (SELECT MAX(t2.LAST_UPDATE) FROM PACKAGE_TABLE t2 where > >> t1.TAG_ID = t2.TAG_ID) > > >> This query works but on large tables (10000+ tag ids) the query really > >> slows. Does anyone know of a more efficient way to format the table/ > >> query for increased speed. > > > You need the strawberry query which is mentioned many times in this > > group. This avoids the use of a subquery completely and is the most > > eficient way of doing this sort of query. > > > Look at the last query on this page (before the user comments): > >http://dev.mysql.com/doc/refman/5.0/...column-group-r... > > I have to take the web page's word for it when it says that's an efficient > solution for MySQL, but that is the kind of optimization that I'd expect a > reasonable 21st century DBMS to find all by itself based on the original > query, which was a perfectly correct specification of the required result. > > Roy- Hide quoted text - > > - Show quoted text - Then you're expecting too much. SQL programmers are assumed to possess a certain level of skill I guess. The structures of the query are wildly different. How is an optimizer to know whether you have a specific reason for constructing a query in a particular way. |
| |||
| "Captain Paralytic" <paul_lautman@yahoo.com> wrote in message news:1194951208.239544.57020@k79g2000hse.googlegro ups.com... > > On 13 Nov, 10:21, "Roy Hann" <specia...@processed.almost.meat> wrote: >> I have to take the web page's word for it when it says that's an >> efficient >> solution for MySQL, but that is the kind of optimization that I'd expect >> a >> reasonable 21st century DBMS to find all by itself based on the original >> query, which was a perfectly correct specification of the required >> result. > > Then you're expecting too much. Not at all. There are SQL DBMS products that do exactly that, and have done for years. > SQL programmers are assumed to possess > a certain level of skill I guess. Now who's expecting too much? I expect DBMSs to be designed and developed by the very best and brightest developers. I think that is a very reasonable expectation. I also expect that most working programmers have little or no interest in SQL, regard it as a distraction, and would very much prefer the machine to do the best job possible with a correct description of the required result. > The structures of the query are > wildly different. The form of the syntax is wildly different. The result is identical (or you couldn't recommend it as a better solution). > How is an optimizer to know whether you have a > specific reason for constructing a query in a particular way. Since it is an *optimizer* all it needs to know is that I have requested certain results. I don't need to know what machinations it goes through to get them. In fact I positively want NOT to know. I want to be confident that if some of those very bright and able DBMS developers add a new bit of cleverness to their server in future, my code will automatically benefit from it without me needing to change anything. Roy |
| |||
| > Now who's expecting too much? I expect DBMSs to be designed and developed > by the very best and brightest developers. I think that is a very > reasonable expectation. I also expect that most working programmers have > little or no interest in SQL, regard it as a distraction, and would very > much prefer the machine to do the best job possible with a correct > description of the required result. That a program is written by the brightest developers does not guarantee that willfully ignorant people can generate optimized results. On the contrary, the fact that someone is willfully ignorant guarantees lousy results. This is why MS-Access, for instance, has a name of being slow, while it can be one of the fastest database around if you program it correctly. If you don't treat server databases as server-based and filesystem databases as locally available, you can never achieve optimal results. As most databases should respond to each query you send, it is entirely up to the programmer to determine the query strategy (lazy, greedy, use of stored procedures, or whatever mix you can program). By the way, you expect wrong. I have a responsibility to use the database as a part of the systems I build. Database traffic is not a distraction. It can ruin my entire system if it does not work as it should. I, on the other hand, expect a programmer who does not give a dime about his system to stay quiet about optimizations. Regards, -- Willem Bogaerts Application smith Kratz B.V. http://www.kratz.nl/ |
| |||
| "Captain Paralytic" <paul_lautman@yahoo.com> wrote in message news:1194956220.063305.125300@d55g2000hsg.googlegr oups.com... > On 13 Nov, 12:07, "Roy Hann" <specia...@processed.almost.meat> wrote: >> I also expect that most working programmers have >> little or no interest in SQL, regard it as a distraction, > > That's why there are SQL programmers! I take it you mean *expert* SQL programmers? Sure, there are such people. But I suspect there are very many more programmers writing Java and Cobol and PHP and so on, who are compelled to use SQL but regard it as something other than a core skill, and know just enough to get by. They don't want to take the time to become highly proficient with SQL. They dislike it; they probably feel it gets in the way more than it helps, and if they have to repeatedly reformulate a query looking for that one special way of expressing it so it works tolerably well, then they would (rightly) think that's just voo-doo and hate SQL even more. SQL was intended to be a goal-oriented language. The idea was (explicitly) to allow the optimizer to discern the essential meaning of the query--no matter how it was expressed--and come up with an efficient way of producing the requested result. That's how it is supposed to work, and lots of products achieve that to a remarkable degree. Flipping it around, if we had a perfect SQL engine, no one would ever ask to add a feature so they could write an equivalent query that would be slower! :-) Expert SQL programmers are actually a bad thing. We shouldn't want to want them. Roy |
| ||||
| On 13 Nov, 13:34, "Roy Hann" <specia...@processed.almost.meat> wrote: > "Captain Paralytic" <paul_laut...@yahoo.com> wrote in message > > news:1194956220.063305.125300@d55g2000hsg.googlegr oups.com... > > > On 13 Nov, 12:07, "Roy Hann" <specia...@processed.almost.meat> wrote: > >> I also expect that most working programmers have > >> little or no interest in SQL, regard it as a distraction, > > > That's why there are SQL programmers! > > I take it you mean *expert* SQL programmers? > > Sure, there are such people. But I suspect there are very many more > programmers writing Java and Cobol and PHP and so on, who are compelled to > use SQL but regard it as something other than a core skill, and know just > enough to get by. They don't want to take the time to become highly > proficient with SQL. They dislike it; they probably feel it gets in the way > more than it helps, and if they have to repeatedly reformulate a query > looking for that one special way of expressing it so it works tolerably > well, then they would (rightly) think that's just voo-doo and hate SQL even > more. > > SQL was intended to be a goal-oriented language. The idea was (explicitly) > to allow the optimizer to discern the essential meaning of the query--no > matter how it was expressed--and come up with an efficient way of producing > the requested result. That's how it is supposed to work, and lots of > products achieve that to a remarkable degree. Actually there's another aspect that hasn't been mentioned yet and that is the database design and the choice of indexes. My background to using databases is ISAM on System 36 and CICS/VSAM. When I design database tables (structure, indexes, ...), I'm always thinking of how it will be used and what the data will look like. I know that if I design it wrong, no optimzer in the world will ever be able to access the data efficiently. It seems to me that, programmers who have no appreciation of how to craft an efficient query, are unlikely to have given sufficient thought to this important aspect. I don't see a great deal of difference between helping the optimizer by good design of database and helping it with good design of query. |
| Thread Tools | |
| Display Modes | |
|
|