This is a discussion on Finding the next best answer within the Informix forums, part of the Database Server Software category; --> Hello All, I have a problem where I need to find the 2nd and 3rd best answers as well ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hello All, I have a problem where I need to find the 2nd and 3rd best answers as well as the best, and wrap them all up into a single result row. I'm having trouble getting to an IDS-compliant answer: (tables simplified to protect the sanity of the innocent) Table tab1 has two columns: an integer id (obj1), and a character description (name) Table link has three columns: an integer id (obj1) which corresponds to the column of the same name in tab1, an integer (obj2) which is just data, and an integer (pri) which specifies a priority of the link between the object in tab1 (obj1) and the data. tab1 is 1-many with link on obj1. To get the "highest" (or lowest, for a given definition of down...) priority data for each obj1 is easy: select tab1.name, sub1.obj1, sub1.pri from tab1 inner join table(multiset( select obj1, min(pri) from link group by obj1 )) as sub1 (obj1, pri) on tab1.obj1 = sub1.obj1 ....but to get the 2nd and third is proving surprisingly difficult. I cant use first or order by statements in my subqueries, which would make it fairly easy, so I tried the somewhat convoluted: select tab1.name, sub1.obj1, sub1.pri, sub2.pri from tab1 left outer join table(multiset( select obj1, min(pri) from link group by obj1 )) as sub1 (obj1, pri) on tab1.obj1 = sub1.obj1 left outer join table(multiset( select obj1, min(pri) from link inner join sub1 on link.obj1 = sub1.obj1 where sub1.pri != link.pri group by obj1 )) as sub2 (obj1, pri) on tab1.obj1 = sub2.obj1 ....thinking to find the minimum again while excluding the first result. But to exclude the result from the first subquery required me to make it a correlated subquery referring back to the pseudo-table of the first result (sub1) and (in addition to ringing efficiency danger bells) IDS doesn't seem to want to let me do that. Can anyone suggest a method to get the back-referrence to work, or - better really by far - just any clever way to get the 2nd and 3rd best answers all in the one row? Thanks for any advice, - rob. |
| |||
| I did the following and it seems to work well. I don't think the performance will be too awful but I don't know. Notice that I used real views which simplifies the logic to the point that even I can understand it. This doesn't handle ties. The terminology is from horse racing and is used somewhat incorrectly in that place means either first or second and show means either first, second, or third. create table tab1 ( obj1 serial, name varchar(10) ) ; create table link ( obj1 integer, obj2 integer, pri integer ) ; create view win_pri(obj1, win_pri) as select obj1, min(pri) from link group by obj1 ; create view place_pri(obj1, place_pri) as select l.obj1, min(pri) from link l, win_pri mp where pri > mp.win_pri group by l.obj1 ; create view show_pri(obj1, show_pri) as select l.obj1, min(pri) from link l, place_pri pp where pri > pp.place_pri group by l.obj1 ; insert into tab1 values (0,"test1"); insert into tab1 values (0,"test2"); insert into tab1 values (0,"test3"); insert into link values (1,85,0); insert into link values (1,85,1); insert into link values (1,85,2); insert into link values (1,85,3); insert into link values (1,85,4); insert into link values (2,86,0); insert into link values (2,86,1); select * from tab1; select * from link; select * from win_pri; select * from place_pri; select * from show_pri; select t.*, wp.win_pri, pp.place_pri, sp.show_pri from tab1 t, outer win_pri wp, outer place_pri pp, outer show_pri sp where t.obj1 = wp.obj1 and t.obj1 = pp.obj1 and t.obj1 = sp.obj1 ; |
| |||
| Dev wrote: > Hello All, > > I have a problem where I need to find the 2nd and 3rd best answers as > well as the best, and wrap them all up into a single result row. I'm > having trouble getting to an IDS-compliant answer: > (tables simplified to protect the sanity of the innocent) > > Table tab1 has two columns: an integer id (obj1), and a character > description (name) > Table link has three columns: an integer id (obj1) which corresponds to > the column of the same name in tab1, an integer (obj2) which is just > data, and an integer (pri) which specifies a priority of the link > between the object in tab1 (obj1) and the data. tab1 is 1-many with > link on obj1. > > To get the "highest" (or lowest, for a given definition of down...) > priority data for each obj1 is easy: > > select tab1.name, sub1.obj1, sub1.pri > from tab1 inner join table(multiset( > select obj1, min(pri) > from link > group by obj1 > )) as sub1 (obj1, pri) > on tab1.obj1 = sub1.obj1 This is the same as: SELECT tab1.name, tab1.obj1, MIN(link.pri) FROM tab1 INNER JOIN link ON tab1.obj1 = link.obj1 GROUP BY tab1.name, tab1.obj1; which is, to my mind, easier to read. > > ....but to get the 2nd and third is proving surprisingly difficult. I > cant use first or order by statements in my subqueries, which would > make it fairly easy, so I tried the somewhat convoluted: > > select tab1.name, sub1.obj1, sub1.pri, sub2.pri > from tab1 left outer join table(multiset( > select obj1, min(pri) > from link > group by obj1 > )) as sub1 (obj1, pri) > on tab1.obj1 = sub1.obj1 > left outer join table(multiset( > select obj1, min(pri) > from link inner join sub1 > on link.obj1 = sub1.obj1 > where sub1.pri != link.pri > group by obj1 > )) as sub2 (obj1, pri) > on tab1.obj1 = sub2.obj1 > > ....thinking to find the minimum again while excluding the first result. > But to exclude the result from the first subquery required me to make > it a correlated subquery referring back to the pseudo-table of the > first result (sub1) and (in addition to ringing efficiency danger > bells) IDS doesn't seem to want to let me do that. Can anyone suggest > a method to get the back-referrence to work, or - better really by far > - just any clever way to get the 2nd and 3rd best answers all in the > one row? Don't try to do the whole thing with inline-views, use a temporary table: SELECT obj1, MIN(pri) AS pri, 1 AS rank FROM link GROUP BY 1, 3 INTO TEMP tmp WITH NO LOG; INSERT INTO tmp SELECT link.obj1, MIN(link.pri), 2 FROM link INNER JOIN tmp ON link.obj1 = tmp.obj1 AND link.pri > tmp.pri AND tmp.rank = 1 GROUP BY 1, 3; INSERT INTO tmp SELECT link.obj1, MIN(link.pri), 3 FROM link INNER JOIN tmp ON link.obj1 = tmp.obj1 AND link.pri > tmp.pri AND tmp.rank = 2 GROUP BY 1, 3; SELECT tab1.name, tab1.obj1, x1.pri, x2.pri, x3.pri FROM tab1 INNER JOIN tmp AS x1 ON tab1.obj1 = x1.obj1 AND x1.rank = 1 LEFT JOIN tmp AS x2 ON x1.obj1 = x2.obj1 AND x2.rank = 2 LEFT JOIN tmp AS x3 ON x2.obj1 = x3.obj1 AND x3.rank = 3; Assuming that your not constrained to a single statement, of course. Depending upon the size of your data, you'll probably need a unique index on, and the proper statistics for, tmp(obj1, rank). -- rh |
| |||
| Bozon said: >the terminology is from horse racing and is used somewhat incorrectly in that place >means either first or second and show means either first, second, or third. I use place to mean second best only and show to mean thrid place only. Bozon said: insert into tab1 values (0,"test1"); insert into tab1 values (0,"test2"); insert into tab1 values (0,"test3"); insert into link values (1,85,0); insert into link values (1,85,1); insert into link values (1,85,2); insert into link values (1,85,3); insert into link values (1,85,4); insert into link values (2,86,0); insert into link values (2,86,1); select * from tab1; select * from link; select * from win_pri; select * from place_pri; select * from show_pri; End Bozon These inserts and select aren't needed to solve the problem of course they are just test data and selects to verify that I inserted the data correctly and that I wrote the final query correctly. I always include the whole thing so that the viewers at home can play along. Sorry, if this makes my post somewhat confusing. |
| |||
| > This is the same as: > SELECT tab1.name, tab1.obj1, MIN(link.pri) > FROM tab1 INNER JOIN link > ON tab1.obj1 = link.obj1 > GROUP BY tab1.name, tab1.obj1; > which is, to my mind, easier to read. You're right of course. In my zeal to simplify the problem I took out the fact there is additional information in the link table, and I want only the values of these fields from the matching row with the minimum priority. But yours is a much better answer to the question I actually asked... > Don't try to do the whole thing with inline-views, use a temporary table: > Assuming that your not constrained to a single statement, of course. Ah, yes. Actually, I am constrained to a single statement; forgot to mention that. Work with the same artificial horizons for too long and you forget they're there... or at least forget they're artificial. I've done a solution modelled on Bozons views but with the views inline; its VERY ugly (not Bozons fault; _I_ put them inline) - and I suspect very inefficient - but it does seem to work. Essentially, since I couldn't get the back reference to the previously calculated "win" value to work, I re-calculate "win" again in the process of caluclating "place", and then calculate "place" a second time in the process of calculating "show" (which, of course, requires calculating "win" yet a third time.) I'm frankly rather disgusted by this, but it probably won't be TOO slow - the link table is indexed on the appropriate field, and should have 4 or 5 priorities tops for each one. Still, if someone comes up with something a bit less of a blunt instrument I'd appreciate hearing about it. Thanks again for your help guys, - rob. |
| |||
| Dev wrote: > Ah, yes. Actually, I am constrained to a single statement; forgot to > mention that. Work with the same artificial horizons for too long and > you forget they're there... or at least forget they're artificial. > > I've done a solution modelled on Bozons views but with the views > inline; its VERY ugly (not Bozons fault; _I_ put them inline) - and I > suspect very inefficient - but it does seem to work. Essentially, > since I couldn't get the back reference to the previously calculated > "win" value to work, I re-calculate "win" again in the process of > caluclating "place", and then calculate "place" a second time in the > process of calculating "show" (which, of course, requires calculating > "win" yet a third time.) I'm frankly rather disgusted by this, but it > probably won't be TOO slow - the link table is indexed on the > appropriate field, and should have 4 or 5 priorities tops for each one. Don't forget that choosing the optimal query plan is NP-Hard. Informix's optimiser does a better done that it often gets credit for, but you've got to give it a fighting chance. Two words: UPDATE STATISTICS. > Still, if someone comes up with something a bit less of a blunt > instrument I'd appreciate hearing about it. Okay, how about on the server you: CREATE ROW TYPE tab1_pris_row ( obj1 INTEGER, name CHAR(20), win INTEGER, place INTEGER, show INTEGER ); CREATE FUNCTION build_tab1_pri() RETURNS MULTISET(tab1_pris_row NOT NULL); DEFINE l_bag MULTISET(tab1_pris_row NOT NULL); DEFINE l_row tab1_pris_row; DEFINE l_obj1 INTEGER; DEFINE l_name CHAR(20); DEFINE l_pri INTEGER; DEFINE l_count INTEGER; DEFINE l_null INTEGER; LET l_null = NULL; FOREACH SELECT obj1, name INTO l_obj1, l_name FROM tab1 LET l_count = 0; LET l_row = ROW(l_obj1, l_name, l_null, l_null, l_null)::tab1_pris_row; FOREACH SELECT FIRST 3 DISTINCT pri INTO l_pri FROM link WHERE obj1 = l_obj1 ORDER BY pri ASC LET l_count = l_count + 1; IF l_count == 1 THEN LET l_row.win = l_pri; END IF IF l_count == 2 THEN LET l_row.place = l_pri; END IF IF l_count == 3 THEN LET l_row.show = l_pri; EXIT FOREACH; END IF END FOREACH IF l_count == 0 -- There are no links for this obj1 at all THEN CONTINUE FOREACH; -- so ignore it END IF INSERT INTO TABLE(l_bag) VALUES(l_row); END FOREACH RETURN l_bag; END FUNCTION; .... then, in your application you can just: SELECT * FROM TABLE(build_tab1_pri()); -- rh |
| ||||
| Dev said >VERY ugly (not Bozons fault; _I_ put them inline) Yes, I write very beautiful SQL. :-) Here is the solution, Don't put them inline. Views exist forever (well until dropped) create them once permantly and then you have the one simple statement using the views over and over. I not sure why you want to do them inline. Views were made for this simplification of SQL and IMHO aren't used enough by smart people because they can understand complicated SQL. I say use the brain cells for something else and write simple SQL. Why do people not like views? I am going to start a view fan club, send requests to join to me. I'll send you a T-Shirt for a small membership fee. ;-) >- and I suspect very inefficient - but it does seem to work. Only if you are having to recalculate Win, and Place because of link problems. I am not sure if they are any less effecient than anything else. Even if you sort and take the first 3 you have to sort all of the data to find the first 3. If speed of this is really important you could have an index: create index link_1x on link(obj1, pri desc) ; This should give you the min very fast for each obj. Of course, I don't feel like creating millions of rows to prove this. The other good thing about views is that they will use, certainly, the underlying indexes on a table if they exist where as with temp tables you have to create new indexes. Don't get me wrong I love temp tables too. |