Unix Technical Forum

Finding the next best answer

This is a discussion on Finding the next best answer within the Informix forums, part of the Database Server Software category; --> Hello All, I have a problem where I need to find the 2nd and 3rd best answers as well ...


Go Back   Unix Technical Forum > Database Server Software > Informix

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-20-2008, 11:22 AM
Dev
 
Posts: n/a
Default Finding the next best answer

Hello All,

I have a problem where I need to find the 2nd and 3rd best answers as
well as the best, and wrap them all up into a single result row. I'm
having trouble getting to an IDS-compliant answer:
(tables simplified to protect the sanity of the innocent)

Table tab1 has two columns: an integer id (obj1), and a character
description (name)
Table link has three columns: an integer id (obj1) which corresponds to
the column of the same name in tab1, an integer (obj2) which is just
data, and an integer (pri) which specifies a priority of the link
between the object in tab1 (obj1) and the data. tab1 is 1-many with
link on obj1.

To get the "highest" (or lowest, for a given definition of down...)
priority data for each obj1 is easy:

select tab1.name, sub1.obj1, sub1.pri
from tab1 inner join table(multiset(
select obj1, min(pri)
from link
group by obj1
)) as sub1 (obj1, pri)
on tab1.obj1 = sub1.obj1

....but to get the 2nd and third is proving surprisingly difficult. I
cant use first or order by statements in my subqueries, which would
make it fairly easy, so I tried the somewhat convoluted:

select tab1.name, sub1.obj1, sub1.pri, sub2.pri
from tab1 left outer join table(multiset(
select obj1, min(pri)
from link
group by obj1
)) as sub1 (obj1, pri)
on tab1.obj1 = sub1.obj1
left outer join table(multiset(
select obj1, min(pri)
from link inner join sub1
on link.obj1 = sub1.obj1
where sub1.pri != link.pri
group by obj1
)) as sub2 (obj1, pri)
on tab1.obj1 = sub2.obj1

....thinking to find the minimum again while excluding the first result.
But to exclude the result from the first subquery required me to make
it a correlated subquery referring back to the pseudo-table of the
first result (sub1) and (in addition to ringing efficiency danger
bells) IDS doesn't seem to want to let me do that. Can anyone suggest
a method to get the back-referrence to work, or - better really by far
- just any clever way to get the 2nd and 3rd best answers all in the
one row?

Thanks for any advice,

- rob.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-20-2008, 11:22 AM
bozon
 
Posts: n/a
Default Re: Finding the next best answer

I did the following and it seems to work well. I don't think the
performance will be too awful but I don't know. Notice that I used real
views which simplifies the logic to the point that even I can
understand it. This doesn't handle ties. The terminology is from horse
racing and is used somewhat incorrectly in that place means either
first or second and show means either first, second, or third.

create table tab1 (
obj1 serial,
name varchar(10)
) ;

create table link (
obj1 integer,
obj2 integer,
pri integer
) ;


create view win_pri(obj1, win_pri) as
select
obj1,
min(pri)
from
link
group by obj1
;

create view place_pri(obj1, place_pri) as
select
l.obj1,
min(pri)
from
link l,
win_pri mp
where
pri > mp.win_pri
group by
l.obj1
;

create view show_pri(obj1, show_pri) as
select
l.obj1,
min(pri)
from
link l,
place_pri pp
where
pri > pp.place_pri
group by
l.obj1
;

insert into tab1 values (0,"test1");
insert into tab1 values (0,"test2");
insert into tab1 values (0,"test3");

insert into link values (1,85,0);
insert into link values (1,85,1);
insert into link values (1,85,2);
insert into link values (1,85,3);
insert into link values (1,85,4);

insert into link values (2,86,0);
insert into link values (2,86,1);


select * from tab1;
select * from link;
select * from win_pri;
select * from place_pri;
select * from show_pri;


select
t.*,
wp.win_pri,
pp.place_pri,
sp.show_pri
from
tab1 t,
outer win_pri wp,
outer place_pri pp,
outer show_pri sp
where
t.obj1 = wp.obj1 and
t.obj1 = pp.obj1 and
t.obj1 = sp.obj1
;

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-20-2008, 11:22 AM
Richard Harnden
 
Posts: n/a
Default Re: Finding the next best answer

Dev wrote:
> Hello All,
>
> I have a problem where I need to find the 2nd and 3rd best answers as
> well as the best, and wrap them all up into a single result row. I'm
> having trouble getting to an IDS-compliant answer:
> (tables simplified to protect the sanity of the innocent)
>
> Table tab1 has two columns: an integer id (obj1), and a character
> description (name)
> Table link has three columns: an integer id (obj1) which corresponds to
> the column of the same name in tab1, an integer (obj2) which is just
> data, and an integer (pri) which specifies a priority of the link
> between the object in tab1 (obj1) and the data. tab1 is 1-many with
> link on obj1.
>
> To get the "highest" (or lowest, for a given definition of down...)
> priority data for each obj1 is easy:
>
> select tab1.name, sub1.obj1, sub1.pri
> from tab1 inner join table(multiset(
> select obj1, min(pri)
> from link
> group by obj1
> )) as sub1 (obj1, pri)
> on tab1.obj1 = sub1.obj1


This is the same as:

SELECT tab1.name, tab1.obj1, MIN(link.pri)
FROM tab1 INNER JOIN link
ON tab1.obj1 = link.obj1
GROUP BY tab1.name, tab1.obj1;

which is, to my mind, easier to read.

>
> ....but to get the 2nd and third is proving surprisingly difficult. I
> cant use first or order by statements in my subqueries, which would
> make it fairly easy, so I tried the somewhat convoluted:
>
> select tab1.name, sub1.obj1, sub1.pri, sub2.pri
> from tab1 left outer join table(multiset(
> select obj1, min(pri)
> from link
> group by obj1
> )) as sub1 (obj1, pri)
> on tab1.obj1 = sub1.obj1
> left outer join table(multiset(
> select obj1, min(pri)
> from link inner join sub1
> on link.obj1 = sub1.obj1
> where sub1.pri != link.pri
> group by obj1
> )) as sub2 (obj1, pri)
> on tab1.obj1 = sub2.obj1
>
> ....thinking to find the minimum again while excluding the first result.
> But to exclude the result from the first subquery required me to make
> it a correlated subquery referring back to the pseudo-table of the
> first result (sub1) and (in addition to ringing efficiency danger
> bells) IDS doesn't seem to want to let me do that. Can anyone suggest
> a method to get the back-referrence to work, or - better really by far
> - just any clever way to get the 2nd and 3rd best answers all in the
> one row?


Don't try to do the whole thing with inline-views, use a temporary table:

SELECT obj1, MIN(pri) AS pri, 1 AS rank
FROM link
GROUP BY 1, 3
INTO TEMP tmp WITH NO LOG;

INSERT INTO tmp
SELECT link.obj1, MIN(link.pri), 2
FROM link INNER JOIN tmp
ON link.obj1 = tmp.obj1
AND link.pri > tmp.pri
AND tmp.rank = 1
GROUP BY 1, 3;

INSERT INTO tmp
SELECT link.obj1, MIN(link.pri), 3
FROM link INNER JOIN tmp
ON link.obj1 = tmp.obj1
AND link.pri > tmp.pri
AND tmp.rank = 2
GROUP BY 1, 3;

SELECT tab1.name, tab1.obj1, x1.pri, x2.pri, x3.pri
FROM tab1
INNER JOIN tmp AS x1
ON tab1.obj1 = x1.obj1
AND x1.rank = 1
LEFT JOIN tmp AS x2
ON x1.obj1 = x2.obj1
AND x2.rank = 2
LEFT JOIN tmp AS x3
ON x2.obj1 = x3.obj1
AND x3.rank = 3;

Assuming that your not constrained to a single statement, of course.

Depending upon the size of your data, you'll probably need a unique
index on, and the proper statistics for, tmp(obj1, rank).

--
rh
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-20-2008, 11:22 AM
bozon
 
Posts: n/a
Default Re: Finding the next best answer

Bozon said:
>the terminology is from horse racing and is used somewhat incorrectly in that place >means either first or second and show means either first, second, or third.

I use place to mean second best only and show to mean thrid place only.

Bozon said:
insert into tab1 values (0,"test1");
insert into tab1 values (0,"test2");
insert into tab1 values (0,"test3");

insert into link values (1,85,0);
insert into link values (1,85,1);
insert into link values (1,85,2);
insert into link values (1,85,3);
insert into link values (1,85,4);

insert into link values (2,86,0);
insert into link values (2,86,1);

select * from tab1;
select * from link;
select * from win_pri;
select * from place_pri;
select * from show_pri;
End Bozon

These inserts and select aren't needed to solve the problem of course
they are just test data and selects to verify that I inserted the data
correctly and that I wrote the final query correctly. I always include
the whole thing so that the viewers at home can play along.

Sorry, if this makes my post somewhat confusing.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-20-2008, 11:23 AM
Dev
 
Posts: n/a
Default Re: Finding the next best answer

> This is the same as:

> SELECT tab1.name, tab1.obj1, MIN(link.pri)
> FROM tab1 INNER JOIN link
> ON tab1.obj1 = link.obj1
> GROUP BY tab1.name, tab1.obj1;


> which is, to my mind, easier to read.


You're right of course. In my zeal to simplify the problem I took out
the fact there is additional information in the link table, and I want
only the values of these fields from the matching row with the minimum
priority. But yours is a much better answer to the question I actually
asked...

> Don't try to do the whole thing with inline-views, use a temporary table:
> Assuming that your not constrained to a single statement, of course.


Ah, yes. Actually, I am constrained to a single statement; forgot to
mention that. Work with the same artificial horizons for too long and
you forget they're there... or at least forget they're artificial.

I've done a solution modelled on Bozons views but with the views
inline; its VERY ugly (not Bozons fault; _I_ put them inline) - and I
suspect very inefficient - but it does seem to work. Essentially,
since I couldn't get the back reference to the previously calculated
"win" value to work, I re-calculate "win" again in the process of
caluclating "place", and then calculate "place" a second time in the
process of calculating "show" (which, of course, requires calculating
"win" yet a third time.) I'm frankly rather disgusted by this, but it
probably won't be TOO slow - the link table is indexed on the
appropriate field, and should have 4 or 5 priorities tops for each one.
Still, if someone comes up with something a bit less of a blunt
instrument I'd appreciate hearing about it.

Thanks again for your help guys,

- rob.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-20-2008, 11:23 AM
Richard Harnden
 
Posts: n/a
Default Re: Finding the next best answer

Dev wrote:
> Ah, yes. Actually, I am constrained to a single statement; forgot to
> mention that. Work with the same artificial horizons for too long and
> you forget they're there... or at least forget they're artificial.
>
> I've done a solution modelled on Bozons views but with the views
> inline; its VERY ugly (not Bozons fault; _I_ put them inline) - and I
> suspect very inefficient - but it does seem to work. Essentially,
> since I couldn't get the back reference to the previously calculated
> "win" value to work, I re-calculate "win" again in the process of
> caluclating "place", and then calculate "place" a second time in the
> process of calculating "show" (which, of course, requires calculating
> "win" yet a third time.) I'm frankly rather disgusted by this, but it
> probably won't be TOO slow - the link table is indexed on the
> appropriate field, and should have 4 or 5 priorities tops for each one.


Don't forget that choosing the optimal query plan is NP-Hard.
Informix's optimiser does a better done that it often gets credit for,
but you've got to give it a fighting chance. Two words: UPDATE STATISTICS.

> Still, if someone comes up with something a bit less of a blunt
> instrument I'd appreciate hearing about it.


Okay, how about on the server you:

CREATE ROW TYPE tab1_pris_row
(
obj1 INTEGER,
name CHAR(20),
win INTEGER,
place INTEGER,
show INTEGER
);

CREATE FUNCTION build_tab1_pri()
RETURNS MULTISET(tab1_pris_row NOT NULL);

DEFINE l_bag MULTISET(tab1_pris_row NOT NULL);
DEFINE l_row tab1_pris_row;
DEFINE l_obj1 INTEGER;
DEFINE l_name CHAR(20);
DEFINE l_pri INTEGER;
DEFINE l_count INTEGER;
DEFINE l_null INTEGER;

LET l_null = NULL;

FOREACH
SELECT obj1, name
INTO l_obj1, l_name
FROM tab1

LET l_count = 0;

LET l_row
= ROW(l_obj1, l_name, l_null, l_null, l_null)::tab1_pris_row;

FOREACH
SELECT FIRST 3 DISTINCT pri
INTO l_pri
FROM link
WHERE obj1 = l_obj1
ORDER BY pri ASC

LET l_count = l_count + 1;

IF l_count == 1
THEN
LET l_row.win = l_pri;
END IF

IF l_count == 2
THEN
LET l_row.place = l_pri;
END IF

IF l_count == 3
THEN
LET l_row.show = l_pri;
EXIT FOREACH;
END IF
END FOREACH

IF l_count == 0 -- There are no links for this obj1 at all
THEN
CONTINUE FOREACH; -- so ignore it
END IF

INSERT INTO TABLE(l_bag) VALUES(l_row);
END FOREACH

RETURN l_bag;
END FUNCTION;

.... then, in your application you can just:

SELECT *
FROM TABLE(build_tab1_pri());

--
rh



Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-20-2008, 11:24 AM
bozon
 
Posts: n/a
Default Re: Finding the next best answer

Dev said
>VERY ugly (not Bozons fault; _I_ put them inline)


Yes, I write very beautiful SQL. :-)

Here is the solution, Don't put them inline. Views exist forever (well
until dropped) create them once permantly and then you have the one
simple statement using the views over and over.

I not sure why you want to do them inline. Views were made for this
simplification of SQL and IMHO aren't used enough by smart people
because they can understand complicated SQL. I say use the brain cells
for something else and write simple SQL. Why do people not like views?
I am going to start a view fan club, send requests to join to me. I'll
send you a T-Shirt for a small membership fee. ;-)

>- and I suspect very inefficient - but it does seem to work.

Only if you are having to recalculate Win, and Place because of link
problems. I am not sure if they are any less effecient than anything
else. Even if you sort and take the first 3 you have to sort all of the
data to find the first 3. If speed of this is really important you
could have an index:

create index link_1x on link(obj1, pri desc) ;

This should give you the min very fast for each obj. Of course, I don't
feel like creating millions of rows to prove this. The other good thing
about views is that they will use, certainly, the underlying indexes on
a table if they exist where as with temp tables you have to create new
indexes. Don't get me wrong I love temp tables too.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 09:23 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com