Unix Technical Forum

MySQL search with result scoring / relevance

This is a discussion on MySQL search with result scoring / relevance within the MySQL forums, part of the Database Server Software category; --> Hi, Suppose i have the following sql table: my_table (col_1, col_2, col_3, col_4) and the folowing user input for ...


Go Back   Unix Technical Forum > Database Server Software > MySQL

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 02-28-2008, 10:29 AM
mareeus@gmail.com
 
Posts: n/a
Default MySQL search with result scoring / relevance

Hi,

Suppose i have the following sql table:
my_table (col_1, col_2, col_3, col_4)

and the folowing user input for the search:
col_1:value_1
col_2:value_2
value_3

The above input means the folowing: search col_1 for value_1, col_2
for value_2 and value_3 in all columns.

Besides this i need some scoring (this is used for sorting the results
by their relevance).
A match in col_1 will be 4 points, col_2 will be 3 points, col_3 will
be 2 poits and the last one 1 point.

All the results will be sorted by computing this score.
For example: matching in col_1 for value_1 (+4), matching in col_2
(+3) and col_4 (+1) for value_3 will mean a +8 score for a row.

I need some ideas and if it is possible some web links that might help
me or even code.
I'm also open to any suggestions that can offer a solution to my
problem.

Regards,
Marius.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 02-28-2008, 10:29 AM
AlterEgo
 
Posts: n/a
Default Re: MySQL search with result scoring / relevance

Marius,

I think something like this is what you want:

DROP TABLE IF EXISTS `test`.`test`;
CREATE TABLE `test`.`test` (
`TestID` int(11) NOT NULL auto_increment,
`Col1` varchar(255) default NULL,
`Col2` varchar(255) default NULL,
`Col3` varchar(255) default NULL,
`Col4` varchar(255) default NULL,
PRIMARY KEY (`TestID`)
) ENGINE=InnoDB ;

insert Test (col1, col2, col3, col4) select 'mystring', '', '', '';
insert Test (col1, col2, col3, col4) select '', 'mystring', '', '';
insert Test (col1, col2, col3, col4) select '', '', 'mystring', '';
insert Test (col1, col2, col3, col4) select '', '', '', 'mystring';
insert Test (col1, col2, col3, col4) select 'mystring', 'mystring',
'mystring', 'mystring';

select
Col1
, Col2
, Col3
, Col4
, case when ifnull(Col1,'') = 'mystring' then 4 else 0 end /* now lets
add them all up */
+ case when ifnull(Col2,'') = 'mystring' then 3 else 0 end
+ case when ifnull(Col3,'') = 'mystring' then 2 else 0 end
+ case when ifnull(Col4,'') = 'mystring' then 1 else 0 end
MyRank
from Test

-- Bill


<mareeus@gmail.com> wrote in message
news:1174945884.429520.117870@n59g2000hsh.googlegr oups.com...
> Hi,
>
> Suppose i have the following sql table:
> my_table (col_1, col_2, col_3, col_4)
>
> and the folowing user input for the search:
> col_1:value_1
> col_2:value_2
> value_3
>
> The above input means the folowing: search col_1 for value_1, col_2
> for value_2 and value_3 in all columns.
>
> Besides this i need some scoring (this is used for sorting the results
> by their relevance).
> A match in col_1 will be 4 points, col_2 will be 3 points, col_3 will
> be 2 poits and the last one 1 point.
>
> All the results will be sorted by computing this score.
> For example: matching in col_1 for value_1 (+4), matching in col_2
> (+3) and col_4 (+1) for value_3 will mean a +8 score for a row.
>
> I need some ideas and if it is possible some web links that might help
> me or even code.
> I'm also open to any suggestions that can offer a solution to my
> problem.
>
> Regards,
> Marius.
>



Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 02-28-2008, 10:29 AM
mareeus@gmail.com
 
Posts: n/a
Default Re: MySQL search with result scoring / relevance

Thanks a lot Bill.

Your example works very fine.

select
Col1
, Col2
, Col3
, Col4
, case when ifnull(Col1,'') = 'mystring' then 4 else 0 end /* now
lets
add them all up */
+ case when ifnull(Col2,'') = 'mystring' then 3 else 0 end
+ case when ifnull(Col3,'') = 'mystring' then 2 else 0 end
+ case when ifnull(Col4,'') = 'mystring' then 1 else 0 end
MyRank
from Test


Now i have another ouestion. Suppose the folowing user input:
col_1:value_1 or (col_2:value_2 and col_3:value_3) or col_4:value_4

I need to adapt your example to work with "()", "AND", "OR", and in
the case of one condition not meeting the boolean expression to
exclude that row from my selection.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 02-28-2008, 10:29 AM
mareeus@gmail.com
 
Posts: n/a
Default Re: MySQL search with result scoring / relevance

I've solved the 'or' and 'and' problem but now I'm facing a new one (a
shameful syntax error).
This is my query:

select *
, case when `col1` LIKE '%val1%' then 1 else 0 end Token1
, case when `col2` LIKE '%val2%' then 2 else 0 end Token2
, case when Token1 * Token2 > 0 then Token1 * Token2 else 0 end score
from `table` HAVING score > 0 order by score DESC

I get the following error:
#1054 - Unknown column 'Token0' in 'field list'

How can i make this work keeping Token1 and Token2 and using them to
compute the score and order results by score

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 02-28-2008, 10:29 AM
Tigger
 
Posts: n/a
Default Re: MySQL search with result scoring / relevance

On Mar 28, 8:22 am, "mare...@gmail.com" <mare...@gmail.com> wrote:
> I've solved the 'or' and 'and' problem but now I'm facing a new one (a
> shameful syntax error).
> This is my query:
>
> select *
> , case when `col1` LIKE '%val1%' then 1 else 0 end Token1
> , case when `col2` LIKE '%val2%' then 2 else 0 end Token2
> , case when Token1 * Token2 > 0 then Token1 * Token2 else 0 end score
> from `table` HAVING score > 0 order by score DESC
>
> I get the following error:
> #1054 - Unknown column 'Token0' in 'field list'
>
> How can i make this work keeping Token1 and Token2 and using them to
> compute the score and order results by score


My last post attempt seemed to fail...lets try again...

You cant reference aliases from within the field list. You have to re-
state the formulas again.

Heres my possible solution to it all...

SELECT
col1,
col2,
col3,
col4,
((`col1` LIKE '%val1%') * 4) +
((`col2` LIKE '%val2%') * 3) +
((`col3` LIKE '%val3%') * 2) +
((`col4` LIKE '%val4%') * 1) score,
(`col1` LIKE '%val1%') OR
((`col2` LIKE '%val2%') AND
(`col3` LIKE '%val3%') ) OR
(`col4` LIKE '%val4%') booleanMatch
FROM test
HAVING booleanMatch = 1 AND score > 0 order by score DESC

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 02-28-2008, 10:30 AM
mareeus@gmail.com
 
Posts: n/a
Default Re: MySQL search with result scoring / relevance

Thanks Tigger,

Your solution is almost perfect but imagine a complex search string
with 10 or 20 search values. Computing both score and booleanmatch
won't be very optimized. They both mean almost the same thing. That's
why i need some temporary values like Token1 and Token2 to store the
values once computed and reduce the query complexity a little. So if
anyone could help me to rewrite my query keeping Token1 and Token2,
would be great.

select *
, case when `col1` LIKE '%val1%' then 1 else 0 end Token1
, case when `col2` LIKE '%val2%' then 2 else 0 end Token2
, case when Token1 * Token2 > 0 then Token1 * Token2 else 0 end score
from `table` HAVING score > 0 order by score DESC

#1054 - Unknown column 'Token0' in 'field list'

Regards,
Marius.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 02-28-2008, 10:30 AM
Tigger
 
Posts: n/a
Default Re: MySQL search with result scoring / relevance

On Mar 28, 11:44 pm, "mare...@gmail.com" <mare...@gmail.com> wrote:
> Thanks Tigger,
>
> Your solution is almost perfect but imagine a complex search string
> with 10 or 20 search values. Computing both score and booleanmatch
> won't be very optimized. They both mean almost the same thing. That's
> why i need some temporary values like Token1 and Token2 to store the
> values once computed and reduce the query complexity a little. So if
> anyone could help me to rewrite my query keeping Token1 and Token2,
> would be great.
>
> select *
> , case when `col1` LIKE '%val1%' then 1 else 0 end Token1
> , case when `col2` LIKE '%val2%' then 2 else 0 end Token2
> , case when Token1 * Token2 > 0 then Token1 * Token2 else 0 end score
> from `table` HAVING score > 0 order by score DESC
>
> #1054 - Unknown column 'Token0' in 'field list'
>
> Regards,
> Marius.



You could play around with the order of things. Like

SELECT
col1,
col2,
col3,
col4
FROM test
WHERE
(`col1` LIKE '%val1%') OR
((`col2` LIKE '%val2%') AND
(`col3` LIKE '%val3%') ) OR
(`col4` LIKE '%val4%')
ORDER BY (((`col1` LIKE '%val1%') * 4) +
((`col2` LIKE '%val2%') * 3) +
((`col3` LIKE '%val3%') * 2) +
((`col4` LIKE '%val4%') * 1)) DESC

This would limit the 2nd calculations to only happen for rows that
match the boolean expression.

Another thing to look at is doing Full-Text Searches using the MATCH
AGAINST syntax...
http://dev.mysql.com/doc/refman/5.0/...xt-search.html

This can provide cleverer results for your searches, and I remember
reading once that repeating the same MATCH statement in the field and
in a where clause will only cause the one calculation, so you don't
have to worry about the same efficiency issue.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 04:15 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com