Words suggest or spellcheck in MS SQL 2005 Express (SQL Server forums, Microsoft SQL Server category)
I am trying to recreate the functionality Google has for suggesting words (not names): when you misspell something, it comes up with suggestions.

We have a list of words in the database to match against.

I've looked at SOUNDEX, but it is not close enough, and DIFFERENCE is even worse. The only way I can get SOUNDEX to be more accurate is with:

    SELECT [word]
    FROM [tbl_word]
    WHERE ( SOUNDEX( word ) = SOUNDEX( 'test' ) AND LEN( word ) = LEN( 'test' ) )

I've been looking at regular expression matching, which I reckon would provide more accurate matches. I'm not sure how that will affect performance, as we could be talking about 20,000 records.

I have also been looking at the Double Metaphone algorithm.

Is there something else that I am missing? Does anyone know what to use in a situation like this?

Thanks in advance.
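To make the limitation of the SOUNDEX + LEN filter concrete, here is a minimal Python sketch (not T-SQL) of the standard American Soundex rules together with the same length check. The word list and the function names `soundex` and `soundex_len_match` are made up for illustration, and SQL Server's built-in SOUNDEX may differ from this version in edge cases such as the handling of H and W.

```python
# Standard American Soundex digit groups (assumption: this mirrors the
# textbook algorithm, not necessarily SQL Server's exact behaviour).
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of a word ('' for no letters)."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code after a vowel counts
    return (result + "000")[:4]


def soundex_len_match(query, words):
    """Python analogue of: SOUNDEX(word) = SOUNDEX(q) AND LEN(word) = LEN(q)."""
    target = soundex(query)
    return [w for w in words if soundex(w) == target and len(w) == len(query)]


# Hypothetical word list standing in for tbl_word.
WORDS = ["test", "toast", "taste", "text", "tests", "tent"]

print(soundex_len_match("test", WORDS))   # correctly spelled query
print(soundex_len_match("tesst", WORDS))  # typo with one extra letter
```

In this sketch "test", "toast", "taste" and "text" all share the code T230, which is why SOUNDEX alone feels too loose. The length check narrows that down, but for the typo "tesst" it keeps only the five-letter words and drops "test" itself, so any misspelling that adds or removes a letter can no longer reach the intended word.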
Hi,

You can try edit distance (Levenshtein distance), which is a more advanced similarity measure than SOUNDEX:

http://www.google.com.ar/search?hl=e...sque da&meta=
http://en.wikipedia.org/wiki/Levenshtein_distance

I believe there are some T-SQL implementations of these methods.

However, I guess Google must do more than look for the words most similar to your keywords. Maybe you could combine similarity with how often each word is used, to detect what the user is most probably looking for.

Hope that helps,
Damian

On 25 Jan, 20:38, "Pacific Fox" <tacofl...@gmail.com> wrote:
> I am trying to recreate the same functionality Google has in regards to
> suggesting words (not names), when you misspell something it comes up
> with suggestions.
>
> We have a list of words in the database to match against.
>
> I've looked at SOUNDEX but it is not close enough, DIFFERENCE is even
> worse.
> The only way I can get SOUNDEX to be more accurate is with
> SELECT [word]
> FROM [tbl_word]
> WHERE ( SOUNDEX( word ) = SOUNDEX( 'test' ) AND LEN( word) = LEN(
> 'test' ) )
>
> I've been looking at Regular Expression matching which I reckon would
> provide more accurate matches. Not sure how that will affect
> performance, as we could be talking about 20,000 records.
>
> Or also been looking at the Double Metaphone algorithm.
>
> Is there something else that I am missing, anyone know what to use in a
> situation like this?
>
> Thanks in advance.
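As a rough illustration of the edit distance idea from the reply, the following Python sketch (again not T-SQL) computes the Levenshtein distance with the usual two-row dynamic programming table and ranks a word list by it. The names `levenshtein` and `suggest`, the `max_distance` and `limit` parameters, and the word list are assumptions made up for this example, not part of any SQL Server feature.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distances from '' to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ''
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def suggest(query, words, max_distance=2, limit=5):
    """Return up to `limit` words within `max_distance` edits of the query,
    closest first (ties broken alphabetically)."""
    q = query.lower()
    scored = [(levenshtein(q, w.lower()), w) for w in words]
    close = sorted((d, w) for d, w in scored if d <= max_distance)
    return [w for _, w in close[:limit]]


# Hypothetical word list standing in for the table of known words.
WORDS = ["test", "toast", "taste", "text", "tests", "tent"]

print(suggest("tesst", WORDS))
```

Unlike the SOUNDEX + LEN filter, this ranks "test" first for the typo "tesst", because one deletion is enough to reach it. The sketch scans every word for each query; whether that is fast enough for about 20,000 rows would need to be measured, and ranking ties by word frequency, as the reply hints, would be a separate addition.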