This is a discussion on Fulltext searches, synonyms, and alternate spellings... within the MySQL forums, part of the Database Server Software category.
I was wondering if there was any way to somehow modify a fulltext search to be able to match on full synonyms and (perhaps more importantly) alternative spellings.

For instance,

    INSERT
    INTO some_table (words_text)
    VALUES ('He parked the automobile with its tyres against the kerb.');

    SELECT *
    FROM some_table
    WHERE MATCH (words_text)
    AGAINST ('car tires curb');

...won't match at all, when any English-speaking human can see it should.

Hoping for any recommendations on how to deal with this,

--
Dodger
On 12 Nov 2006 15:47:46 -0800, Dodger wrote:
> I was wondering if there was any way to somehow modify a fulltext
> search to be able to match on full synonyms and (perhaps more
> importantly) alternative spellings.
>
> For instance,
>
> INSERT
> INTO some_table (words_text)
> VALUES ('He parked the automobile with its tyres against the kerb.');
>
> SELECT *
> FROM some_table
> WHERE MATCH (words_text)
> AGAINST ('car tires curb');
>
> ...won't match at all, when any English-speaking human can see it
> should.
>
> Hoping for any recommendations on how to deal with this,

Wave a magic wand? I speak English (.us subvariant) natively and would never expect a search engine to match "tyre" to "tire". As for "kerb" to "curb", and "car" to "automobile", well, they're different words with different meanings. For example, "curb" is both a noun and a verb. "Kerb" is not. "Car" can apply to a railroad wagon, a conveyance hung from wires, or part of an elevator, above and beyond an automobile. It's not the job of a search function to account for muddled thinking.

--
59. I will never build a sentient computer smarter than I am.
--Peter Anspach's list of things to do as an Evil Overlord
Peter H. Coffin wrote:
> Wave a magic wand? I speak English (.us subvariant) natively and would
> never expect a search engine to match "tyre" to "tire". As for "kerb" to
> "curb", and "car" to "automobile", well, they're different words with
> different meanings. For example, "curb" is both a noun and a verb. "Kerb"
> is not. "Car" can apply to a railroad wagon, a conveyance hung from
> wires, or part of an elevator, above and beyond an automobile. It's not
> the job of a search function to account for muddled thinking.

Hi, Peter. Thanks for replying.

'Kerb' is the British spelling for 'curb' (as in, edge of sidewalk). 'Tyre' is the British spelling for 'tire' (as in the rubber part of a car's wheel). Both 'curb' and 'tire' have other meanings as well, but they are less likely in some circumstances -- for instance, a catalog of toys.

'Car' might have multiple meanings (and, because of model railroads, a high likelihood of them even in toy catalogs) but, for instance, 'motorcar' doesn't, and would be an exact synonym for 'automobile'. Even with 'car', a search for 'train car' would bring up train cars at a higher relevance than motorcars, as it should (and would allow for '-motorcar' in a boolean search).

Thing is, I wouldn't *expect* a search engine to match on them either, but I *want* one to somehow.

Actually, since I posted I did come up with a sort-of solution that will work in a preprocessing sense. All I have to do is make a table of synonyms like so:

    [synonym]
    word     varchar(32)
    synonym  varchar(32)

...then I can add a column 'alternate_word' to the searched table and add it to the fulltext index. Then, for each word in the synonym table, I can update the alternate_word column to include that synonym wherever the other columns match the word but don't already match the synonym.

That way when someone searches a costume store for 'Armor' they get things made by Brits, Aussies and Canadians, too.

--
Dodger
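The preprocessing step described above could be sketched roughly as follows. This is only an illustration under assumptions: the `SYNONYMS` rows, the `alternate_words` function name, and the simple tokenizer are all hypothetical, and in practice the rows would be read from the proposed synonym table and the result written into the 'alternate_word' column. Plural forms are listed explicitly because this sketch does no stemming.

```python
import re

# Hypothetical (word, synonym) rows, mirroring the proposed [synonym] table.
SYNONYMS = [
    ("automobile", "car"),
    ("automobile", "motorcar"),
    ("tyre", "tire"),
    ("tyres", "tires"),
    ("kerb", "curb"),
    ("armour", "armor"),
]


def alternate_words(text, synonyms=SYNONYMS):
    """Return the space-separated synonyms to store in alternate_word.

    A synonym is added only when the text contains the word but does not
    already contain the synonym itself.
    """
    # Crude tokenizer (an assumption): lowercase runs of letters/apostrophes.
    present = set(re.findall(r"[a-z']+", text.lower()))
    extra = []
    for word, synonym in synonyms:
        if word in present and synonym not in present and synonym not in extra:
            extra.append(synonym)
    return " ".join(extra)


if __name__ == "__main__":
    row = "He parked the automobile with its tyres against the kerb."
    print(alternate_words(row))
```

With the sample rows, the sentence from the original post gets 'car', 'motorcar', 'tires' and 'curb' as alternates, so a fulltext index covering both columns would have something to match 'car tires curb' against. The coverage is only as good as the synonym rows.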
Dodger wrote:
> I was wondering if there was any way to somehow modify a fulltext
> search to be able to match on full synonyms and (perhaps more
> importantly) alternative spellings.

This is a tough one. As you said, you could build a dictionary of synonyms and use that to expand your query, but you will have a hard time getting good coverage.

What is most effective in natural language is blind relevance feedback, i.e. automatic query expansion using terms from the documents that use your initial terms. You will thus find documents that share common words with documents talking about cars, tires, and curbs, which will hopefully include documents talking about automobiles, tyres, and kerbs. However, this is obviously far from perfect and only works when you have a large document collection, including documents that match your initial query.

You can use this in MySQL by using the "WITH QUERY EXPANSION" option. You will find more documentation by googling for this option.

HTH, HAND,
Jens
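Both suggestions in this reply can be sketched from application code. The following is a rough illustration, not a definitive implementation: `SYNONYM_GROUPS`, `expand_query` and `search_statements` are hypothetical names, the groups are made-up sample data, and the `%s` placeholders assume a Python database driver that uses that parameter style. `WITH QUERY EXPANSION` itself is the MySQL modifier named above.

```python
# Hypothetical synonym groups; each word in a group expands to the others.
SYNONYM_GROUPS = [
    {"car", "automobile", "motorcar"},
    {"tire", "tyre"},
    {"tires", "tyres"},
    {"curb", "kerb"},
]


def expand_query(query, groups=SYNONYM_GROUPS):
    """Dictionary-based expansion: append known synonyms to the search terms."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for group in groups:
            if term in group:
                # sorted() keeps the output order deterministic.
                for alt in sorted(group):
                    if alt not in expanded:
                        expanded.append(alt)
    return " ".join(expanded)


def search_statements(query):
    """Return (sql, params) pairs for the two approaches discussed."""
    # 1. Synonym dictionary: expand the terms, then run a plain search.
    dictionary_sql = (
        "SELECT * FROM some_table "
        "WHERE MATCH (words_text) AGAINST (%s)"
    )
    # 2. Blind relevance feedback: let MySQL expand the query itself.
    feedback_sql = (
        "SELECT * FROM some_table "
        "WHERE MATCH (words_text) AGAINST (%s WITH QUERY EXPANSION)"
    )
    return [
        (dictionary_sql, (expand_query(query),)),
        (feedback_sql, (query,)),
    ]


if __name__ == "__main__":
    for sql, params in search_statements("car tires curb"):
        print(sql, params)
```

The first statement searches for the original terms plus their listed synonyms, so it depends entirely on dictionary coverage. The second leaves the search string alone and relies on the server finding related terms in documents that already match, which is why it needs a large collection to be useful.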