Hi,

I want to use MySQL's full-text retrieval, but would need to optimize it for my application.

It seems possible to switch between two term weighting schemes (IDF and IDFP). Is there any way to have greater control over this?

More importantly, I will need to adjust the document length normalization, which is completely inadequate for my purpose. Is this possible?

I will also need to adjust the stop-word list, reduce the minimum length of indexed words, and remove the 50% frequency cutoff, all of which seem to be nicely documented. No problems there, at least.

Thanks,
Jens
* * *
Hi,

You could adjust the document length normalization factor by modifying PIVOT_VAL in myisam/ftdefs.h and recompiling.

The ft_stopword_file system variable controls the list of stopwords, and ft_min_word_len controls the minimum word length. There's more info at:

<http://dev.mysql.com/doc/refman/5.0/en/fulltext-fine-tuning.html>

Boolean-mode full-text search doesn't use the 50% threshold. Examples here:

<http://dev.mysql.com/doc/refman/5.0/en/fulltext-boolean.html>

If you need more control, you might try the Lucene search engine.
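To make the effect of PIVOT_VAL more concrete, here is a minimal Python sketch of a natural-language relevance weight. It assumes the formula described in the MySQL Internals documentation, w = (log(dtf)+1)/sumdtf * U/(1+PIVOT_VAL*U) * log((N-nf)/nf), with PIVOT_VAL = 0.0115 as the default. Here dtf is the term's count in the document, sumdtf the sum of (log(count)+1) over the document's distinct words, U the number of distinct words, N the number of rows and nf the number of rows containing the term. The real source may differ between versions, and `term_weight` is a hypothetical helper, not a MySQL API.

```python
import math
from collections import Counter

# Assumed default value; per the thread, PIVOT_VAL is defined in myisam/ftdefs.h.
PIVOT_VAL = 0.0115


def term_weight(term, doc_words, n_docs, docs_with_term, pivot=PIVOT_VAL):
    """Hypothetical re-implementation of the natural-language weight of
    `term` in one document, under the assumed formula above.

    doc_words      -- the document's indexed words (after stopword and
                      minimum-length filtering)
    n_docs         -- N, total number of rows in the table
    docs_with_term -- nf, number of rows that contain `term`
    pivot          -- pivot constant used for length normalization
    """
    counts = Counter(doc_words)
    dtf = counts[term]
    if dtf == 0:
        return 0.0
    if not 0 < docs_with_term < n_docs:
        return 0.0  # avoid log(0) / division by zero in the global factor

    # Local factor: log-dampened term frequency, divided by the document's total.
    sumdtf = sum(math.log(c) + 1 for c in counts.values())
    local = (math.log(dtf) + 1) / sumdtf

    # Pivoted normalization on U, the number of distinct words in the document.
    u = len(counts)
    length_norm = u / (1 + pivot * u)

    # Global factor: probabilistic IDF; zero at nf = N/2, negative above it.
    idf = math.log((n_docs - docs_with_term) / docs_with_term)

    return local * length_norm * idf
```

If every word occurs once, local * length_norm reduces to 1/(1 + pivot*U), so a larger pivot penalizes documents with many distinct words more strongly, while pivot=0 roughly cancels the length effect. Note also that the global factor drops to zero at nf = N/2, which lines up with the 50% cutoff mentioned above.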
* * *
Hi,

petersprc@gmail.com wrote:
> [lots of useful stuff about ft index options]

OK, I am now inclined to go for boolean matching (which apparently counts the number of different terms matched) and normalize by length afterwards. Unfortunately, it doesn't seem to count the number of times a term was matched in the document.

Anyway, how can I get the number of words in a string? I couldn't find anything like that in the string functions section of the manual.

> If you need more control, you might try using the lucene search engine.

It's not really a natural-language document retrieval application; I'm just abusing it for my purpose. And I'd really like to be able to work directly on our databases.

Thanks for your help.

Ciao,
Jens
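A common way to count words in SQL is to count spaces: LENGTH(TRIM(s)) - LENGTH(REPLACE(TRIM(s), ' ', '')) + 1. The Python sketch below mirrors that arithmetic so its behavior (and its weak spot with repeated spaces) is easy to check, and shows the kind of length normalization described above. The function names are hypothetical, and dividing the boolean score by the word count is just one assumed way to normalize.

```python
def sql_style_word_count(s: str) -> int:
    """Count words the way the SQL idiom does: number of spaces plus one.

    Assumes words are separated by exactly one space; runs of spaces
    inflate the count. Returns 0 for an empty or all-space string,
    whereas the raw SQL expression would return 1.
    """
    t = s.strip(" ")  # SQL TRIM() strips spaces only by default
    if not t:
        return 0
    return len(t) - len(t.replace(" ", "")) + 1


def robust_word_count(s: str) -> int:
    """Count whitespace-separated tokens, tolerating repeated spaces, tabs, newlines."""
    return len(s.split())


def length_normalized(score: float, text: str) -> float:
    """Hypothetical post-processing step: divide a boolean-mode match score
    by the document's word count so long documents are not favored."""
    n = robust_word_count(text)
    return score / n if n else 0.0


if __name__ == "__main__":
    print(sql_style_word_count("mysql full text search"))  # 4
```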
* * *
There are some comments with code here which might be useful for counting substrings or splitting a string into words:

http://dev.mysql.com/doc/refman/5.0/...functions.html
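The linked comments are not reproduced here, so the following is only a sketch of the kind of technique such snippets typically use, not their actual code. The usual SQL trick for counting occurrences of a substring is (LENGTH(s) - LENGTH(REPLACE(s, sub, ''))) / LENGTH(sub); the Python version below mirrors it and contrasts it with a whole-word count, which is closer to the per-document term frequency wanted above. Both function names are hypothetical, and the word-splitting rule is an approximation of MySQL's full-text parser.

```python
import re


def sql_style_substring_count(s: str, sub: str) -> int:
    """Count non-overlapping occurrences of `sub`, mirroring the SQL idiom.

    Like REPLACE(), this is case-sensitive and also matches inside longer
    words (e.g. "cat" inside "concatenated").
    """
    if not sub:
        raise ValueError("sub must be non-empty")
    return (len(s) - len(s.replace(sub, ""))) // len(sub)


def term_frequency(s: str, term: str) -> int:
    """Count whole-word, case-insensitive occurrences of `term`.

    Words are taken to be runs of letters, digits and underscores; this is
    only an approximation of how MySQL's full-text parser splits words.
    """
    words = re.findall(r"\w+", s.lower())
    return words.count(term.lower())
```

The substring version is cheap to express in plain SQL, but it overcounts partial-word matches; splitting into words first avoids that at the cost of doing the work outside the database or in a stored function.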
* * *
petersprc wrote:
> There are some comments code here which might be useful in counting
> substrings, or splitting a string into words:
>
> http://dev.mysql.com/doc/refman/5.0/...functions.html

Thanks, it really looks like some of those examples do what I need. I did read the documentation (probably even on that same page), but completely missed the comments.