View Single Post

   
  #10 (permalink)  
Old 04-12-2008, 02:30 AM
Hannu Krosing
 
Posts: n/a
Default Re: text_position worst case runtime

Ühel kenal päeval, R, 2006-05-19 kell 11:20, kirjutas Jim C. Nasby:
> On Thu, May 18, 2006 at 06:49:38PM -0700, Mark Dilger wrote:
> > > I would think that the worst-case times would be fairly improbable.
> > > I'm disinclined to push something as complicated as Boyer-Moore matching
> > > into this function without considerable evidence that it's a performance
> > > bottleneck for real applications.

> >
> > A common approach in biological data applications is to store nucleic and amino
> > acid sequences as text in a relational database. The smaller alphabet sizes and
> > the tendency for redundancy in these sequences increases the likelihood of a
> > performance problem. I have solved this problem by writing my own data types
> > with their own functions for sequence comparison and alignment, and I used
> > boyer-moore for some of that work. Whether the same technique should be used
> > for the text and varchar types was unclear to me, hence the question.

>
> Perhaps it would be best to add a seperate set of functions that use
> boyer-moore, and reference them in appropriate places in the
> documentation. Unless someone has a better idea on how we can find out
> what people are actually doing in the field...


I guess our regex implementation already uses boyer-moore or similar.
Why not just expose the match position of substring('text' in 'regex')
using some function, called match_position(int searched_text, int
regex, int matchnum) ?

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com




---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Reply With Quote