Unix Technical Forum

Partial match in GIN

This is a discussion on Partial match in GIN within the Pgsql Patches forums, part of the PostgreSQL category; --> http://www.sigaev.ru/misc/partial_match_gin-0.8.gz Reworked interface as it suggested by Gregory ( http://archives.postgresql.org/pgsql...4/msg00199.php ) and move check of index into expand_indexqual_opclause() as ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Patches

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #11 (permalink)  
Old 04-19-2008, 07:28 AM
Teodor Sigaev
 
Posts: n/a
Default Re: Partial match in GIN (next vesrion)

http://www.sigaev.ru/misc/partial_match_gin-0.8.gz
Reworked interface as it suggested by Gregory
(http://archives.postgresql.org/pgsql...4/msg00199.php)
and move check of index into expand_indexqual_opclause() as suggested by Heikki
(http://archives.postgresql.org/pgsql...4/msg00200.php)

http://www.sigaev.ru/misc/tsearch_prefix-0.7.gz
Sync with current CVS and partial match GIN patch. Allow full scan index, so now
GIN supports search with queries like '!foo'. Implemented via using empty string
for prefix search.


http://www.sigaev.ru/misc/wildspeed-0.11.tgz
Sync with CVS changes and partial match GIN patch. Teach opclass to correct use
of recheck feature.



--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #12 (permalink)  
Old 04-24-2008, 07:15 PM
Teodor Sigaev
 
Posts: n/a
Default Re: Partial match in GIN (next vesrion)

http://www.sigaev.ru/misc/partial_match_gin-0.9.gz
Sync with CVS.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #13 (permalink)  
Old 05-13-2008, 07:16 PM
Tom Lane
 
Posts: n/a
Default Re: Partial match in GIN (next vesrion)

Teodor Sigaev <teodor@sigaev.ru> writes:
> http://www.sigaev.ru/misc/partial_match_gin-0.9.gz
> Sync with CVS.


Looking at this now. Wouldn't it be a good idea for comparePartial
to get the strategy number of the operator? As you have it set up,
I doubt that an opclass can support more than one partial-match
operator.

regards, tom lane

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #14 (permalink)  
Old 05-13-2008, 07:16 PM
Teodor Sigaev
 
Posts: n/a
Default Re: Partial match in GIN (next vesrion)

> Looking at this now. Wouldn't it be a good idea for comparePartial
> to get the strategy number of the operator? As you have it set up,
> I doubt that an opclass can support more than one partial-match
> operator.


It might be useful, although I don't see any usage of that right now. I'll add
this option.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #15 (permalink)  
Old 05-16-2008, 02:43 PM
Tom Lane
 
Posts: n/a
Default Re: Partial match in GIN (next vesrion)

Teodor Sigaev <teodor@sigaev.ru> writes:
>> Looking at this now. Wouldn't it be a good idea for comparePartial
>> to get the strategy number of the operator? As you have it set up,
>> I doubt that an opclass can support more than one partial-match
>> operator.


> It might be useful, although I don't see any usage of that right now. I'll add
> this option.


Ping? I'd like to get this patch out of the way.

regards, tom lane

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #16 (permalink)  
Old 05-16-2008, 02:43 PM
Teodor Sigaev
 
Posts: n/a
Default Re: Partial match in GIN (next vesrion)


>> It might be useful, although I don't see any usage of that right now. I'll add
>> this option.

> Ping? I'd like to get this patch out of the way.

I'm very sorry for long delay.
http://www.sigaev.ru/misc/partial_match_gin-0.10.gz
http://www.sigaev.ru/misc/tsearch_prefix-0.8.gz
http://www.sigaev.ru/misc/wildspeed-0.12.tgz

Changes:
- Sync with CVS HEAD
- add third option (StrategyNumber) for comparePartialFn.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW:
http://www.sigaev.ru/

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #17 (permalink)  
Old 05-16-2008, 02:43 PM
Tom Lane
 
Posts: n/a
Default Re: Partial match in GIN (next vesrion)

Teodor Sigaev <teodor@sigaev.ru> writes:
> http://www.sigaev.ru/misc/partial_match_gin-0.10.gz
> http://www.sigaev.ru/misc/tsearch_prefix-0.8.gz
> http://www.sigaev.ru/misc/wildspeed-0.12.tgz


There seems to be something broken here: it's acting like prefix search
is on all the time, eg

regression=# select 'supernova'::tsvector @@ 'super'::tsquery;
?column?
----------
t
(1 row)

regards, tom lane

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #18 (permalink)  
Old 05-16-2008, 02:43 PM
Teodor Sigaev
 
Posts: n/a
Default Re: Partial match in GIN (next vesrion)

> There seems to be something broken here: it's acting like prefix search
> is on all the time, eg


I'm in sackcloth and ashes...

Fixed and extended regression tests.

http://www.sigaev.ru/misc/tsearch_prefix-0.9.gz


--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW:
http://www.sigaev.ru/

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #19 (permalink)  
Old 05-18-2008, 11:03 PM
Tom Lane
 
Posts: n/a
Default Re: Partial match in GIN (next vesrion)

Teodor Sigaev <teodor@sigaev.ru> writes:
> http://www.sigaev.ru/misc/partial_match_gin-0.10.gz
> http://www.sigaev.ru/misc/tsearch_prefix-0.9.gz
> http://www.sigaev.ru/misc/wildspeed-0.12.tgz


I've applied the first two of these with minor editorialization (mostly
fixing documentation). However, I'm having a hard time convincing myself
that anyone will find wildspeed useful in its current form. I did a
simple experiment using a table of titles of database papers:

contrib_regression=# select count(*), avg(length(title)) from pub;
count | avg
--------+---------------------
236984 | 64.7647520507713601
(1 row)

This takes about 22MB on disk as a Postgres table. I was expecting the
wildspeed index to be about 65 times as large, which is bad enough
already, but actually it weighed in at 2165MB or nearly 100X bigger.
Plus it took forever to build: 35 minutes on a fairly fast machine
with maintenance_work_mem set to 512MB.

In comparison, building a conventional full-text-search index (GIN
tsvector) took about 22 seconds including constructing the tsvector
column, and the tsvectors plus index take about 54MB. The relative
search performance is about what you'd expect from the difference in
index sizes, ie, wildspeed loses.

So I'm thinking wildspeed really needs to be redesigned if it's to be
anything but a toy. I can't see putting it into contrib in this form.

One idea that I had was to break the given string into words (splitting
at spaces or punctuation) and store the rotations of individual words
instead of the whole string. (Actually, maybe you only need suffixes
not rotations, ie for 'abcd' store 'abcd', 'bcd', 'cd', 'd'.) Then
similarly break the LIKE pattern apart at words to create word-fragment
search keys. In this scheme the operator would always(?) require
rechecking since any part of the pattern involving punctuation wouldn't
be checkable by the index. The advantage is that the index bloat factor
is governed by the average word length not the average whole-string
length.

There are probably other approaches that would help, too.

regards, tom lane

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #20 (permalink)  
Old 05-18-2008, 11:03 PM
Oleg Bartunov
 
Posts: n/a
Default Re: Partial match in GIN (next vesrion)

Wildspeed was designed as an example application of the GIN's partial
match and as a useful extension for *short* strings. It's also good
standalone demonstration of GIN API. We tried to stay away from full text
search, parser, word delimiters and etc.
From that point of view it might be
useful contrib, but I agree we have to think better to let it more
usable.

Oleg

On Fri, 16 May 2008, Tom Lane wrote:

> Teodor Sigaev <teodor@sigaev.ru> writes:
>> http://www.sigaev.ru/misc/partial_match_gin-0.10.gz
>> http://www.sigaev.ru/misc/tsearch_prefix-0.9.gz
>> http://www.sigaev.ru/misc/wildspeed-0.12.tgz

>
> I've applied the first two of these with minor editorialization (mostly
> fixing documentation). However, I'm having a hard time convincing myself
> that anyone will find wildspeed useful in its current form. I did a
> simple experiment using a table of titles of database papers:
>
> contrib_regression=# select count(*), avg(length(title)) from pub;
> count | avg
> --------+---------------------
> 236984 | 64.7647520507713601
> (1 row)
>
> This takes about 22MB on disk as a Postgres table. I was expecting the
> wildspeed index to be about 65 times as large, which is bad enough
> already, but actually it weighed in at 2165MB or nearly 100X bigger.
> Plus it took forever to build: 35 minutes on a fairly fast machine
> with maintenance_work_mem set to 512MB.
>
> In comparison, building a conventional full-text-search index (GIN
> tsvector) took about 22 seconds including constructing the tsvector
> column, and the tsvectors plus index take about 54MB. The relative
> search performance is about what you'd expect from the difference in
> index sizes, ie, wildspeed loses.
>
> So I'm thinking wildspeed really needs to be redesigned if it's to be
> anything but a toy. I can't see putting it into contrib in this form.
>
> One idea that I had was to break the given string into words (splitting
> at spaces or punctuation) and store the rotations of individual words
> instead of the whole string. (Actually, maybe you only need suffixes
> not rotations, ie for 'abcd' store 'abcd', 'bcd', 'cd', 'd'.) Then
> similarly break the LIKE pattern apart at words to create word-fragment
> search keys. In this scheme the operator would always(?) require
> rechecking since any part of the pattern involving punctuation wouldn't
> be checkable by the index. The advantage is that the index bloat factor
> is governed by the average word length not the average whole-string
> length.
>
> There are probably other approaches that would help, too.
>
> regards, tom lane
>


Regards,
Oleg
__________________________________________________ ___________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 04:33 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com