Unix Technical Forum

Best way to use indexes for partial match at beginning

This is a discussion on Best way to use indexes for partial match at beginning within the Pgsql General forums, part of the PostgreSQL category; --> I have 10 CHARACTER columns in table mytable. Mytable is indexed by some of those columns. CREATE TABLE mytable ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql General

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-09-2008, 06:51 AM
Andrus Moor
 
Posts: n/a
Default Best way to use indexes for partial match at beginning

I have 10 CHARACTER columns in table mytable. Mytable is indexed by some of
those columns.

CREATE TABLE mytable ( col1 CHARACTER(10),
col2 CHARACTER(10),col3 CHARACTER(10),col4 CHARACTER(10),col5
CHARACTER(10),col6 CHARACTER(10),col7 CHARACTER(10),col8 CHARACTER(10),col9
CHARACTER(10), col10 CHARACTER(10) );

CREATE INDEX i1 ON mytable(col1);
CREATE INDEX i2 ON mytable(col2);

I need to select records by knowing some characters from beginning.
I know always 1-10 first characters of col1. So my LIKE pattern starts
always with constant characters and ends with % .

I can use LIKE:

SELECT * FROM mytable
WHERE col1 LIKE 'A%'
AND col2 LIKE 'BC%'
AND col3 LIKE 'DEF%'
AND col4 LIKE 'G%';

or substring():

SELECT * FROM mytable
WHERE substring(col1 for 1)='A'
AND substring(col2 for 2)= 'BC'
AND substring(col3 for 3)='DEF'
AND substring(col4 for 1) ='G';


Can Postgres 8.1 use indexes to speed the queries above ?

Which is the best way to to write the where clause in this case so that
index is used ?

Andrus.


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-09-2008, 06:53 AM
Jim C. Nasby
 
Posts: n/a
Default Re: Best way to use indexes for partial match at beginning

Well, for starters, see if PostgreSQL is currently using any indexes via
EXPLAIN. First rule of performance tuning: don't.

If it's not (which is probably the case), then your best bet is to
create functional indexes; ie:

CREATE INDEX mytable__col1_4 ON mytable( substring( col1 for 4 ) );

You can then either

SELECT ... WHERE substring( col1 for 4 ) = blah

or

SELECT ... WHERE substring( col1 for 4 ) LIKE 'bla%'

Though that last one might not use the index; you'll have to check and
see.

Also, keep in mind that PostgreSQL doesn't store CHAR the same as most
other databases; the internal storage is the same as what's used for
VARCHAR and TEXT.

On Sun, Nov 06, 2005 at 11:03:01PM +0200, Andrus Moor wrote:
> I have 10 CHARACTER columns in table mytable. Mytable is indexed by some of
> those columns.
>
> CREATE TABLE mytable ( col1 CHARACTER(10),
> col2 CHARACTER(10),col3 CHARACTER(10),col4 CHARACTER(10),col5
> CHARACTER(10),col6 CHARACTER(10),col7 CHARACTER(10),col8 CHARACTER(10),col9
> CHARACTER(10), col10 CHARACTER(10) );
>
> CREATE INDEX i1 ON mytable(col1);
> CREATE INDEX i2 ON mytable(col2);
>
> I need to select records by knowing some characters from beginning.
> I know always 1-10 first characters of col1. So my LIKE pattern starts
> always with constant characters and ends with % .
>
> I can use LIKE:
>
> SELECT * FROM mytable
> WHERE col1 LIKE 'A%'
> AND col2 LIKE 'BC%'
> AND col3 LIKE 'DEF%'
> AND col4 LIKE 'G%';
>
> or substring():
>
> SELECT * FROM mytable
> WHERE substring(col1 for 1)='A'
> AND substring(col2 for 2)= 'BC'
> AND substring(col3 for 3)='DEF'
> AND substring(col4 for 1) ='G';
>
>
> Can Postgres 8.1 use indexes to speed the queries above ?
>
> Which is the best way to to write the where clause in this case so that
> index is used ?
>
> Andrus.
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>


--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-09-2008, 06:54 AM
Andrus
 
Posts: n/a
Default Re: Best way to use indexes for partial match at beginning

> Well, for starters, see if PostgreSQL is currently using any indexes via
> EXPLAIN. First rule of performance tuning: don't.


I'm designing a new application. Data is not available yet.
I'm using Postgres 8.1 in Windows. Database encoding is UTF-8
lc_ctype is Estonian_Estonia.1257.
lc_collate is Estonian currently. However I can set lc_collate to C if this
solves this issue.

Doc says that
" to allow PostgreSQL to use indexes with LIKE clauses under a non-C locale,
several custom operator classes exist"

I don't understand "non-C locale". Does this mean lc_collate or also some
other lc_ setting ?

> If it's not (which is probably the case), then your best bet is to
> create functional indexes; ie:
>
> CREATE INDEX mytable__col1_4 ON mytable( substring( col1 for 4 ) );
>
> You can then either
>
> SELECT ... WHERE substring( col1 for 4 ) = blah


I need to optimize queries with variable number of characters in beginning
like

SELECT ... WHERE substring( col1 for 1 ) = 'f'
SELECT ... WHERE substring( col1 for 2 ) = 'fo'
SELECT ... WHERE substring( col1 for 3 ) = 'foo'
etc

This approach requires creating 10 indexes for each column which is
unreasonable.

In my current dbms, Microsoft Visual FoxPro I have a single index

CREATE INDEX i1 ON mytable(col1)

I can use queries:

WHERE col1 BETWEEN 'f' and 'f'+CHR(255)
WHERE col1 BETWEEN 'fo' and 'fo'+CHR(255)
WHERE col1 BETWEEN 'foo' and 'foo'+CHR(255)

All those queries can use same index automatically in all locales. CHR(255)
is last character in any lc_collate sequence. CHR(255) is not used in col1
data.

How to get same functionality in Postgres ?
Does there exist unicode special character which is greater than all other
chars ?

Andrus.


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-09-2008, 06:54 AM
Martijn van Oosterhout
 
Posts: n/a
Default Re: Best way to use indexes for partial match at beginning

On Wed, Nov 09, 2005 at 12:37:25PM +0200, Andrus wrote:
> I'm using Postgres 8.1 in Windows. Database encoding is UTF-8
> lc_ctype is Estonian_Estonia.1257.
> lc_collate is Estonian currently. However I can set lc_collate to C if this
> solves this issue.
>
> Doc says that
> " to allow PostgreSQL to use indexes with LIKE clauses under a non-C locale,
> several custom operator classes exist"
>
> I don't understand "non-C locale". Does this mean lc_collate or also some
> other lc_ setting ?


lc == locale. There are several different locale settings but collation
affects ordering. And Estonian is not C (obviously).

> I need to optimize queries with variable number of characters in beginning
> like
>
> SELECT ... WHERE substring( col1 for 1 ) = 'f'
> SELECT ... WHERE substring( col1 for 2 ) = 'fo'
> SELECT ... WHERE substring( col1 for 3 ) = 'foo'
> etc


If you use queries like:

SELECT ... WHERE col1 LIKE 'fo%'

it can use an index declared like:

CREATE INDEX myindex on mytable(col1 text_pattern_ops);

> In my current dbms, Microsoft Visual FoxPro I have a single index
>
> CREATE INDEX i1 ON mytable(col1)
>
> I can use queries:
>
> WHERE col1 BETWEEN 'f' and 'f'+CHR(255)
> WHERE col1 BETWEEN 'fo' and 'fo'+CHR(255)
> WHERE col1 BETWEEN 'foo' and 'foo'+CHR(255)


Well, you could do that in postgresql too, you just need to use the SQL
standard concatination operator.

WHERE col1 BETWEEN 'f' and 'f' || chr(255);

> How to get same functionality in Postgres ?
> Does there exist unicode special character which is greater than all other
> chars ?


Umm, I don't think so. Order is defined by the locale, not the
character set. My guess is that text_pattern_ops is the way to go.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQFDcdb8IB7bNG8LQkwRArUfAJ9Lbn21OrBbioOJjNULA2 ATEysuEQCfXOSH
CpkbbZqoTFDUnlIQRpJS0bA=
=sWPr
-----END PGP SIGNATURE-----

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-09-2008, 06:55 AM
Andrus
 
Posts: n/a
Default Re: Best way to use indexes for partial match at beginning

Martijn,

>> I can use queries:
>>
>> WHERE col1 BETWEEN 'f' and 'f'+CHR(255)


>Well, you could do that in postgresql too, you just need to use the SQL
>standard concatination operator.


>WHERE col1 BETWEEN 'f' and 'f' || chr(255);


thank you.

I think it is best to use regular indexes since regular indexes since they
can be used in other types of queries also.

It seems that only way is to use BETWEEN comparison for this in Postgres
8.1.

I tried

CREATE TABLE foo ( col1 CHAR(20));
CREATE INDEX i1 ON foo(col1);
INSERT INTO foo VALUES ('bar');
SELECT * FROM foo WHERE col1 BETWEEN 'b' and 'b' || chr(255);

But this does not return any data.

How to write index optimizable WHERE clause when only some (variable number)
of characters from beginning of col1 are known ?

Only way seems to use BETWEEN comparison by concatenating character greater
than all other characters in locale. Since CHR(255) does not work this is
not possible.

So

CREATE INDEX i1 ON foo(col1);

cannot be used to optimize queries of type "get all rows where first n
charaters of col1 are known" in Postgres.

Andrus.


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-09-2008, 06:55 AM
Richard Huxton
 
Posts: n/a
Default Re: Best way to use indexes for partial match at beginning

Andrus wrote:
> So
>
> CREATE INDEX i1 ON foo(col1);
>
> cannot be used to optimize queries of type "get all rows where first n
> charaters of col1 are known" in Postgres.


Of course it will! Any btree based index will let you do that. Re-read
the previous answers and make sure you pay attention to the bit about
text_pattern_ops and LIKE in non-C locales.

--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-09-2008, 06:55 AM
Jaime Casanova
 
Posts: n/a
Default Re: Best way to use indexes for partial match at beginning

On 11/9/05, Andrus <eetasoft@online.ee> wrote:
> Martijn,
>
> >> I can use queries:
> >>
> >> WHERE col1 BETWEEN 'f' and 'f'+CHR(255)

>
> >Well, you could do that in postgresql too, you just need to use the SQL
> >standard concatination operator.

>
> >WHERE col1 BETWEEN 'f' and 'f' || chr(255);

>
> thank you.
>
> I think it is best to use regular indexes since regular indexes since they
> can be used in other types of queries also.
>
> It seems that only way is to use BETWEEN comparison for this in Postgres
> 8.1.
>
> I tried
>
> CREATE TABLE foo ( col1 CHAR(20));
> CREATE INDEX i1 ON foo(col1);
> INSERT INTO foo VALUES ('bar');
> SELECT * FROM foo WHERE col1 BETWEEN 'b' and 'b' || chr(255);
>
> But this does not return any data.
>
> How to write index optimizable WHERE clause when only some (variable number)
> of characters from beginning of col1 are known ?
>
> Only way seems to use BETWEEN comparison by concatenating character greater
> than all other characters in locale. Since CHR(255) does not work this is
> not possible.
>
> So
>
> CREATE INDEX i1 ON foo(col1);
>
> cannot be used to optimize queries of type "get all rows where first n
> charaters of col1 are known" in Postgres.
>
> Andrus.
>
>


you can create two indexes:

CREATE INDEX myindex_lc on mytable(col1 text_pattern_ops);
and
CREATE INDEX myindex_normal ON foo(col1);

the first one will be used when using LIKE and the other for normal
comparisons .


--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-09-2008, 06:55 AM
Andrus
 
Posts: n/a
Default Re: Best way to use indexes for partial match at beginning

>> CREATE INDEX i1 ON foo(col1);
>>
>> cannot be used to optimize queries of type "get all rows where first n
>> charaters of col1 are known" in Postgres.

>
> Of course it will! Any btree based index will let you do that. Re-read the
> previous answers and make sure you pay attention to the bit about
> text_pattern_ops and LIKE in non-C locales.


Richard,

thank you. I try to formulate my problem more presicely.
I have table

CREATE TABLE foo ( bar CHAR(10) PRIMARY KEY);

Cluster locale is non-C. Database encoding is UTF-8. Postgres vers is 8.1

PRIMARY KEY clause creates btree based index so the index exists on bar.

I want to run fast queries by knowing first characters of bar like :

1. Select records from foo where first character of bar is A
2. Select records from foo where first character of bar is B
3. Select records from foo where first two characters of bar are BC
4. Select records from foo where first three characters of bar are ABC
etc.

Can you write sample WHERE clause which can use existing primary key index
for those queries ?

Andrus.


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-09-2008, 06:55 AM
Andrus
 
Posts: n/a
Default Re: Best way to use indexes for partial match at beginning

> you can create two indexes:
>
> CREATE INDEX myindex_lc on mytable(col1 text_pattern_ops);
> and
> CREATE INDEX myindex_normal ON foo(col1);
>
> the first one will be used when using LIKE and the other for normal
> comparisons .


Jaime,

CREATE INDEX myindex_normal ON foo(col1);

Creates btree structure. In other dbm system btree structure can be used for
searches where only some first characters in index key are known.

So I see no reason to create second index using text_pattern_ops for this
purpose.

I'm searching a way to use Postgres regular index for this.

Andrus.


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-09-2008, 06:55 AM
Scott Marlowe
 
Posts: n/a
Default Re: Best way to use indexes for partial match at

On Wed, 2005-11-09 at 14:56, Andrus wrote:
> > you can create two indexes:
> >
> > CREATE INDEX myindex_lc on mytable(col1 text_pattern_ops);
> > and
> > CREATE INDEX myindex_normal ON foo(col1);
> >
> > the first one will be used when using LIKE and the other for normal
> > comparisons .

>
> Jaime,
>
> CREATE INDEX myindex_normal ON foo(col1);
>
> Creates btree structure. In other dbm system btree structure can be used for
> searches where only some first characters in index key are known.
>
> So I see no reason to create second index using text_pattern_ops for this
> purpose.
>
> I'm searching a way to use Postgres regular index for this.


Easy, do what those other databases do. Setup your database to not use
a locale.

initdb --locale=C

and you're golden.

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 05:30 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com