Unix Technical Forum

SEO

vBulletin Search Engine Optimization


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 08:48 AM
Alexandru Coseru
 
Posts: n/a
Default Regex performance issue

Hello..

I have a low performance problem with regexp.

Here are the details:

asterisk=> explain analyze SELECT * FROM destlist WHERE '0039051248787' ~
prefix AND id_ent='-2' AND dir=0 ORDER by length(prefix) DESC;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=7925.07..7925.15 rows=31 width=67) (actual
time=857.715..857.716 rows=2 loops=1)
Sort Key: length((prefix)::text)
-> Bitmap Heap Scan on destlist (cost=60.16..7924.30 rows=31 width=67)
(actual time=2.156..857.686 rows=2 loops=1)
Recheck Cond: ((id_ent = -2) AND (dir = 0))
Filter: ('0039051248787'::text ~ (prefix)::text)
-> Bitmap Index Scan on destlist_indx2 (cost=0.00..60.16
rows=6193 width=0) (actual time=1.961..1.961 rows=5205 loops=1)
Index Cond: ((id_ent = -2) AND (dir = 0))
Total runtime: 857.804 ms
(8 rows)


The main problem is the query time.
As you can see , the use of index destlist_indx2 is pretty quick (1.9 ms)
, but the regexp operation takes a lot of time (857 ms).

How can i improve that ?

Regards
Alex


PS: Additional info:

[root@voce1 billing]# uname -a
Linux voce1 2.6.11-1.1369_FC4smp #1 SMP Thu Jun 2 23:16:33 EDT 2005 x86_64
x86_64 x86_64 GNU/Linux

[root@voce1 billing]# free
total used free shared buffers cached
Mem: 1023912 977788 46124 0 80900 523868
-/+ buffers/cache: 373020 650892
Swap: 3172728 8488 3164240


Welcome to psql 8.1.2, the PostgreSQL interactive terminal.

asterisk=> \d destlist;
Table "public.destlist"
Column | Type | Modifiers
---------+------------------------+-----------------------------------------------------------
id | bigint | not null default
nextval(('destlist_id'::text)::regclass)
id_ent | integer |
dir | integer |
prefix | character varying(255) |
country | character varying(255) |
network | character varying(255) |
tip | integer |


Indexes:
"destlist_unique" UNIQUE, btree (id_ent, dir, prefix)
"destlist_indx2" btree (id_ent, dir)
"destlist_indx3" btree (id_ent, dir, prefix)
"mmumu" btree (prefix varchar_pattern_ops)


asterisk=> select count(*) from destlist;
count
--------
576424
(1 row)






[root@voce1 billing]# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 3.00GHz
stepping : 3
cpu MHz : 2992.658
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm
constant_tsc pni monitor ds_cpl cid cx16 xtpr
bogomips : 5914.62
clflush size : 64
cache_alignment : 128
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 3.00GHz
stepping : 3
cpu MHz : 2992.658
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm
constant_tsc pni monitor ds_cpl cid cx16 xtpr
bogomips : 5980.16
clflush size : 64
cache_alignment : 128
address sizes : 36 bits physical, 48 bits virtual
power management:



---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 08:48 AM
Dave Dutcher
 
Posts: n/a
Default Re: Regex performance issue

> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org On Behalf Of Alexandru Coseru
> asterisk=> explain analyze SELECT * FROM destlist WHERE
> '0039051248787' ~
> prefix AND id_ent='-2' AND dir=0 ORDER by length(prefix) DESC;
>
>
> QUERY PLAN
> --------------------------------------------------------------
> ----------------------------------------------------------------------
> Sort (cost=7925.07..7925.15 rows=31 width=67) (actual
> time=857.715..857.716 rows=2 loops=1)
> Sort Key: length((prefix)::text)
> -> Bitmap Heap Scan on destlist (cost=60.16..7924.30
> rows=31 width=67)
> (actual time=2.156..857.686 rows=2 loops=1)
> Recheck Cond: ((id_ent = -2) AND (dir = 0))
> Filter: ('0039051248787'::text ~ (prefix)::text)
> -> Bitmap Index Scan on destlist_indx2 (cost=0.00..60.16
> rows=6193 width=0) (actual time=1.961..1.961 rows=5205 loops=1)
> Index Cond: ((id_ent = -2) AND (dir = 0))
> Total runtime: 857.804 ms
> (8 rows)
>
>
> "mmumu" btree (prefix varchar_pattern_ops)
>


I'm surpised Postgres isn't using the index on prefix seeing as the index
uses the varchar_pattern_ops operator class. It could be that the index
isn't selective enough, or is Postgres not able to use an index with Posix
regular expressions? The docs seem to say that it can, but I'd be curious
to see what happens if you use LIKE instead of ~.

Dave



---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-19-2008, 08:48 AM
Heikki Linnakangas
 
Posts: n/a
Default Re: Regex performance issue

Alexandru Coseru wrote:
> I cannot use LIKE , because the order of the match is reversed.
> The prefix column is containing telephone destinations.
> IE: ^001 - US , ^0039 Italy , etc..


Maybe you could create a functional index on substr(<minimum length of
prefix>)? It might restrict the result set prior to applying the regex
just enough to make the performance acceptable.

> asterisk=> select * from destlist LIMIT 10;
> id | id_ent | dir | prefix | country | network | tip
> ----+--------+-----+------------+-------------+--------------------+-----
> 1 | -1 | 0 | (^0093) | AFGHANISTAN | AFGHANISTAN | 6
> 2 | -1 | 0 | (^00937) | AFGHANISTAN | AFGHANISTAN Mobile | 5
> 3 | -1 | 0 | (^00355) | ALBANIA | ALBANIA | 6
> 4 | -1 | 0 | (^0035538) | ALBANIA | ALBANIA Mobile | 5
> 5 | -1 | 0 | (^0035568) | ALBANIA | ALBANIA Mobile | 5
> 6 | -1 | 0 | (^0035569) | ALBANIA | ALBANIA Mobile | 5
> 7 | -1 | 0 | (^00213) | ALGERIA | ALGERIA | 6
> 8 | -1 | 0 | (^0021361) | ALGERIA | ALGERIA Mobile | 5
> 9 | -1 | 0 | (^0021362) | ALGERIA | ALGERIA Mobile | 5
> 10 | -1 | 0 | (^0021363) | ALGERIA | ALGERIA Mobile | 5
>
> Now , I have to match a dialednumber (let's say 00213618833) and find it's destination...(It's algeria mobile).
> I tried to make with a query of using LIKE , but i was not able to get something..


Another idea would be to add some extra rows so that you could use
normal inequality searches. For example, let's take the Albanian rows:

3 | -1 | 0 | 00355
4 | -1 | 0 | 0035538
* 3 | -1 | 0 | 0035539
5 | -1 | 0 | 0035568
6 | -1 | 0 | 0035569
* 3 | -1 | 0 | 0035570

Now you can do "SELECT * FROM destlist WHERE ? >= prefix ORDER BY prefix
LIMIT 1".

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-19-2008, 08:48 AM
Alexandru Coseru
 
Posts: n/a
Default Re: Regex performance issue

Hello..

I cannot use the first advice , because i'm not aware of the prefix length
in the database...
This is why i'm ordering after length(prefix)..


On the 2nd one , i'm not sure that i can follow you..

Regards
Alex
----- Original Message -----
From: "Heikki Linnakangas" <heikki@enterprisedb.com>
To: "Alexandru Coseru" <alexandru.coseru@totaltelecom.ro>
Cc: "Dave Dutcher" <dave@tridecap.com>; <pgsql-performance@postgresql.org>
Sent: Sunday, December 03, 2006 12:04 AM
Subject: Re: [PERFORM] Regex performance issue


> Alexandru Coseru wrote:
>> I cannot use LIKE , because the order of the match is reversed.
>> The prefix column is containing telephone destinations.
>> IE: ^001 - US , ^0039 Italy , etc..

>
> Maybe you could create a functional index on substr(<minimum length of
> prefix>)? It might restrict the result set prior to applying the regex
> just enough to make the performance acceptable.
>
>> asterisk=> select * from destlist LIMIT 10;
>> id | id_ent | dir | prefix | country | network | tip
>> ----+--------+-----+------------+-------------+--------------------+-----
>> 1 | -1 | 0 | (^0093) | AFGHANISTAN | AFGHANISTAN | 6
>> 2 | -1 | 0 | (^00937) | AFGHANISTAN | AFGHANISTAN Mobile | 5
>> 3 | -1 | 0 | (^00355) | ALBANIA | ALBANIA | 6
>> 4 | -1 | 0 | (^0035538) | ALBANIA | ALBANIA Mobile | 5
>> 5 | -1 | 0 | (^0035568) | ALBANIA | ALBANIA Mobile | 5
>> 6 | -1 | 0 | (^0035569) | ALBANIA | ALBANIA Mobile | 5
>> 7 | -1 | 0 | (^00213) | ALGERIA | ALGERIA | 6
>> 8 | -1 | 0 | (^0021361) | ALGERIA | ALGERIA Mobile | 5
>> 9 | -1 | 0 | (^0021362) | ALGERIA | ALGERIA Mobile | 5
>> 10 | -1 | 0 | (^0021363) | ALGERIA | ALGERIA Mobile | 5
>>
>> Now , I have to match a dialednumber (let's say 00213618833) and find
>> it's destination...(It's algeria mobile).
>> I tried to make with a query of using LIKE , but i was not able to get
>> something..

>
> Another idea would be to add some extra rows so that you could use normal
> inequality searches. For example, let's take the Albanian rows:
>
> 3 | -1 | 0 | 00355
> 4 | -1 | 0 | 0035538
> * 3 | -1 | 0 | 0035539
> 5 | -1 | 0 | 0035568
> 6 | -1 | 0 | 0035569
> * 3 | -1 | 0 | 0035570
>
> Now you can do "SELECT * FROM destlist WHERE ? >= prefix ORDER BY prefix
> LIMIT 1".
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
>
>
>
> --
> No virus found in this incoming message.
> Checked by AVG Free Edition.
> Version: 7.1.409 / Virus Database: 268.15.4/563 - Release Date: 12/2/2006
>
>



---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-19-2008, 08:48 AM
Heikki Linnakangas
 
Posts: n/a
Default Re: Regex performance issue

Alexandru Coseru wrote:
> Hello..
>
> I cannot use the first advice , because i'm not aware of the prefix
> length in the database...
> This is why i'm ordering after length(prefix)..
>
>
> On the 2nd one , i'm not sure that i can follow you..


Ok, let me try again

> asterisk=> select * from destlist LIMIT 10;
> id | id_ent | dir | prefix | country | network | tip
> ----+--------+-----+------------+-------------+--------------------+-----
> 1 | -1 | 0 | (^0093) | AFGHANISTAN | AFGHANISTAN | 6
> 2 | -1 | 0 | (^00937) | AFGHANISTAN | AFGHANISTAN Mobile | 5
> 3 | -1 | 0 | (^00355) | ALBANIA | ALBANIA | 6
> 4 | -1 | 0 | (^0035538) | ALBANIA | ALBANIA Mobile | 5
> 5 | -1 | 0 | (^0035568) | ALBANIA | ALBANIA Mobile | 5
> 6 | -1 | 0 | (^0035569) | ALBANIA | ALBANIA Mobile | 5
> 7 | -1 | 0 | (^00213) | ALGERIA | ALGERIA | 6
> 8 | -1 | 0 | (^0021361) | ALGERIA | ALGERIA Mobile | 5
> 9 | -1 | 0 | (^0021362) | ALGERIA | ALGERIA Mobile | 5
> 10 | -1 | 0 | (^0021363) | ALGERIA | ALGERIA Mobile | 5


Store the prefix in a character column, without the regex stuff. Like
below. I've removed the columns that are not relevant, in fact it would
make sense to store them in another table, and have just the id and
prefix in this table.

id | prefix | network
---+---------+--------------------
1 | 0093 | AFGHANISTAN
2 | 00937 | AFGHANISTAN Mobile
3 | 00355 | ALBANIA
4 | 0035538 | ALBANIA Mobile
5 | 0035568 | ALBANIA Mobile
6 | 0035569 | ALBANIA Mobile
7 | 00213 | ALGERIA
8 | 0021361 | ALGERIA Mobile
9 | 0021362 | ALGERIA Mobile
10 | 0021363 | ALGERIA Mobile

Now, add the rows marked with start below:

id | prefix | network
----+---------+--------------------
1 | 0093 | AFGHANISTAN
2 | 00937 | AFGHANISTAN Mobile
* 1 | 00938 | AFGHANISTAN
3 | 00355 | ALBANIA
4 | 0035538 | ALBANIA Mobile
* 3 | 0035539 | ALBANIA
5 | 0035568 | ALBANIA Mobile
6 | 0035569 | ALBANIA Mobile
* 3 | 003557 | ALBANIA
7 | 00213 | ALGERIA
8 | 0021361 | ALGERIA Mobile
9 | 0021362 | ALGERIA Mobile
10 | 0021363 | ALGERIA Mobile
* 7 | 0021364 | ALGERIA

The added rows make it unnecessary to use regex for the searches. You
can use just the >= operator like this: (using the example number you gave)

SELECT id FROM destlist WHERE '00213618833' >= prefix ORDER BY prefix
LIMIT 1

Which would return id 7. As another example, a query of "00213654321"
would match the last row, which again has id 7.

I'm too tired to figure out the exact algorithm for adding the rows, but
I'm pretty sure it can be automated... The basic idea is that when
there's a row with id A and prefix XXXX, and another row with id B and
prefix XXXXY, we add another row with id A and prefix XXXX(Y+1).

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-19-2008, 08:48 AM
Alexandru Coseru
 
Posts: n/a
Default Re: Regex performance issue

Hello..

Thanks for the tip , i think i have got the ideea..

I'm too tired too , and i will try it tommorow.


Anyway , anybody has a clue why this regex is that CPU intensive ? I did
not saw the light on my drives blinking , and also vmstat doesn't yeld any
blocks in or out...
And how can it be optimized ?

Is there a way to trace the system calls ?
strace doesn't give me anything else but some lseeks and reads...


PS: Tried it with a 8.2 snaphsot and the result is the same..

Regards
Alex
----- Original Message -----
From: "Heikki Linnakangas" <heikki@enterprisedb.com>
To: "Alexandru Coseru" <alexandru.coseru@totaltelecom.ro>
Cc: <pgsql-performance@postgresql.org>
Sent: Sunday, December 03, 2006 12:35 AM
Subject: Re: [PERFORM] Regex performance issue


> Alexandru Coseru wrote:
>> Hello..
>>
>> I cannot use the first advice , because i'm not aware of the prefix
>> length in the database...
>> This is why i'm ordering after length(prefix)..
>>
>>
>> On the 2nd one , i'm not sure that i can follow you..

>
> Ok, let me try again
>
>> asterisk=> select * from destlist LIMIT 10;
>> id | id_ent | dir | prefix | country | network | tip
>> ----+--------+-----+------------+-------------+--------------------+-----
>> 1 | -1 | 0 | (^0093) | AFGHANISTAN | AFGHANISTAN | 6
>> 2 | -1 | 0 | (^00937) | AFGHANISTAN | AFGHANISTAN Mobile | 5
>> 3 | -1 | 0 | (^00355) | ALBANIA | ALBANIA | 6
>> 4 | -1 | 0 | (^0035538) | ALBANIA | ALBANIA Mobile | 5
>> 5 | -1 | 0 | (^0035568) | ALBANIA | ALBANIA Mobile | 5
>> 6 | -1 | 0 | (^0035569) | ALBANIA | ALBANIA Mobile | 5
>> 7 | -1 | 0 | (^00213) | ALGERIA | ALGERIA | 6
>> 8 | -1 | 0 | (^0021361) | ALGERIA | ALGERIA Mobile | 5
>> 9 | -1 | 0 | (^0021362) | ALGERIA | ALGERIA Mobile | 5
>> 10 | -1 | 0 | (^0021363) | ALGERIA | ALGERIA Mobile | 5

>
> Store the prefix in a character column, without the regex stuff. Like
> below. I've removed the columns that are not relevant, in fact it would
> make sense to store them in another table, and have just the id and prefix
> in this table.
>
> id | prefix | network
> ---+---------+--------------------
> 1 | 0093 | AFGHANISTAN
> 2 | 00937 | AFGHANISTAN Mobile
> 3 | 00355 | ALBANIA
> 4 | 0035538 | ALBANIA Mobile
> 5 | 0035568 | ALBANIA Mobile
> 6 | 0035569 | ALBANIA Mobile
> 7 | 00213 | ALGERIA
> 8 | 0021361 | ALGERIA Mobile
> 9 | 0021362 | ALGERIA Mobile
> 10 | 0021363 | ALGERIA Mobile
>
> Now, add the rows marked with start below:
>
> id | prefix | network
> ----+---------+--------------------
> 1 | 0093 | AFGHANISTAN
> 2 | 00937 | AFGHANISTAN Mobile
> * 1 | 00938 | AFGHANISTAN
> 3 | 00355 | ALBANIA
> 4 | 0035538 | ALBANIA Mobile
> * 3 | 0035539 | ALBANIA
> 5 | 0035568 | ALBANIA Mobile
> 6 | 0035569 | ALBANIA Mobile
> * 3 | 003557 | ALBANIA
> 7 | 00213 | ALGERIA
> 8 | 0021361 | ALGERIA Mobile
> 9 | 0021362 | ALGERIA Mobile
> 10 | 0021363 | ALGERIA Mobile
> * 7 | 0021364 | ALGERIA
>
> The added rows make it unnecessary to use regex for the searches. You can
> use just the >= operator like this: (using the example number you gave)
>
> SELECT id FROM destlist WHERE '00213618833' >= prefix ORDER BY prefix
> LIMIT 1
>
> Which would return id 7. As another example, a query of "00213654321"
> would match the last row, which again has id 7.
>
> I'm too tired to figure out the exact algorithm for adding the rows, but
> I'm pretty sure it can be automated... The basic idea is that when there's
> a row with id A and prefix XXXX, and another row with id B and prefix
> XXXXY, we add another row with id A and prefix XXXX(Y+1).
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
>
>
>
> --
> No virus found in this incoming message.
> Checked by AVG Free Edition.
> Version: 7.1.409 / Virus Database: 268.15.4/563 - Release Date: 12/2/2006
>



---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-19-2008, 08:48 AM
Scott Marlowe
 
Posts: n/a
Default Re: Regex performance issue

On Sun, 2006-12-03 at 02:53 +0200, Alexandru Coseru wrote:
> Hello..
>
> Thanks for the tip , i think i have got the ideea..
>
> I'm too tired too , and i will try it tommorow.
>
>
> Anyway , anybody has a clue why this regex is that CPU intensive ? I did
> not saw the light on my drives blinking , and also vmstat doesn't yeld any
> blocks in or out...
> And how can it be optimized ?


I think it's just that you're running the regex across 5400 or so
elements. at 850 or so milliseconds, that comes out to 150uS for each
comparison. I'd say it's just the number of comparisons that have to be
made that's killing you here.

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-19-2008, 08:48 AM
Alexandru Coseru
 
Posts: n/a
Default Re: Regex performance issue

Hello..

I have never used tsearch2 , but at a first glance , i would not see any
major improvement , because the main advantage of tsearch is the splitting
in words of a phrase..
But here , i only have one word (no spaces).


Regards
Alex
----- Original Message -----
From: "Oleg Bartunov" <oleg@sai.msu.su>
To: "Alexandru Coseru" <alexandru.coseru@totaltelecom.ro>
Cc: "Dave Dutcher" <dave@tridecap.com>; <pgsql-performance@postgresql.org>
Sent: Saturday, December 02, 2006 10:54 PM
Subject: Re: [PERFORM] Regex performance issue


>I may miss something but I'd use tsearch2. Check
> intdict dictionary for basic idea -
> http://www.sai.msu.su/~megera/wiki/Gendict
>
> Oleg
> On Sat, 2 Dec 2006, Alexandru Coseru wrote:
>
>> Hello...
>>
>> I cannot use LIKE , because the order of the match is reversed.
>> The prefix column is containing telephone destinations.
>> IE: ^001 - US , ^0039 Italy , etc..
>>
>> Here is a small sample:
>>
>> asterisk=> select * from destlist LIMIT 10;
>> id | id_ent | dir | prefix | country | network | tip
>> ----+--------+-----+------------+-------------+--------------------+-----
>> 1 | -1 | 0 | (^0093) | AFGHANISTAN | AFGHANISTAN | 6
>> 2 | -1 | 0 | (^00937) | AFGHANISTAN | AFGHANISTAN Mobile | 5
>> 3 | -1 | 0 | (^00355) | ALBANIA | ALBANIA | 6
>> 4 | -1 | 0 | (^0035538) | ALBANIA | ALBANIA Mobile | 5
>> 5 | -1 | 0 | (^0035568) | ALBANIA | ALBANIA Mobile | 5
>> 6 | -1 | 0 | (^0035569) | ALBANIA | ALBANIA Mobile | 5
>> 7 | -1 | 0 | (^00213) | ALGERIA | ALGERIA | 6
>> 8 | -1 | 0 | (^0021361) | ALGERIA | ALGERIA Mobile | 5
>> 9 | -1 | 0 | (^0021362) | ALGERIA | ALGERIA Mobile | 5
>> 10 | -1 | 0 | (^0021363) | ALGERIA | ALGERIA Mobile | 5
>>
>>
>> Now , I have to match a dialednumber (let's say 00213618833) and find
>> it's destination...(It's algeria mobile).
>> I tried to make with a query of using LIKE , but i was not able to get
>> something..
>>
>>
>> Regards
>> Alex
>>
>>
>>
>>
>>
>> ----- Original Message ----- From: "Dave Dutcher" <dave@tridecap.com>
>> To: "'Alexandru Coseru'" <alexandru.coseru@totaltelecom.ro>;
>> <pgsql-performance@postgresql.org>
>> Sent: Saturday, December 02, 2006 10:36 PM
>> Subject: RE: [PERFORM] Regex performance issue
>>
>>
>>> -----Original Message-----
>>> From: pgsql-performance-owner@postgresql.org On Behalf Of Alexandru
>>> Coseru
>>> asterisk=> explain analyze SELECT * FROM destlist WHERE
>>> '0039051248787' ~
>>> prefix AND id_ent='-2' AND dir=0 ORDER by length(prefix) DESC;
>>>
>>>
>>> QUERY PLAN
>>> --------------------------------------------------------------
>>> ----------------------------------------------------------------------
>>> Sort (cost=7925.07..7925.15 rows=31 width=67) (actual
>>> time=857.715..857.716 rows=2 loops=1)
>>> Sort Key: length((prefix)::text)
>>> -> Bitmap Heap Scan on destlist (cost=60.16..7924.30
>>> rows=31 width=67)
>>> (actual time=2.156..857.686 rows=2 loops=1)
>>> Recheck Cond: ((id_ent = -2) AND (dir = 0))
>>> Filter: ('0039051248787'::text ~ (prefix)::text)
>>> -> Bitmap Index Scan on destlist_indx2 (cost=0.00..60.16
>>> rows=6193 width=0) (actual time=1.961..1.961 rows=5205 loops=1)
>>> Index Cond: ((id_ent = -2) AND (dir = 0))
>>> Total runtime: 857.804 ms
>>> (8 rows)
>>>
>>> "mmumu" btree (prefix varchar_pattern_ops)
>>>

>>
>> I'm surpised Postgres isn't using the index on prefix seeing as the index
>> uses the varchar_pattern_ops operator class. It could be that the index
>> isn't selective enough, or is Postgres not able to use an index with
>> Posix
>> regular expressions? The docs seem to say that it can, but I'd be
>> curious
>> to see what happens if you use LIKE instead of ~.
>>
>> Dave
>>
>>
>>
>>
>>
>>

>
> Regards,
> Oleg
> __________________________________________________ ___________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
>
>
>
> --
> No virus found in this incoming message.
> Checked by AVG Free Edition.
> Version: 7.1.409 / Virus Database: 268.15.4/563 - Release Date: 12/2/2006
>



---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-19-2008, 08:48 AM
Tom Lane
 
Posts: n/a
Default Re: Regex performance issue

"Alexandru Coseru" <alexandru.coseru@totaltelecom.ro> writes:
> Anyway , anybody has a clue why this regex is that CPU intensive ?


The EXPLAIN result you posted offers *no* evidence that the regexp is
CPU intensive. All you know is that it took 850+ msec to fetch 5200
rows from disk and apply the regexp filter to them. There's no evidence
here that that was CPU time and not I/O time.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-19-2008, 08:48 AM
Oleg Bartunov
 
Posts: n/a
Default Re: Regex performance issue

On Sun, 3 Dec 2006, Alexandru Coseru wrote:

> Hello..
>
> I have never used tsearch2 , but at a first glance , i would not see any
> major improvement , because the main advantage of tsearch is the splitting in
> words of a phrase..
> But here , i only have one word (no spaces).


Oh, yes, I was confused What if you consider you prefix as
1.2.3.4.5.6, then you could try our contrib/ltree module.


Oleg


>
>
> Regards
> Alex
> ----- Original Message ----- From: "Oleg Bartunov" <oleg@sai.msu.su>
> To: "Alexandru Coseru" <alexandru.coseru@totaltelecom.ro>
> Cc: "Dave Dutcher" <dave@tridecap.com>; <pgsql-performance@postgresql.org>
> Sent: Saturday, December 02, 2006 10:54 PM
> Subject: Re: [PERFORM] Regex performance issue
>
>
>> I may miss something but I'd use tsearch2. Check
>> intdict dictionary for basic idea -
>> http://www.sai.msu.su/~megera/wiki/Gendict
>>
>> Oleg
>> On Sat, 2 Dec 2006, Alexandru Coseru wrote:
>>
>>> Hello...
>>>
>>> I cannot use LIKE , because the order of the match is reversed.
>>> The prefix column is containing telephone destinations.
>>> IE: ^001 - US , ^0039 Italy , etc..
>>>
>>> Here is a small sample:
>>>
>>> asterisk=> select * from destlist LIMIT 10;
>>> id | id_ent | dir | prefix | country | network | tip
>>> ----+--------+-----+------------+-------------+--------------------+-----
>>> 1 | -1 | 0 | (^0093) | AFGHANISTAN | AFGHANISTAN | 6
>>> 2 | -1 | 0 | (^00937) | AFGHANISTAN | AFGHANISTAN Mobile | 5
>>> 3 | -1 | 0 | (^00355) | ALBANIA | ALBANIA | 6
>>> 4 | -1 | 0 | (^0035538) | ALBANIA | ALBANIA Mobile | 5
>>> 5 | -1 | 0 | (^0035568) | ALBANIA | ALBANIA Mobile | 5
>>> 6 | -1 | 0 | (^0035569) | ALBANIA | ALBANIA Mobile | 5
>>> 7 | -1 | 0 | (^00213) | ALGERIA | ALGERIA | 6
>>> 8 | -1 | 0 | (^0021361) | ALGERIA | ALGERIA Mobile | 5
>>> 9 | -1 | 0 | (^0021362) | ALGERIA | ALGERIA Mobile | 5
>>> 10 | -1 | 0 | (^0021363) | ALGERIA | ALGERIA Mobile | 5
>>>
>>>
>>> Now , I have to match a dialednumber (let's say 00213618833) and find
>>> it's destination...(It's algeria mobile).
>>> I tried to make with a query of using LIKE , but i was not able to get
>>> something..
>>>
>>>
>>> Regards
>>> Alex
>>>
>>>
>>>
>>>
>>>
>>> ----- Original Message ----- From: "Dave Dutcher" <dave@tridecap.com>
>>> To: "'Alexandru Coseru'" <alexandru.coseru@totaltelecom.ro>;
>>> <pgsql-performance@postgresql.org>
>>> Sent: Saturday, December 02, 2006 10:36 PM
>>> Subject: RE: [PERFORM] Regex performance issue
>>>
>>>
>>>> -----Original Message-----
>>>> From: pgsql-performance-owner@postgresql.org On Behalf Of Alexandru
>>>> Coseru
>>>> asterisk=> explain analyze SELECT * FROM destlist WHERE
>>>> '0039051248787' ~
>>>> prefix AND id_ent='-2' AND dir=0 ORDER by length(prefix) DESC;
>>>>
>>>>
>>>> QUERY PLAN
>>>> --------------------------------------------------------------
>>>> ----------------------------------------------------------------------
>>>> Sort (cost=7925.07..7925.15 rows=31 width=67) (actual
>>>> time=857.715..857.716 rows=2 loops=1)
>>>> Sort Key: length((prefix)::text)
>>>> -> Bitmap Heap Scan on destlist (cost=60.16..7924.30
>>>> rows=31 width=67)
>>>> (actual time=2.156..857.686 rows=2 loops=1)
>>>> Recheck Cond: ((id_ent = -2) AND (dir = 0))
>>>> Filter: ('0039051248787'::text ~ (prefix)::text)
>>>> -> Bitmap Index Scan on destlist_indx2 (cost=0.00..60.16
>>>> rows=6193 width=0) (actual time=1.961..1.961 rows=5205 loops=1)
>>>> Index Cond: ((id_ent = -2) AND (dir = 0))
>>>> Total runtime: 857.804 ms
>>>> (8 rows)
>>>>
>>>> "mmumu" btree (prefix varchar_pattern_ops)
>>>>
>>>
>>> I'm surpised Postgres isn't using the index on prefix seeing as the index
>>> uses the varchar_pattern_ops operator class. It could be that the index
>>> isn't selective enough, or is Postgres not able to use an index with Posix
>>> regular expressions? The docs seem to say that it can, but I'd be curious
>>> to see what happens if you use LIKE instead of ~.
>>>
>>> Dave
>>>
>>>
>>>
>>>
>>>
>>>

>>
>> Regards,
>> Oleg
>> __________________________________________________ ___________
>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83
>>
>>
>>
>> --
>> No virus found in this incoming message.
>> Checked by AVG Free Edition.
>> Version: 7.1.409 / Virus Database: 268.15.4/563 - Release Date: 12/2/2006
>>

>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>


Regards,
Oleg
__________________________________________________ ___________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 10:20 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
UnixAdminTalk.com

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248