Unix Technical Forum

bug: LC_CTYPE=en_US.UTF-8 confuses query planner

This is a discussion on bug: LC_CTYPE=en_US.UTF-8 confuses query planner within the pgsql Hackers forums, part of the PostgreSQL category; --> I encountered a bug where the same query behaves differently under different LC_CTYPE settings, "C" and "en_US.UTF-8". The query ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-11-2008, 04:37 AM
Dmitry Karasik
 
Posts: n/a
Default bug: LC_CTYPE=en_US.UTF-8 confuses query planner

I encountered a bug where the same query behaves differently
under different LC_CTYPE settings, "C" and "en_US.UTF-8".

The query is of type SELECT ... WHERE a like 'x' and b like 'y', where relevant
indexes exist for a and b, and 'x' and 'y' strings do not contain the %
character. When database is initdb'ed with LC_CTYPE=C, the query uses index
scan; when LC_CTYPE=en_US.UTF-8 it is the sequential scan. The table is large,
so it doesn't seem that planner selects seqscan out of performance reasons.
Also, I think this is a bug since when the query contains only one 'like'
statement, the query planner does use the index, no matter what $LC_CTYPE
value is.

Details:
pgsql 8.0.3

LC_CTYPE=C:
# explain select * from queues where username like 'a' and hostname like 'b';
QUERY PLAN
------------------------------------------------------------------------------------------
Index Scan using queues_idx_hostname_time on queues (cost=0.00..11.48 rows=1 width=161)
Index Cond: (hostname = 'b'::text)
Filter: ((username ~~ 'a'::text) AND (hostname ~~ 'b'::text))
(3 rows)


LC_CTYPE=en_US.UTF-8:
# explain select * from queues where username like 'a' and hostname like 'b';
QUERY PLAN
------------------------------------------------------------------------
Seq Scan on queues (cost=100000000.00..100000016.15 rows=1 width=161)
Filter: ((username ~~ 'a'::text) AND (hostname ~~ 'b'::text))
(2 rows)

# \d queues
...
username | text | not null
hostname | text | not null

--
Sincerely,
Dmitry Karasik

---
catpipe Systems ApS
*BSD solutions, consulting, development
www.catpipe.net
+45 7021 0050

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-11-2008, 04:37 AM
Tom Lane
 
Posts: n/a
Default Re: bug: LC_CTYPE=en_US.UTF-8 confuses query planner

Dmitry Karasik <dk@catpipe.net> writes:
> When database is initdb'ed with LC_CTYPE=C, the query uses index
> scan; when LC_CTYPE=en_US.UTF-8 it is the sequential scan.


This is in the FAQ:

When using wild-card operators such as LIKE or ~, indexes can only be
used in certain circumstances:
...
* The default C locale must be used during initdb because it is not
possible to know the next-greatest character in a non-C locale.
You can create a special text_pattern_ops index for such cases
that work only for LIKE indexing.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 06:30 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com