Unix Technical Forum

Informix UPPER, LOWER doesn't work with UTF-8

#1
04-20-2008, 09:09 AM
jjjjjjjj

We just changed our database from ISO8859 to UTF-8. But when we run
"select * from upper(col_name) where col_value matches '*special_chars_value*'",
we get no rows returned when the value contains any special characters.
We are using Informix Dynamic Server 2000.
Any ideas?

#2
04-20-2008, 09:10 AM
Jonathan Leffler

jjjjjjjj wrote:
> We just changed our database from ISO8859 to UTF-8. But when we run
> "select * from upper(col_name) where col_value matches '*special_chars_value*'",
> we get no rows returned when the value contains any special characters.
> We are using Informix Dynamic Server 2000.
> Any ideas?


IDS 2000 - that's been pretty much obsolete for a while now. It means
IDS 9.20 or 9.21. Which o/s platform are you using? Exactly which
version of IDS are you using? How did you change your database
character set? By rebuilding with the new DB_LOCALE, or some other
mechanism? How much non-ASCII data did you have in it? How did you
change the non-ASCII data to Unicode?

(For the benefit of those not intimately familiar with Unicode, the
Unicode character codes 0x00..0x7F match those of ISO 8859-1 and other
equivalent code sets - and those in turn are based on the original
(7-bit) ASCII code set. Hence, characters in that range need no changes
to be treated as (UTF-8 encoded) Unicode characters. However, each
ISO 8859-1 character in the range 0x80..0xFF needs to be mapped into a
suitable 2-byte UTF-8 sequence.)
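
The mapping described above can be checked with a short Python sketch. It is only an illustration of the encoding rules, not of anything Informix does internally; the helper name latin1_to_utf8 is made up for this example, and it assumes the old code set was ISO 8859-1 specifically.

```python
def latin1_to_utf8(byte_value: int) -> bytes:
    """Re-encode one ISO 8859-1 byte as UTF-8 (hypothetical helper)."""
    return bytes([byte_value]).decode("latin-1").encode("utf-8")

# 0x00..0x7F: the UTF-8 form is the very same single byte.
ascii_unchanged = all(latin1_to_utf8(b) == bytes([b]) for b in range(0x00, 0x80))

# 0x80..0xFF: every ISO 8859-1 character becomes exactly two bytes.
high_half_two_bytes = all(len(latin1_to_utf8(b)) == 2 for b in range(0x80, 0x100))

print(ascii_unchanged, high_half_two_bytes)  # True True
print(latin1_to_utf8(0xE9).hex())            # c3a9, the UTF-8 form of "é"
```

Other parts of ISO 8859 can behave differently. In ISO 8859-15, for example, byte 0xA4 is the euro sign, which is U+20AC and takes three bytes in UTF-8.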

Given that no data is returned, isn't the problem more to do with the
MATCHES operator not knowing what you mean than to do with
UPPER(col_name) failing? At least, you should explain why you think
that it is UPPER or LOWER (not illustrated) that is the trouble.

What exactly do you have in your SQL in place of 'special_chars_value'?
It is far from clear to me exactly which sequence of bytes you use -
on the assumption that you don't mean 's', 'p', 'e', ... 'e'.
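
To show why the exact byte sequence matters, here is a small Python sketch. It assumes, purely for illustration, that the stored value is UTF-8 while the pattern might arrive in either encoding; the sample string and the contains helper are invented, and nothing here is confirmed as the cause of the problem in this thread.

```python
# What a UTF-8 database would hold for this (made-up) title.
stored = "Café society".encode("utf-8")

# The same search text, encoded two different ways.
pattern_utf8 = "Café".encode("utf-8")      # 43 61 66 c3 a9
pattern_latin1 = "Café".encode("latin-1")  # 43 61 66 e9

def contains(haystack: bytes, needle: bytes) -> bool:
    """Plain byte-level substring test, standing in for a '*...*' pattern."""
    return needle in haystack

print(pattern_utf8.hex(), pattern_latin1.hex())
print(contains(stored, pattern_utf8))    # True
print(contains(stored, pattern_latin1))  # False
```

The two patterns look identical on screen, but only the one encoded the same way as the stored data can match.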

--
Jonathan Leffler #include <disclaimer.h>
Email: jleffler@earthlink.net, jleffler@us.ibm.com
Guardian of DBD::Informix v2005.01 -- http://dbi.perl.org/

#3
04-20-2008, 09:10 AM
scottishpoet

"select * from upper(col_name) where col_value matches
'*special_chars_value*'",


1/ Shouldn't the FROM clause refer to a table rather than a column?

2/ Are you sure all the data in the table is UPPERCASE?

3/ You haven't wildcarded your matches clause

What happens with

SELECT * FROM <table> WHERE UPPER(col_value) MATCHES
UPPER("special_chars_value*");

#4
04-20-2008, 09:10 AM
jjjjjjjj

We are using "Informix Dynamic Server 2000 Version 9.21.UC1" running on
SunOS 5.9. We changed our database character set by rebuilding with the
new DB_LOCALE. We don't have much non-ASCII data in our database, but
we do have some values like "Café".

After the migration to UTF-8, our existing search function doesn't work
when the user searches for "Café". The SQL query for the search function
is "select * from table_name where lower(title) matches '*café*'". I
have also tried inserting some special characters such as "€, £,....."
into the title column and running "select ........." again. That doesn't
work either.

But if I remove the lower function, "select * from table_name where
title matches '*Café*'" works. I want the search to be case-insensitive,
though. Even worse, the select query throws the exception "failed at
lower .... processing string..." when I insert "€" into the data.

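For reference, this Python sketch shows the behaviour the search is meant to have: fold both sides to lower case, then apply the wildcard pattern. The second helper shows one hypothetical way case folding can go wrong on UTF-8 data, namely treating the bytes as ISO 8859-1. That is an assumption for illustration only; nothing in this thread confirms it is what IDS 9.21 does. Both helper names and the sample title are invented.

```python
from fnmatch import fnmatchcase

def matches_ci(value: str, pattern: str) -> bool:
    """Case-insensitive wildcard match: fold both sides, then compare."""
    return fnmatchcase(value.lower(), pattern.lower())

def latin1_lower_on_utf8_bytes(text: str) -> bytes:
    """Hypothetical failure mode: lower-case UTF-8 bytes as if they were
    ISO 8859-1 characters, one byte at a time."""
    return text.encode("utf-8").decode("latin-1").lower().encode("latin-1")

# Expected behaviour with Unicode-aware folding.
expected = matches_ci("Café du Monde", "*café*")

# Byte-wise folding turns 0xC3 (the UTF-8 lead byte of "é") into 0xE3.
mangled = latin1_lower_on_utf8_bytes("Café")

print(expected)       # True
print(mangled.hex())  # 636166e3a9, no longer the UTF-8 form of "café"
```

In this sketch the mangled bytes are not even valid UTF-8 any more, so they cannot equal any pattern typed by a user.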


