Selecting on non ASCII varchars (pgsql Interfaces jdbc forum, PostgreSQL category)
Hi,

I have a Unicode database. Inserting Unicode strings works fine. Selecting data based on int columns works fine too. However, I am unable to select based on varchar columns when the select contains non-ASCII characters. The same select works in Aqua Data Studio, just not from Java.

Am I setting up my connections or prepared statements wrong?

/* begin example code */
javax.naming.InitialContext ctx = new javax.naming.InitialContext();
javax.sql.DataSource ref1 = (javax.sql.DataSource) ctx.lookup("java:/PostgresDS");
Connection conn = ref1.getConnection();
PreparedStatement pst = conn.prepareStatement("SELECT * from mytable m where m.title ~* ?");
pst.setString(1, myString);
ResultSet rs = pst.executeQuery();
/* end example code */

mytable.title is a varchar(300).
myString is a java.lang.String that was loaded from a Unicode XML stream.

Whenever myString contains accented or Chinese characters, for example, the result set is empty even though there are records in the database that should match. Running the same query manually in Aqua Data Studio works fine.

I'm using Postgres 8.0.3.

Any ideas?

-Jeremy

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
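One way to narrow a problem like this down is to look at what the Java String actually contains before it is bound to the statement. The sketch below is only an illustration, not part of the original post: the class name CodePointDump, the helper dump, and the sample value are hypothetical, and it assumes Java 8 or later for String.codePoints().

```java
public class CodePointDump {
    // Render every code point of s as U+XXXX, separated by single spaces.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample; in the real program this would be myString.
        String myString = "caf\u00e9";
        System.out.println(dump(myString)); // prints "U+0063 U+0061 U+0066 U+00E9"
    }
}
```

If an accented character shows up as two code points (for example U+00C3 U+00A9 where U+00E9 was expected), the text was probably decoded with the wrong charset before it ever reached the JDBC driver.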
Jeremy LaCivita wrote:
> PreparedStatement pst = conn.prepareStatement("SELECT * from mytable m
> where m.title ~* ?");

If you use direct equality (=), does it work? There have been comments on pgsql-bugs recently that some areas of the backend code (case-insensitive comparison and regexp) do not work correctly in all cases when multibyte encodings are used. You might want to repost to -bugs if basic equality works correctly.

Do you have a self-contained test case we can try? In particular, we need to know the actual column values and regexp patterns you have problems with.

-O
On Tuesday 04 October 2005 16:16, Jeremy LaCivita wrote:
> Hmmm
>
> so it turns out if i take all my Strings and do this:
>
> str = new String(str.getBytes(), "utf-8");
>
> then it works.
>
> Correct me if i'm wrong, but that says to me that the Strings were
> in UTF-8 already, but Java didn't know it, so it couldn't send them
> to postgres properly.

It's meaningless to ask what encoding a String has. Strings are sequences of chars -- they don't have an encoding. The notion of "encoding" comes into play only when you have to represent a String as a sequence of bytes.

So, if this returns true for you:

str.equals(new String(str.getBytes(), "utf-8"));

that means your default encoding is either utf-8 or a subset of utf-8, at least for the characters found in str.

String#getBytes() uses the default encoding, which may be specified via the environment variable LANG on Unix-like systems. So, if my default encoding is UTF-8, I get this:

| $ echo $LANG
| en_US.UTF-8
| $ bsh2
| BeanShell 2.0-0.b1.7jpp - by Pat Niemeyer (pat@pat.net)
| bsh % print(System.getProperty("file.encoding"));
| UTF-8
| bsh % str = "Funny char: \u00e8";
| bsh % print(str);
| Funny char: è
| bsh % print(str.equals(new String(str.getBytes(), "utf-8")));
| true
| bsh %

If I change the default encoding to ISO-8859-1, I get this:

| $ env LANG=en_US.iso88591 bsh2
| BeanShell 2.0-0.b1.7jpp - by Pat Niemeyer (pat@pat.net)
| bsh % print(System.getProperty("file.encoding"));
| ISO-8859-1
| bsh % str = "Funny char: \u00e8";
| bsh % print(str);
| Funny char: è
| bsh % print(str.equals(new String(str.getBytes(), "utf-8")));
| false
| bsh %
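The BeanShell sessions above depend on the platform default encoding. A variant that names both charsets explicitly behaves the same regardless of LANG. The sketch below is an illustration only: the class CharsetRoundTrip and its methods are hypothetical, and it assumes (the thread does not confirm this) that the strings were UTF-8 bytes decoded as ISO-8859-1 somewhere upstream.

```java
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // Simulate the assumed bug: UTF-8 bytes decoded with the wrong charset.
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undo it: recover the original bytes, then decode them as UTF-8.
    // ISO-8859-1 maps every byte value to exactly one char, so no bytes are lost.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Funny char: \u00e8";
        String garbled = misdecode(original);
        // The garbled form is one char longer: U+00E8 became U+00C3 U+00A8.
        System.out.println(garbled.length() - original.length()); // prints 1
        System.out.println(repair(garbled).equals(original));     // prints true
    }
}
```

A repair like this treats the symptom. If the assumption holds, the cleaner fix is to decode the XML stream with the right charset at the point where the bytes first become a String.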
Vadim Nasardinov <vadimn@redhat.com> writes:
> On Tuesday 04 October 2005 16:16, Jeremy LaCivita wrote:
>> Correct me if i'm wrong, but that says to me that the Strings were
>> in UTF-8 already, but Java didn't know it, so it couldn't send them
>> to postgres properly.
>
> It's meaningless to ask what encoding a String has. String are
> sequence of chars -- they don't have an encoding.

Actually they are encoded using UTF-16
<http://java.sun.com/developer/technicalArticles/Intl/Supplementary/>

Granted, this is the no-brainer "same value" encoding... as long as codepoint <= U+FFFF
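The UTF-16 point can be seen by comparing char counts with code point counts. This is a small sketch for illustration: the class Utf16Units and its helpers are hypothetical names, and U+1D11E (MUSICAL SYMBOL G CLEF) is just one sample character above U+FFFF.

```java
public class Utf16Units {
    // Number of UTF-16 code units (Java chars) in s.
    static int units(String s) {
        return s.length();
    }

    // Number of Unicode code points in s.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "\u00e8"; // fits in one char
        // A code point above U+FFFF needs a surrogate pair, that is, two chars.
        String supplementary = new String(Character.toChars(0x1D11E));

        System.out.println(units(bmp) + " unit(s), " + codePoints(bmp) + " code point(s)");
        System.out.println(units(supplementary) + " unit(s), "
                + codePoints(supplementary) + " code point(s)");
        System.out.println(Character.isHighSurrogate(supplementary.charAt(0)));
    }
}
```

For characters in the Basic Multilingual Plane the two counts agree, which is why a String usually looks like a plain sequence of characters; they differ only for supplementary characters.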
|
|