This is a discussion on accented characters migraine within the pgsql Novice forums, part of the PostgreSQL category; --> It seems to me this ought to be simple and clearly documented but I've spent hours researching and experimenting ...
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| It seems to me this ought to be simple and clearly documented but I've spent hours researching and experimenting to no avail. PROBLEM: Entering accented characters in psql often results in the error: invalid byte sequence for encoding "UTF8" ENVIRONMENT: Client OS: Windows XP Keyboard: United States-International Terminal program: putty.exe, Translation: ISO-8859-1:1998 (Latin-1, West Europe) Server OS: Ubuntu Server client app: psql 8.2.4 Server db app: PostgreSQL 8.2.4 pg settings: client_encoding: UTF8 lc_collate: en_US.UTF-8 lc_ctype: en_US.UTF-8 server_encoding UTF8 initdb defaulted to UTF-8, which I need because I want ORDER BY to sort alphabetically, not by hex code. When I try to insert a string with an accented character, I generally get the above error. Simple example: template1=# \d sorttest id | integer test | text template1=# insert into sorttest (test) values ('ã'); ERROR: invalid byte sequence for encoding "UTF8": 0xe32729 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding". The accented character (a-tilde) is entered from the Windows keyboard with the ~a sequence and displays properly in psql. The problem is that the server rejects it. Observations: 1) The Unicode hex value of a-tilde is C3 A3 but the error message says the invalid sequence is E3 27 29. I don't know what the first byte means but the second and third are the quote and right parenthesis characters following the a-tilde in my insert statement. 2) At various times, data entry as above has started working in a session but I can't figure out what I did to make it happen. 3) I tried entering the character in hex, as I understand it: insert into sorttest (test) values (E'\xc3\xa3'); This avoids the error but the string value then displays as the 2 seemingly irrelevant characters ã (A-tilde, British pound) It looks like I'm caught in some interaction between putty, psql and pg. The real problem is much more grave than just manual data entry-- I'm trying to migrate a large existing database from another pg server with: pg_dumpall -h nnn.nnn.nnn.nnn | psql This throws errors each time the COPY commands encounter an accented character in the dump. Any ideas? Is this just a bonehead mistake on my part? John ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| On Fri, 2007-10-12 at 11:19 -0500, Wright, George wrote: > Putty is showing ISO-8858-1 which is Latin. I believe both client and > server must be UTF-8. > > > > -----Original Message----- > ENVIRONMENT: > Terminal program: putty.exe, Translation: ISO-8859-1:1998 (Latin-1, West > Europe) > pg settings: > client_encoding: UTF8 > lc_collate: en_US.UTF-8 > lc_ctype: en_US.UTF-8 > server_encoding UTF8 > > initdb defaulted to UTF-8, which I need because I want ORDER BY to sort > alphabetically, not by hex code. I think you need to set your client encoding to Latin. At the moment it thinks the client encoding is Unicode but you are feeding it Latin characters. The command you need is: set client_encoding to 'latin1'; Alternatively, set putty to output UTF-8, if it can do that. Sorting is done by the backend, and the database is UTF-8, so you don't need to change anything there. -- Oliver Elphick olly@lfix.co.uk Isle of Wight http://www.lfix.co.uk/oliver GPG: 1024D/A54310EA 92C8 39E7 280E 3631 3F0E 1EC0 5664 7A2F A543 10EA ======================================== Do you want to know God? http://www.lfix.co.uk/knowing_god.html -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| John Gunther <postgresql@bucksvsbytes.com> writes: > PROBLEM: Entering accented characters in psql often results in the > error: invalid byte sequence for encoding "UTF8" > ENVIRONMENT: > Terminal program: putty.exe, Translation: ISO-8859-1:1998 (Latin-1, West > Europe) > client_encoding: UTF8 Those two settings need to match up. PG can convert from latin1 to utf8, but only when you tell it the truth about what encoding the client is using. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| ||||
| Just out of curiously, why would you ever want to do that? (I am not saying your method is wrong, but SQL should be ample, no?) On Mon, Feb 18, 2008 at 9:15 AM, John Gunther <postgresql@bucksvsbytes.com> wrote: > Thanks, Andreas. That was easy. I thought I new all the value expression > types, but following your clue, I've learned the last two, ARRAY() and > ROW(), from Section 4.2. > > John > > > A. Kretschmer wrote: > > am Mon, dem 18.02.2008, um 8:22:14 -0500 mailte John Gunther folgendes: > > > What's the most straightforward way to populate an array from a select > statement? For example, using a fictional extension of SQL syntax, I'd like: > > update users set emails=ARRAY[select email from address where userid=25] > where id=25; > > So if user 25 has emails john@domain.com, john@gmail.com, and john@yahoo.com in the address table, > select emails from user where id=25; > will return: > emails > ---------------------------------------------------------------- > {john@domain.com, john@gmail.com, and john@yahoo.com} > > > You can use array_to_string() and array(). > > Example: > > test=*# select * from mail_adr ; > id | email > ----+--------------- > 2 | foo@bar > 2 | bar@batz > 2 | foobar@barfoo > (3 rows) > > test=*# select array_to_string(array(select email from mail_Adr where id=2), ', '); > array_to_string > ---------------------------------- > foo@bar, bar@batz, foobar@barfoo > (1 row) > > > Andreas > > > > |