Unix Technical Forum

accented characters migraine

This is a discussion on accented characters migraine within the pgsql Novice forums, part of the PostgreSQL category; --> It seems to me this ought to be simple and clearly documented but I've spent hours researching and experimenting ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Novice

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-17-2008, 10:50 PM
John Gunther
 
Posts: n/a
Default accented characters migraine

It seems to me this ought to be simple and clearly documented but I've
spent hours researching and experimenting to no avail.

PROBLEM: Entering accented characters in psql often results in the
error: invalid byte sequence for encoding "UTF8"

ENVIRONMENT:
Client OS: Windows XP
Keyboard: United States-International
Terminal program: putty.exe, Translation: ISO-8859-1:1998 (Latin-1, West
Europe)
Server OS: Ubuntu
Server client app: psql 8.2.4
Server db app: PostgreSQL 8.2.4
pg settings:
client_encoding: UTF8
lc_collate: en_US.UTF-8
lc_ctype: en_US.UTF-8
server_encoding UTF8

initdb defaulted to UTF-8, which I need because I want ORDER BY to sort
alphabetically, not by hex code.

When I try to insert a string with an accented character, I generally
get the above error. Simple example:
template1=# \d sorttest
id | integer
test | text

template1=# insert into sorttest (test) values ('ã');
ERROR: invalid byte sequence for encoding "UTF8": 0xe32729
HINT: This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".

The accented character (a-tilde) is entered from the Windows keyboard
with the ~a sequence and displays properly in psql. The problem is that
the server rejects it.
Observations:
1) The Unicode hex value of a-tilde is C3 A3 but the error message says
the invalid sequence is E3 27 29. I don't know what the first byte means
but the second and third are the quote and right parenthesis characters
following the a-tilde in my insert statement.
2) At various times, data entry as above has started working in a
session but I can't figure out what I did to make it happen.
3) I tried entering the character in hex, as I understand it: insert
into sorttest (test) values (E'\xc3\xa3');
This avoids the error but the string value then displays as the 2
seemingly irrelevant characters ã (A-tilde, British pound)

It looks like I'm caught in some interaction between putty, psql and pg.
The real problem is much more grave than just manual data entry-- I'm
trying to migrate a large existing database from another pg server with:
pg_dumpall -h nnn.nnn.nnn.nnn | psql
This throws errors each time the COPY commands encounter an accented
character in the dump.

Any ideas? Is this just a bonehead mistake on my part?

John


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-17-2008, 10:50 PM
Oliver Elphick
 
Posts: n/a
Default Re: accented characters migraine

On Fri, 2007-10-12 at 11:19 -0500, Wright, George wrote:
> Putty is showing ISO-8858-1 which is Latin. I believe both client and
> server must be UTF-8.
>
>
>
> -----Original Message-----
> ENVIRONMENT:
> Terminal program: putty.exe, Translation: ISO-8859-1:1998 (Latin-1, West
> Europe)
> pg settings:
> client_encoding: UTF8
> lc_collate: en_US.UTF-8
> lc_ctype: en_US.UTF-8
> server_encoding UTF8
>
> initdb defaulted to UTF-8, which I need because I want ORDER BY to sort
> alphabetically, not by hex code.


I think you need to set your client encoding to Latin. At the moment it
thinks the client encoding is Unicode but you are feeding it Latin
characters.

The command you need is:

set client_encoding to 'latin1';

Alternatively, set putty to output UTF-8, if it can do that.

Sorting is done by the backend, and the database is UTF-8, so you don't
need to change anything there.

--
Oliver Elphick olly@lfix.co.uk
Isle of Wight http://www.lfix.co.uk/oliver
GPG: 1024D/A54310EA 92C8 39E7 280E 3631 3F0E 1EC0 5664 7A2F A543 10EA
========================================
Do you want to know God? http://www.lfix.co.uk/knowing_god.html


--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-17-2008, 10:50 PM
Tom Lane
 
Posts: n/a
Default Re: accented characters migraine

John Gunther <postgresql@bucksvsbytes.com> writes:
> PROBLEM: Entering accented characters in psql often results in the
> error: invalid byte sequence for encoding "UTF8"


> ENVIRONMENT:
> Terminal program: putty.exe, Translation: ISO-8859-1:1998 (Latin-1, West
> Europe)
> client_encoding: UTF8


Those two settings need to match up. PG can convert from latin1 to
utf8, but only when you tell it the truth about what encoding the client
is using.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-17-2008, 10:51 PM
Mag Gam
 
Posts: n/a
Default Re: Populating an array from a select statement

Just out of curiously, why would you ever want to do that? (I am not saying
your method is wrong, but SQL should be ample, no?)


On Mon, Feb 18, 2008 at 9:15 AM, John Gunther <postgresql@bucksvsbytes.com>
wrote:

> Thanks, Andreas. That was easy. I thought I new all the value expression
> types, but following your clue, I've learned the last two, ARRAY() and
> ROW(), from Section 4.2.
>
> John
>
>
> A. Kretschmer wrote:
>
> am Mon, dem 18.02.2008, um 8:22:14 -0500 mailte John Gunther folgendes:
>
>
> What's the most straightforward way to populate an array from a select
> statement? For example, using a fictional extension of SQL syntax, I'd like:
>
> update users set emails=ARRAY[select email from address where userid=25]
> where id=25;
>
> So if user 25 has emails john@domain.com, john@gmail.com, and john@yahoo.com in the address table,
> select emails from user where id=25;
> will return:
> emails
> ----------------------------------------------------------------
> {john@domain.com, john@gmail.com, and john@yahoo.com}
>
>
> You can use array_to_string() and array().
>
> Example:
>
> test=*# select * from mail_adr ;
> id | email
> ----+---------------
> 2 | foo@bar
> 2 | bar@batz
> 2 | foobar@barfoo
> (3 rows)
>
> test=*# select array_to_string(array(select email from mail_Adr where id=2), ', ');
> array_to_string
> ----------------------------------
> foo@bar, bar@batz, foobar@barfoo
> (1 row)
>
>
> Andreas
>
>
>
>


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 03:33 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com