Unix Technical Forum

BUG #3697: utf8 issue: can not reimport a table that was successfully exported.

This is a discussion on BUG #3697: utf8 issue: can not reimport a table that was successfully exported. within the pgsql Bugs forums, part of the PostgreSQL category; --> The following bug has been logged online: Bug reference: 3697 Logged by: Marc Mamin Email address: m.mamin@intershop.de PostgreSQL version: ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Bugs

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-10-2008, 11:10 AM
Marc Mamin
 
Posts: n/a
Default BUG #3697: utf8 issue: can not reimport a table that was successfully exported.


The following bug has been logged online:

Bug reference: 3697
Logged by: Marc Mamin
Email address: m.mamin@intershop.de
PostgreSQL version: 8.2.4
Operating system: SuSE Linux 9.1 (i586)
Description: utf8 issue: can not reimport a table that was
successfully exported.
Details:

Hello,

I'm not sure this is a bug; the problem might be related to a client
encoding issue.


My Database is on a Linux server which I connect to using putty from
Windows.

both server and client are set to UTF8:

client_encoding | UTF8
backslash_quote | safe_encoding
server_encoding | UTF8


I stumbled on this issue while trying to import a "malicious" user agent
string...

I didn't check if all characters are valid UTF8...


My concern is about database recovery.
I'm using pg_dump to regulary export my users, bu according to the example
below,
it seems that my dumps may be worthless !





May be you should consider not to publish this before a fix exist
as this is a serious issue which could eventually be exploited
to damage existing instances (for the case this is really a bug)...

regards,

Marc Mamin



steps to repeat:


CREATE TABLE utf8_test(s varchar);


CREATE OR REPLACE FUNCTION f_utf8_test( st VARCHAR) RETURNS INT AS $$

DECLARE
quotedline varchar = quote_literal($1);

BEGIN
INSERT INTO utf8_test ( s ) VALUES ( quotedline);
RETURN 0;
END;

$$ LANGUAGE plpgsql;

select f_utf8_test('(Mozilla/4.0 (compatible; MSIE 6.0; Wind
\xE0\xF0\xF1\xF2\xE2\xE5\xED\xED\xFB\xE9 \xE2\xFB\xF1\xF8\9
\xE3\xEE\xF1\xF3\xE4
xE4\xE6 \xCD\xC1 \xD0\xC1")');


-- here the same statement, but with all backslashed duplicated for the case
when the string was modified when posting this issue:


select f_utf8_test('(Mozilla/4.0 (compatible; MSIE 6.0; Wind
\\xE0\\xF0\\xF1\\xF2\\xE2\\xE5\\xED\\xED\\xFB\\xE9 \\xE2\\xFB\\xF1\\xF8\\9
\\xE3\\xEE\\xF1\\xF3\\xE4
xE4\\xE6 \\xCD\\xC1 \\xD0\\xC1")');


WARNING: nonstandard use of escape in a string literal
LINE 1: select f_utf8_test('(Mozilla/4.0 (compatible; MSIE 6.0; Wind

COPY utf8_test TO '/tmp/utf8_test.txt';

COPY 1

COPY utf8_test FROM '/tmp/utf8_test.txt';

ERROR: invalid byte sequence for encoding "UTF8": 0xd3ce
HINT: This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by "client_encoding".
CONTEXT: COPY utf8_test, line 1


The same isuue occure when using pg_dump:


pg_dump -i -v -p 5433 -Uisdb2 -tutf8_test > /tmp/utf8_dump

pg_dump: server version: 8.2.4; pg_dump version: 8.2.1
pg_dump: proceeding despite version mismatch


psql -f"/tmp/utf8_dump"

=>
psql:/tmp/utf8_dump:40: ERROR: invalid byte sequence for encoding "UTF8":
0xd3ce
HINT: This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by "client_encoding".
CONTEXT: COPY utf8_test, line 1

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-10-2008, 11:11 AM
Tom Lane
 
Posts: n/a
Default Re: BUG #3697: utf8 issue: can not reimport a table that was successfully exported.

"Marc Mamin" <m.mamin@intershop.de> writes:
> I didn't check if all characters are valid UTF8...


They aren't ...

> select f_utf8_test('(Mozilla/4.0 (compatible; MSIE 6.0; Wind
> \xE0\xF0\xF1\xF2\xE2\xE5\xED\xED\xFB\xE9 \xE2\xFB\xF1\xF8\9
> \xE3\xEE\xF1\xF3\xE4
> xE4\xE6 \xCD\xC1 \xD0\xC1")');


In 8.3 that will throw an error:

utf8=# select f_utf8_test('(Mozilla/4.0 (compatible; MSIE 6.0; Wind
utf8'# \xE0\xF0\xF1\xF2\xE2\xE5\xED\xED\xFB\xE9 \xE2\xFB\xF1\xF8\9
utf8'# \xE3\xEE\xF1\xF3\xE4
utf8'# xE4\xE6 \xCD\xC1 \xD0\xC1")');
WARNING: nonstandard use of escape in a string literal
LINE 1: select f_utf8_test('(Mozilla/4.0 (compatible; MSIE 6.0; Wind
^
HINT: Use the escape string syntax for escapes, e.g., E'\r\n'.
ERROR: invalid byte sequence for encoding "UTF8": 0xe0f0f1
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
utf8=#

However, since this behavior isn't backwards-compatible, there's not
much appetite for back-patching it.

I don't think this is a security issue --- if you don't quote
backslashes in untrusted input you'll have problems far worse
than this one.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-10-2008, 11:11 AM
Tom Lane
 
Posts: n/a
Default Re: BUG #3697: utf8 issue: can not reimport a table that was successfully exported.

"Marc Mamin" <M.Mamin@intershop.de> writes:
> Is there a recommendation how to clean these data (I know where to
> search for them)


Usually people do it by running iconv with the delete-bad-data option
on a pg_dump file.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-10-2008, 11:11 AM
Marc Mamin
 
Posts: n/a
Default Re: BUG #3697: utf8 issue: can not reimport a table that was successfully exported.

Thank you for your quick response,

> if you don't quote backslashes in untrusted input you'll have problems

far worse than this one

I do it now but not since by db is live...
So I probably have some invalid caraters in.
Is this an issue that must be fixed before I can upgrade to 8.3 ?
Is there a recommendation how to clean these data (I know where to
search for them)

Thanks,

Marc Mamin




-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Thursday, October 25, 2007 6:08 PM
To: Marc Mamin
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #3697: utf8 issue: can not reimport a table that
was successfully exported.

"Marc Mamin" <m.mamin@intershop.de> writes:
> I didn't check if all characters are valid UTF8...


They aren't ...

> select f_utf8_test('(Mozilla/4.0 (compatible; MSIE 6.0; Wind
> \xE0\xF0\xF1\xF2\xE2\xE5\xED\xED\xFB\xE9 \xE2\xFB\xF1\xF8\9
> \xE3\xEE\xF1\xF3\xE4
> xE4\xE6 \xCD\xC1 \xD0\xC1")');


In 8.3 that will throw an error:

utf8=# select f_utf8_test('(Mozilla/4.0 (compatible; MSIE 6.0; Wind
utf8'# \xE0\xF0\xF1\xF2\xE2\xE5\xED\xED\xFB\xE9 \xE2\xFB\xF1\xF8\9
utf8'# \xE3\xEE\xF1\xF3\xE4 utf8'# xE4\xE6 \xCD\xC1 \xD0\xC1")');
WARNING: nonstandard use of escape in a string literal LINE 1: select
f_utf8_test('(Mozilla/4.0 (compatible; MSIE 6.0; Wind
^
HINT: Use the escape string syntax for escapes, e.g., E'\r\n'.
ERROR: invalid byte sequence for encoding "UTF8": 0xe0f0f1
HINT: This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".
utf8=#

However, since this behavior isn't backwards-compatible, there's not
much appetite for back-patching it.

I don't think this is a security issue --- if you don't quote
backslashes in untrusted input you'll have problems far worse than this
one.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 09:51 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com