The following bug has been logged online:

Bug reference:      2685
Logged by:          Sergiy Vyshnevetskiy
Email address:      serg@vostok.net
PostgreSQL version: 8.1
Operating system:   FreeBSD-6 stable
Description:        Wrong charset of server messages on client [PATCH]
Details:

DESCRIPTION:
The PostgreSQL backend uses gettext() to localize its messages. By default, the charset of the localized messages is determined by LC_CTYPE. The message is then processed through a sprintf-like mechanism (with database data as possible arguments) and fed to send_message_to_frontend(), which converts the data from the _database_ charset (!) to the client charset.

If LC_CTYPE is not the same as (or at least binary compatible with) the database charset, the client gets garbage characters in server messages. If the database charset is UTF-8, the cluster may recursively generate "invalid byte sequence for encoding" errors until it fills up errordata[ERRORDATA_STACK_SIZE], and then it panics.

SOLUTION:
Convert server messages to the database charset.

PATCH:
--- src/backend/utils/mb/mbutils.c.o0	Tue Oct 10 11:51:13 2006
+++ src/backend/utils/mb/mbutils.c	Tue Oct 10 11:49:22 2006
@@ -615,6 +615,7 @@
 	DatabaseEncoding = &pg_enc2name_tbl[encoding];
 	Assert(DatabaseEncoding->encoding == encoding);
 #ifdef USE_ICU
+	bind_textdomain_codeset("postgres",(&pg_enc2iananame_tbl[encoding])->name);
 	ucnv_setDefaultName((&pg_enc2iananame_tbl[encoding])->name);
 #endif
 }

This, however, uncovers another bug: PostgreSQL dumps the messages into stderr/syslog as-is, without converting database data from the database charset to the charset from LC_MESSAGES. After this patch it will do the same with the message text too. The fix should be trivial: set up a conversion from the database charset to the server charset. I will post a patch for it later.

NOTE: I used pg_enc2iananame_tbl instead of pg_enc2name_tbl, because gettext doesn't accept many

Possible TODO: Change PostgreSQL charset names to IANA-standard names.
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives? http://archives.postgresql.org
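To make the failure mode in the report concrete: gettext() hands back bytes in the LC_CTYPE charset, but the later conversion step assumes they are in the database charset. The sketch below assumes a KOI8-R LC_CTYPE and a UTF-8 database, and uses a simplified UTF-8 well-formedness check; `is_valid_utf8` is a hypothetical helper, not PostgreSQL's own verifier. The KOI8-R bytes for "да" (0xC4 0xC1) are not a valid UTF-8 sequence, which is the kind of input that triggers "invalid byte sequence for encoding", while the UTF-8 encoding of the same word (0xD0 0xB4 0xD0 0xB0) passes.

```c
#include <stddef.h>

/* Hypothetical, simplified validator for illustration only.
   Returns 1 if buf[0..len) is well-formed UTF-8, 0 otherwise.
   Rejects stray continuation bytes, overlong forms, UTF-16
   surrogates, truncated sequences and code points > U+10FFFF. */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t need;          /* number of continuation bytes expected */
        unsigned long cp;     /* decoded code point */

        if (c < 0x80) {
            i++;              /* plain ASCII */
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            need = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            need = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            need = 3; cp = c & 0x07;
        } else {
            return 0;         /* 0x80..0xC1 or 0xF5..0xFF cannot start a character */
        }

        if (len - i - 1 < need)
            return 0;         /* sequence runs past the end of the buffer */

        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;     /* expected a 10xxxxxx continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if ((need == 2 && cp < 0x800) || (need == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return 0;         /* overlong, surrogate, or out of range */

        i += need + 1;
    }
    return 1;
}
```

With these inputs, the KOI8-R bytes fail at the first character because 0xC4 must be followed by a continuation byte and 0xC1 is not one.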
"Sergiy Vyshnevetskiy" <serg@vostok.net> writes:
> Convert server messages to database charset.

This has been discussed before:
http://archives.postgresql.org/pgsql...8/msg00245.php

The magic pg_enc2iananame_tbl[] you reference in your patch does not exist, and if it did exist it wouldn't work on all platforms, since encoding names aren't sufficiently well standardized :-(

> This, however, uncovers another bug: PostgreSQL dumps the messages into
> stderr/syslog as-is, without converting database data from database charset
> to charset from LC_MESSAGES.

I'm quite unconvinced that that's a bug. If we tried to do a conversion here, it would be trivial to set up denials of service for logging --- just include a character in a comment in your SQL command that cannot be converted to the LC_MESSAGES character set.

			regards, tom lane
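The denial-of-service concern can be illustrated with a deliberately strict converter. This is a hypothetical stand-in (`utf8_to_latin1_strict` is not a PostgreSQL or libc function), assuming a UTF-8 database and an ISO-8859-1 LC_MESSAGES charset: it rejects the whole message as soon as it meets one character with no Latin-1 equivalent, so a single Cyrillic letter in an SQL comment is enough to make the log line unconvertible.

```c
#include <stddef.h>

/* Hypothetical strict converter, for illustration only: UTF-8 in,
   ISO-8859-1 (Latin-1) out. Returns the number of bytes written, or -1
   if the input is malformed, a character has no Latin-1 equivalent, or
   the output buffer is too small. One bad character fails the whole call. */
long utf8_to_latin1_strict(const unsigned char *in, size_t len,
                           unsigned char *out, size_t outsize)
{
    size_t i = 0, o = 0;

    while (i < len) {
        unsigned long cp;
        unsigned char c = in[i];

        if (c < 0x80) {
            cp = c;                           /* plain ASCII */
            i += 1;
        } else if (c >= 0xC2 && c <= 0xDF && i + 1 < len &&
                   (in[i + 1] & 0xC0) == 0x80) {
            /* two-byte sequence: U+0080..U+07FF */
            cp = ((unsigned long)(c & 0x1F) << 6) | (in[i + 1] & 0x3F);
            i += 2;
        } else {
            /* malformed input, or a 3/4-byte sequence (>= U+0800),
               which can never be represented in Latin-1 */
            return -1;
        }

        if (cp > 0xFF)
            return -1;                        /* e.g. Cyrillic U+0434 */
        if (o >= outsize)
            return -1;                        /* no room left in out[] */
        out[o++] = (unsigned char)cp;
    }
    return (long)o;
}
```

"café" converts (é is U+00E9), but "SELECT 1 -- д" returns -1: a logger built on a conversion like this would have to drop or mangle the entire message.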
On Tue, 10 Oct 2006, Sergiy Vyshnevetskiy wrote:

> On Tue, 10 Oct 2006, Tom Lane wrote:
>
>> "Sergiy Vyshnevetskiy" <serg@vostok.net> writes:
>>> Convert server messages to database charset.
>>
>> This has been discussed before:
>> http://archives.postgresql.org/pgsql...8/msg00245.php
>>
>> The magic pg_enc2iananame_tbl[] you reference in your patch does not exist,
>> and if it did exist it wouldn't work on all platforms, since encoding
>> names aren't sufficiently well standardized :-(
>
> It's not magic, it's from ICU patch. Want me to send you a copy?

Sorry, I thought it was better known. I just looked in Gentoo Portage; they don't know about it either.

The patch is here:

http://people.freebsd.org/~girgen/po...-09-25.diff.gz

This is the current list of encodings, according to IANA:

http://www.iana.org/assignments/character-sets
On Tue, 10 Oct 2006, Sergiy Vyshnevetskiy wrote:

> The patch is here:
>
> http://people.freebsd.org/~girgen/po...-09-25.diff.gz
>
> This is the current list of encodings, according to IANA:
>
> http://www.iana.org/assignments/character-sets

The ICU homepage is http://icu.sourceforge.net/
Sergiy Vyshnevetskiy <serg@vostok.net> writes:
>> It's not magic, it's from ICU patch. Want me to send you a copy?

You're missing my point, which is that non-ICU locale support doesn't necessarily recognize the same encoding names. We would have done this years ago if we had a solution to that problem.

			regards, tom lane
On Tue, 10 Oct 2006, Tom Lane wrote:

> "Sergiy Vyshnevetskiy" <serg@vostok.net> writes:
>> Convert server messages to database charset.
>
> This has been discussed before:
> http://archives.postgresql.org/pgsql...8/msg00245.php
>
> The magic pg_enc2iananame_tbl[] you reference in your patch does not exist,
> and if it did exist it wouldn't work on all platforms, since encoding
> names aren't sufficiently well standardized :-(

It's not magic, it's from the ICU patch. Want me to send you a copy?

>> This, however, uncovers another bug: PostgreSQL dumps the messages into
>> stderr/syslog as-is, without converting database data from database charset
>> to charset from LC_MESSAGES.
>
> I'm quite unconvinced that that's a bug. If we tried to do a conversion
> here, it would be trivial to set up denials of service for logging ---
> just include a character in a comment in your SQL command that cannot be
> converted to the LC_MESSAGES character set.

Characters that cannot be converted have to be printed as escape sequences. I think that dumping raw string data into the log without converting it to a printable form can be used to mess up the log viewer, at least. (At worst, this could be a security breach.) Having raw multibyte characters mixed with characters in the LC_CTYPE charset makes the log less useful, and syslog would mangle them further, beyond recognition.
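The escape-sequence approach proposed in this reply might look roughly like the following. It is a sketch under stated assumptions, not PostgreSQL's logging code: `escape_log_bytes` is a hypothetical helper that never fails, passing printable ASCII through, doubling backslashes, and writing every other byte (including terminal control bytes such as ESC) as a `\xNN` escape. No input can then stop a line from being logged, and no raw control sequences reach the log viewer.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, for illustration only: copy a log message into dst,
   passing printable ASCII (0x20..0x7E) through, doubling backslashes, and
   writing every other byte as a \xNN escape. It never fails on odd input.
   dst is always NUL-terminated (dstsize must be at least 1); if it is too
   small, output is truncated at a piece boundary, never mid-escape.
   Returns the number of characters written, excluding the NUL. */
size_t escape_log_bytes(const unsigned char *src, size_t len,
                        char *dst, size_t dstsize)
{
    size_t o = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = src[i];
        char piece[5];                        /* longest piece: "\xNN" + NUL */
        size_t n;

        if (c == '\\') {
            memcpy(piece, "\\\\", 2);         /* backslash -> two backslashes */
            n = 2;
        } else if (c >= 0x20 && c <= 0x7E) {
            piece[0] = (char)c;               /* printable ASCII as-is */
            n = 1;
        } else {
            snprintf(piece, sizeof piece, "\\x%02X", (unsigned int)c);
            n = 4;
        }

        if (o + n >= dstsize)
            break;                            /* keep room for the NUL */
        memcpy(dst + o, piece, n);
        o += n;
    }
    dst[o] = '\0';
    return o;
}
```

Escaping at the byte level trades readability of non-ASCII text for robustness; a fuller version could first attempt a real charset conversion and escape only the characters that fail.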
On Tue, 10 Oct 2006, Tom Lane wrote:

> Sergiy Vyshnevetskiy <serg@vostok.net> writes:
>>> It's not magic, it's from ICU patch. Want me to send you a copy?
>
> You're missing my point, which is that non-ICU locale support doesn't
> necessarily recognize the same encoding names. We would have done this
> years ago if we had a solution to that problem.

We should use IANA-standard names. If that fails, it does nothing. Anybody porting PostgreSQL to a new platform can go over the list and make a patch for their port.
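The "use IANA names, do nothing on failure" idea can be sketched with the standard `iconv_open()` and gettext's `bind_textdomain_codeset()`. The assumption is that message-catalog conversion goes through the platform's iconv, as it does with GNU gettext, so asking iconv whether it knows a name is a reasonable proxy for whether gettext will honor it. `charset_name_supported` and `try_set_message_codeset` are hypothetical names. Whether a given IANA name is recognized still varies by platform, which is exactly the objection raised above; the sketch only makes the failure harmless.

```c
#include <iconv.h>
#include <libintl.h>

/* Hypothetical probe: returns 1 if this platform's iconv recognizes
   `charset` as a conversion target from UTF-8, 0 otherwise. */
int charset_name_supported(const char *charset)
{
    iconv_t cd = iconv_open(charset, "UTF-8");

    if (cd == (iconv_t)-1)
        return 0;             /* unknown or unsupported name */
    iconv_close(cd);
    return 1;
}

/* Hypothetical helper in the spirit of the proposal: try the IANA name,
   and if this platform does not know it, leave gettext's default
   (the LC_CTYPE charset) untouched. */
void try_set_message_codeset(const char *domain, const char *iana_name)
{
    if (!charset_name_supported(iana_name))
        return;               /* do nothing on failure */
    bind_textdomain_codeset(domain, iana_name);
}
```

For example, `try_set_message_codeset("postgres", "KOI8-R")` binds the codeset only where iconv recognizes "KOI8-R"; a per-platform table of alternative spellings would be the kind of porting patch described in this reply.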