| Hi PostgreSQL developers,

I'm currently hunting down the last postgresql-common test case failure that I see with 8.3beta1. It seems the 8.3 version of libpq changes some internal encoding lists?

If I use the 8.2 programs with the 8.2 library, all is well:

$ LC_ALL=en_US.UTF-8 /usr/lib/postgresql/8.2/bin/initdb --encoding UTF8 -D /tmp/x
[...]
The database cluster will be initialized with locale en_US.UTF-8.
[...]
$ /usr/lib/postgresql/8.2/bin/postgres -D /tmp/x -k /tmp &
$ /usr/lib/postgresql/8.2/bin/psql -Alth /tmp
postgres|martin|UTF8
template0|martin|UTF8
template1|martin|UTF8

However, if I use 8.2 programs with the 8.3 library, things start to become weird:

$ # kill postgres instance
$ rm -rf /tmp/x; LC_ALL=en_US.UTF-8 /usr/lib/postgresql/8.2/bin/initdb --encoding UTF8 -D /tmp/x
[...]
The database cluster will be initialized with locale en_US.UTF-8.
initdb: warning: encoding mismatch
The encoding you selected (UTF8) and the encoding that the selected locale uses (UTF-8) are not known to match. This may lead to misbehavior in various character string processing functions. To fix this situation, rerun initdb and either do not specify an encoding explicitly, or choose a matching combination.
[...]
$ /usr/lib/postgresql/8.2/bin/postgres -D /tmp/x -k /tmp &
$ /usr/lib/postgresql/8.2/bin/psql -Alth /tmp
postgres|martin|JOHAB
template0|martin|JOHAB
template1|martin|JOHAB

In the latter configuration, when I do not explicitly specify an encoding, the initdb output still looks weird, but at least the result seems to be correct:

$ rm -rf /tmp/x; LC_ALL=en_US.UTF-8 /usr/lib/postgresql/8.2/bin/initdb -D /tmp/x
[...]
The database cluster will be initialized with locale en_US.UTF-8.
The default database encoding has accordingly been set to MULE_INTERNAL.
[...]
$ /usr/lib/postgresql/8.2/bin/postgres -D /tmp/x -k /tmp &
$ /usr/lib/postgresql/8.2/bin/psql -Alth /tmp
postgres|martin|UTF8
template0|martin|UTF8
template1|martin|UTF8

This is a bit unfortunate, since it breaks ABI compatibility without announcing it in the SONAME. From the previous discussion it is quite clear that a soname bump is a pain, so could this be changed somehow to accommodate new encodings while retaining binary compatibility with earlier releases?

Thanks,

Martin

--
Martin Pitt http://www.piware.de
Ubuntu Developer http://www.ubuntu.com
Debian Developer http://www.debian.org |
| |||
| Hi,

Martin Pitt [2007-10-12 16:33 +0200]:
> I'm currently hunting down the last postgresql-common test case
> failure that I see with 8.3beta1. It seems the 8.3 version of libpq
> changes some internal encoding lists?

Ah, got it. The ordering in pg_enc2name_tbl[] changed, which makes the indices jump around. This was introduced in [1], in particular in those two bits:

http://developer.postgresql.org/cvsw...1=1.71;r2=1.72
http://developer.postgresql.org/cvsw...1=1.32;r2=1.33

With the attached patch (which restores the previous ordering), compatibility with 8.2 is restored. This has two drawbacks:

* The enum cannot be nicely sorted by internal and client-only encodings until libpq bumps soname again. This is only a cosmetic problem, though.

* This patch needs another catalog bump (to "unbump" the one in [1]). That's unfortunate, but the catalog number got bumped in between beta and release in earlier versions, too, so I hope it's not too bad.

The pg_enc2name_tbl declaration should probably have a comment saying never to alter the order, but only to append new entries at the end. For encodings which become obsolete (should that happen) there should be a constant like "INVALID" or "DEPRECATED".

Thank you!

Martin

[1] http://archives.postgresql.org/pgsql...4/msg00198.php

--
Martin Pitt http://www.piware.de
Ubuntu Developer http://www.ubuntu.com
Debian Developer http://www.debian.org |
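The index shift described in this message can be reproduced in miniature. The sketch below is an illustration only: the two enum layouts are abbreviated and were chosen to match the symptoms reported in the thread (UTF8 turning into JOHAB, and UTF8 being reported as MULE_INTERNAL); they are not copied from the PostgreSQL source tree. All identifiers (`old_enc`, `new_enc`, `name_seen_by_old_program`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Layout a program was compiled against (abbreviated, illustrative). */
enum old_enc { OLD_SQL_ASCII, OLD_EUC_JP, OLD_EUC_CN, OLD_EUC_KR,
               OLD_EUC_TW, OLD_JOHAB, OLD_UTF8, OLD_MULE_INTERNAL, OLD_COUNT };

/* Layout in the newer shared library: JOHAB was taken out of the middle,
 * so every later entry moved down by one (again illustrative). */
enum new_enc { NEW_SQL_ASCII, NEW_EUC_JP, NEW_EUC_CN, NEW_EUC_KR,
               NEW_EUC_TW, NEW_UTF8, NEW_MULE_INTERNAL, NEW_COUNT };

static const char *const old_names[OLD_COUNT] = {
    "SQL_ASCII", "EUC_JP", "EUC_CN", "EUC_KR",
    "EUC_TW", "JOHAB", "UTF8", "MULE_INTERNAL"
};
static const char *const new_names[NEW_COUNT] = {
    "SQL_ASCII", "EUC_JP", "EUC_CN", "EUC_KR",
    "EUC_TW", "UTF8", "MULE_INTERNAL"
};

/* Name -> number in a given table; returns -1 if the name is unknown. */
static int name_to_number(const char *const *tbl, int count, const char *name)
{
    assert(tbl != NULL && name != NULL);
    for (int i = 0; i < count; i++)
        if (strcmp(tbl[i], name) == 0)
            return i;
    return -1;
}

/* Number -> name in a given table; returns NULL if out of range. */
static const char *number_to_name(const char *const *tbl, int count, int enc)
{
    return (enc >= 0 && enc < count) ? tbl[enc] : NULL;
}

/* The new library turns the name into a number; the old program
 * then interprets that number with its own compiled-in layout. */
static const char *name_seen_by_old_program(const char *name)
{
    int enc = name_to_number(new_names, NEW_COUNT, name);
    return number_to_name(old_names, OLD_COUNT, enc);
}

/* The old program picks a number from its own enum; the new library
 * then names it using the shifted table. */
static const char *name_reported_by_new_library(int old_number)
{
    return number_to_name(new_names, NEW_COUNT, old_number);
}
```

With these toy tables, `name_seen_by_old_program("UTF8")` yields `"JOHAB"` and `name_reported_by_new_library(OLD_UTF8)` yields `"MULE_INTERNAL"`, mirroring the two oddities in the initdb runs; entries that sit before the removed slot are unaffected.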
| |||
| Martin Pitt <martin@piware.de> writes:
> Ah, got it. The ordering in pg_enc2name_tbl[] changed, which makes the
> indices jump around.

Sorry, you don't get to put JOHAB back into the portion of the list that is backend-legal encodings.

It's a bit nasty that this enum is exposed as part of the ABI, but I'm afraid we may be stuck with that decision.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| Martin Pitt <martin@piware.de> writes:
> However, if I use 8.2 programs with the 8.3 library, things start to
> become weird:

> $ # kill postgres instance
> $ rm -rf /tmp/x; LC_ALL=en_US.UTF-8 /usr/lib/postgresql/8.2/bin/initdb --encoding UTF8 -D /tmp/x

Does anything other than initdb get weird?

For the most part I believe it's the case that libpq's idea of the enum values is independent of the backend's. I think the issue here is that initdb is (mis)using libpq's pg_char_to_encoding, etc, and combining those functions with its own idea of the meanings of the enum values.

Maybe we should stop exporting pg_char_to_encoding and so on from libpq, though I wonder if that would break any clients.

regards, tom lane |
| |||
| Hi,

Tom Lane [2007-10-12 11:50 -0400]:
> Martin Pitt <martin@piware.de> writes:
> > Ah, got it. The ordering in pg_enc2name_tbl[] changed, which makes the
> > indices jump around.
>
> Sorry, you don't get to put JOHAB back into the portion of the list that
> is backend-legal encodings.

Ah, the PG_ENCODING_BE_LAST magic.

> It's a bit nasty that this enum is exposed as part of the ABI, but I'm
> afraid we may be stuck with that decision.

Well, then I see two options for 8.3:

(1) Change the PG_ENCODING_IS_CLIENT_ONLY and PG_VALID_BE_ENCODING macros to explicitly disallow encodings which have become client-only while soname is not bumped. This is a bit ugly, but should work until the table gets restructured to have a per-locale flag of internal/clientonly, or the mapping stops being index-based. I'm happy to check all 9 other places where pg_enc is used for whether they need adaptations for dropped JOHAB (i.e. make assumptions about the structure without using the above macros).

(2) Bump the soname. That's definitely a huge PITA for distributors, but it's still better than silently breaking the ABI.

So, with my distro hat on I'd definitely prefer (1), but if you want (2) for cleanliness' sake, we have to follow and bite the bullet. But we can't just let it stay like this.

Thank you, and have a good weekend!

Martin

--
Martin Pitt http://www.piware.de
Ubuntu Developer http://www.ubuntu.com
Debian Developer http://www.debian.org |
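Option (1) can be pictured with a small sketch. It assumes a toy enum in which JOHAB keeps its old number inside the backend range, and the validity checks exclude it explicitly instead of by position. The `DEMO_` names are hypothetical stand-ins; they are not the definitions of `PG_ENCODING_IS_CLIENT_ONLY` or `PG_VALID_BE_ENCODING` from the PostgreSQL headers.

```c
#include <assert.h>

/* Toy encoding list; numbering is kept stable, so JOHAB stays where it was. */
enum demo_enc {
    DEMO_SQL_ASCII = 0,
    DEMO_EUC_TW,
    DEMO_JOHAB,                 /* still numbered inside the backend range */
    DEMO_UTF8,
    DEMO_BE_LAST = DEMO_UTF8,   /* last number in the backend range */
    DEMO_SJIS,                  /* client-only by position */
    DEMO_BIG5,
    DEMO_ENCODING_COUNT
};

/* Any number that names an entry of the list. */
#define DEMO_VALID_ENCODING(e) \
    ((e) >= 0 && (e) < DEMO_ENCODING_COUNT)

/* Encodings that sit in the backend range but are now client-only. */
#define DEMO_DEMOTED_ENCODING(e) \
    ((e) == DEMO_JOHAB)

/* Backend-legal: in the backend range and not explicitly demoted. */
#define DEMO_VALID_BE_ENCODING(e) \
    ((e) >= 0 && (e) <= DEMO_BE_LAST && !DEMO_DEMOTED_ENCODING(e))

/* Client-only: a valid encoding that the backend does not accept. */
#define DEMO_ENCODING_IS_CLIENT_ONLY(e) \
    (DEMO_VALID_ENCODING(e) && !DEMO_VALID_BE_ENCODING(e))

/* Example caller: use the requested server encoding if it is legal,
 * otherwise fall back to a known-good one. */
static int demo_choose_server_encoding(int requested, int fallback)
{
    assert(DEMO_VALID_BE_ENCODING(fallback));
    return DEMO_VALID_BE_ENCODING(requested) ? requested : fallback;
}
```

The numbering seen by already-compiled clients does not move, at the price noted in the message: any code that reasons about the backend range without going through the checks would have to be audited by hand.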
| |||
| On Friday, 12 October 2007, Tom Lane wrote:
> Sorry, you don't get to put JOHAB back into the portion of the list that
> is backend-legal encodings.
>
> It's a bit nasty that this enum is exposed as part of the ABI, but I'm
> afraid we may be stuck with that decision.

Perhaps we should leave a gap where JOHAB used to be and add it at the end. Otherwise we may have to do a soname exercise.

--
Peter Eisentraut http://developer.postgresql.org/~petere/ |
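A sketch of the gap idea, under the assumption of a toy encoding list: the slot that JOHAB used to occupy is kept as an unused placeholder so that later numbers do not move, and JOHAB is appended as a new last entry. The identifiers (`before_enc`, `gap_enc`, `GAP_UNUSED_2`, `gap_encoding_name`, `gap_number_is_stable`) are hypothetical, and the list is far shorter than the real one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Numbers a client was compiled against (toy list). */
enum before_enc { BEFORE_SQL_ASCII, BEFORE_EUC_TW, BEFORE_JOHAB,
                  BEFORE_UTF8, BEFORE_SJIS, BEFORE_COUNT };

/* Same list after the change: placeholder in the old slot, JOHAB appended. */
enum gap_enc { GAP_SQL_ASCII, GAP_EUC_TW, GAP_UNUSED_2,
               GAP_UTF8, GAP_SJIS, GAP_JOHAB, GAP_COUNT };

/* Name table indexed by gap_enc; the placeholder has no name. */
static const char *const gap_names[GAP_COUNT] = {
    "SQL_ASCII", "EUC_TW", NULL, "UTF8", "SJIS", "JOHAB"
};

/* Returns the encoding name, or NULL for the placeholder and bad numbers. */
static const char *gap_encoding_name(int enc)
{
    if (enc < 0 || enc >= GAP_COUNT)
        return NULL;
    return gap_names[enc];
}

/* True if a number from the old list still means the same encoding. */
static int gap_number_is_stable(int old_number)
{
    static const char *const before_names[BEFORE_COUNT] = {
        "SQL_ASCII", "EUC_TW", "JOHAB", "UTF8", "SJIS"
    };
    assert(old_number >= 0 && old_number < BEFORE_COUNT);
    const char *now = gap_encoding_name(old_number);
    return now != NULL && strcmp(now, before_names[old_number]) == 0;
}
```

In this toy layout every old number except JOHAB's keeps its meaning, and an old client that passes JOHAB's former number gets no name back instead of silently getting a different encoding.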
| |||
| Peter Eisentraut <peter_e@gmx.net> writes:
> Perhaps we should leave a gap where JOHAB used to be and add it at the end.
> Otherwise we may have to do a soname exercise.

The question to me is whether the encoding numbers used by libpq should be considered anything other than black-box numbers. initdb is arguably assuming more than it ought to about what they mean.

regards, tom lane |
| |||
| Tom Lane wrote:
> Martin Pitt <martin@piware.de> writes:
>> However, if I use 8.2 programs with the 8.3 library, things start to
>> become weird:
>
>> $ # kill postgres instance
>> $ rm -rf /tmp/x; LC_ALL=en_US.UTF-8 /usr/lib/postgresql/8.2/bin/initdb --encoding UTF8 -D /tmp/x
>
> Does anything other than initdb get weird?
>
> For the most part I believe it's the case that libpq's idea of the enum
> values is independent of the backend's. I think the issue here is that
> initdb is (mis)using libpq's pg_char_to_encoding, etc, and combining
> those functions with its own idea of the meanings of the enum values.
>
> Maybe we should stop exporting pg_char_to_encoding and so on from libpq,
> though I wonder if that would break any clients.

I'm in favor of not exporting them in the long term; a normal client program should have no business calling those functions.

Note that we've also added PG_EUC_JIS_2004 and PG_SJIS (client-only) to the enumeration. That means that all the previous client-only encodings have been shifted by two.

If we want to keep the functions compatible when possible, we could:

* replace JOHAB in the enum with a deprecated placeholder, and
* move PG_JOHAB to the end of the enum, instead of shoving it in the middle of the client-only encodings, and
* move PG_SJIS to the end, to shift the rest of the client-only encodings by one, to compensate for the addition of the new PG_EUC_JIS_2004 encoding.

PG_JOHAB and PG_SJIS would have different encoding identifiers than in 8.2, but the rest would stay put.

But OTOH, I feel we should just stop exporting them now; I don't think there's a legitimate use case for a client application to use them. The best I can think of is an application that would want to know what encodings there are, and show them in a menu or something. But even then, you shouldn't use the numeric values, just the encoding names.

--
Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com |
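The closing point of this message, that a client should hold on to encoding names and not numbers, can be sketched as follows. This is a toy model, not libpq: `struct demo_lib` and the `demo_*` functions are hypothetical stand-ins for a library whose table order is private, and the two layouts are invented to show that a saved name survives a reordering while a saved number would not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a library's private table; callers never rely on its order. */
struct demo_lib {
    const char *const *names;
    int count;
};

static const char *const layout_a[] = { "SQL_ASCII", "JOHAB", "UTF8", "SJIS" };
static const char *const layout_b[] = { "SQL_ASCII", "UTF8", "SJIS", "JOHAB" };

static const struct demo_lib lib_a = { layout_a, 4 };
static const struct demo_lib lib_b = { layout_b, 4 };

/* Library side (hypothetical): name -> number, -1 if unknown. */
static int demo_char_to_encoding(const struct demo_lib *lib, const char *name)
{
    assert(lib != NULL && name != NULL);
    for (int i = 0; i < lib->count; i++)
        if (strcmp(lib->names[i], name) == 0)
            return i;
    return -1;
}

/* Library side (hypothetical): number -> name, NULL if out of range. */
static const char *demo_encoding_to_char(const struct demo_lib *lib, int enc)
{
    assert(lib != NULL);
    return (enc >= 0 && enc < lib->count) ? lib->names[enc] : NULL;
}

/* Client side: fill a menu with names only; returns the number of entries. */
static int demo_fill_menu(const struct demo_lib *lib, const char **menu, int max)
{
    int n = 0;
    for (int enc = 0; n < max; enc++) {
        const char *name = demo_encoding_to_char(lib, enc);
        if (name == NULL)
            break;              /* ran past the end of the library's list */
        menu[n++] = name;
    }
    return n;
}

/* Client side: a saved choice is a name, resolved again on each use. */
static int demo_resolve_choice(const struct demo_lib *lib, const char *saved_name)
{
    return demo_char_to_encoding(lib, saved_name);
}
```

A choice saved as the string `"UTF8"` resolves to 2 against `lib_a` and to 1 against `lib_b`, and in both cases maps back to `"UTF8"` through the same library; a client that had stored the number 2 would end up with `"SJIS"` after the reorder.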
| |||
| Hi,

Tom Lane [2007-10-12 12:02 -0400]:
> Does anything other than initdb get weird?

It's hard to tell. My test suite concentrates on hammering initdb with various locales and encodings, running a chain of 7.4->8.{0,1,2,3} upgrades, testing the conversion of postgresql.conf arguments, etc. I do not do that much locale juggling (only some particular tests to check for the infamous CVE-2006-2313/4).

I'm just afraid there might be other lurking regressions. I can do some tests with psql and set client_encoding, etc.

> For the most part I believe it's the case that libpq's idea of the enum
> values is independent of the backend's. I think the issue here is that
> initdb is (mis)using libpq's pg_char_to_encoding, etc, and combining
> those functions with its own idea of the meanings of the enum values.
>
> Maybe we should stop exporting pg_char_to_encoding and so on from libpq,
> though I wonder if that would break any clients.

Hm, at least that sounds like a good method to find out what other parts of the code use this array directly.

Thanks,

Martin

--
Martin Pitt http://www.piware.de
Ubuntu Developer http://www.ubuntu.com
Debian Developer http://www.debian.org |
| ||||
| Heikki Linnakangas wrote:
> Note that we've also added PG_EUC_JIS_2004 and PG_SJIS (client-only) to
> the enumeration. That means that all the previous client-only encodings
> have been shifted by two.

On closer look, PG_SJIS is not new; PG_SHIFT_JIS_2004 is. But that was added to the end, so the above should be "shifted by one", not two. The rest of the mail got that right.

--
Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com |