What's a good default encoding? (Pgsql General forum, PostgreSQL category)
| If you're going to be putting emdashes, letters with lines and circles above them, and similar stuff that's mostly European and American in a database, what's a good default encoding to use - UTF-8? CSN __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| On Mar 14, 2006, at 3:10 PM, CSN wrote: > If you're going to be putting emdashes, letters with > lines and circles above them, and similar stuff that's > mostly European and American in a database, what's a > good default encoding to use - UTF-8? Yes, UTF-8 is good because it can represent every Unicode character and is compact for Latin character sets (plain ASCII takes one byte per character). John DeSoi, Ph.D. http://pgedit.com/ Power Tools for PostgreSQL |
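To make the "efficient for Latin text" point concrete, here is a minimal Python sketch (not from the thread) that counts how many bytes a few of the characters CSN mentions take in UTF-8, and whether a single-byte encoding such as ISO-8859-15 (LATIN9) can represent them at all. The sample characters and the helper names `utf8_len` and `latin9_len` are illustrative assumptions.

```python
from typing import Optional

# Sample characters (chosen for illustration only).
SAMPLES = {
    "A": "plain ASCII letter",
    "\u00e9": "e with acute accent",
    "\u00e5": "a with ring above",
    "\u0101": "a with macron (line above)",
    "\u2014": "em dash",
}

def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

def latin9_len(ch: str) -> Optional[int]:
    """Bytes needed in ISO-8859-15, or None if it cannot be encoded."""
    try:
        return len(ch.encode("iso8859_15"))
    except UnicodeEncodeError:
        return None

for ch, desc in SAMPLES.items():
    print(f"{desc:28} UTF-8: {utf8_len(ch)} byte(s)  LATIN9: {latin9_len(ch)}")
```

ASCII stays at one byte, accented Latin letters take two, and the em dash takes three. The macron letter and the em dash have no ISO-8859-15 representation, which is the practical argument for UTF-8 when the data contains such characters.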
| |||
| Perhaps others can comment on encoding versus type of data. I would add that the manner in which data are accessed may also be a consideration. Specifically, UTF-8 is a good choice if one is going to use JDBC. Michael Schmidt |
| |||
| Can somebody here tell me the difference between UTF-8 and SQL-ASCII, and whether there are any benefits to converting from SQL-ASCII to UTF-8? If so, under what circumstances would we want to convert to UTF-8? Thanks, |
| |||
| "Junaili Lie" <junaili@gmail.com> writes: > I am wondering if somebody here can tell me the difference between > UTF-8 and SQL-ASCII, whether there are any benefits of converting > SQL-ASCII to UTF-8? SQL_ASCII isn't really an encoding; it's more like a declaration of ignorance. If the encoding is set to SQL_ASCII, the backend will store any high-bit-on data you send it, and return it without any sort of conversion. If you are dealing with data beyond the 7-bit ASCII set, it's probably a really bad idea to be using the SQL_ASCII setting, because the database won't give you any help at all in checking for bad data or converting between the encodings wanted by different client programs. There are a few situations where this is what you want, but I think most people are better off picking a specific encoding. regards, tom lane |
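The "no checking, no conversion" behavior Tom describes can be illustrated on the client side. The Python sketch below is not PostgreSQL code; it only shows why bytes stored without a declared encoding are ambiguous. The word "café" and the helper `is_valid_utf8` are assumptions made for the example.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

word = "caf\u00e9"

# What a Latin-9 client and a UTF-8 client would each send for the same word.
from_latin9_client = word.encode("iso8859_15")   # b'caf\xe9'
from_utf8_client = word.encode("utf-8")          # b'caf\xc3\xa9'

# A store that never validates accepts both, so one column can end up
# holding a mix of encodings.
print(is_valid_utf8(from_latin9_client))  # False
print(is_valid_utf8(from_utf8_client))    # True

# A reader that guesses the wrong encoding gets garbage back.
print(from_utf8_client.decode("iso8859_15"))
```

With a specific database encoding, the first byte string would be rejected or converted on the way in rather than stored as-is, which is the "help" Tom refers to.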
| |||
| On Thu, Mar 16, 2006 at 07:30:36AM +0100, Harald Armin Massa wrote: > Good default encoding: > > does somebody NOT agree that UTF8 is quite a recommendation, at least for > all the people without Korean, Japanese and Chinese Chars? I know, that's at > maximum 2/3 of our potential user base, but better than nothing. Umm, you should choose an encoding supported by your platform and the locales you use. For example, UTF-8 is a bad choice on *BSD because there is no collation support for UTF-8 on those platforms. On Linux/Glibc UTF-8 is well supported but you need to make sure the locale you initdb with is a UTF-8 locale. By and large postgres correctly autodetects the encoding from the locale. > Maybe we could even "suggest" UTF8 in the "getting started" (i.e. the > Windows installer initdb screen, or other default installations) Something like > "if you do not know better, take utf8" UTF-8 on Windows works pretty well. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them. |
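Martijn's advice to initdb with a UTF-8 locale can be checked mechanically. POSIX-style locale names generally follow the pattern `language[_territory][.codeset][@modifier]`, so the codeset can be read off the name. The sketch below is a hypothetical helper, not part of PostgreSQL or initdb, and it assumes names in that form.

```python
from typing import Optional

def locale_codeset(name: str) -> Optional[str]:
    """Return the codeset part of a POSIX-style locale name, or None."""
    base = name.split("@", 1)[0]        # drop an optional @modifier
    if "." not in base:
        return None                     # e.g. "C" or "en_US" carry no codeset
    return base.split(".", 1)[1]

def looks_like_utf8(name: str) -> bool:
    """True if the locale name declares a UTF-8 codeset (any spelling)."""
    codeset = locale_codeset(name)
    if codeset is None:
        return False
    return codeset.replace("-", "").lower() == "utf8"

for candidate in ["en_US.UTF-8", "de_DE.utf8", "de_DE.ISO-8859-15@euro", "C"]:
    print(candidate, "->", looks_like_utf8(candidate))
```

A name without a codeset, such as `C`, says nothing about UTF-8, so the helper conservatively returns False for it.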
| |||
| I tried changing my database to UTF8 and then importing the dump (even tried iconv). It choked (on an accented e). Then somehow the database got created as LATIN9, and I was able to import successfully. I guess if it works, I'll be leaving it alone for the time being. I still have a problem with emdashes: they are stored in the database as HTML entities and displayed as emdashes in a web form, but they get stored back in the database wrong after editing (as an accented A, IIRC). I dunno - maybe it's a browser or Rails thing. CSN |
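One plausible explanation for an em dash coming back as "an accented A" is that the browser submits UTF-8 bytes and some layer then reads them as a single-byte encoding. This is a guess about CSN's setup, not a diagnosis. The Python sketch below reproduces the effect, with Windows-1252 as the assumed wrong encoding.

```python
em_dash = "\u2014"

# The three bytes a browser would submit for an em dash in a UTF-8 form.
utf8_bytes = em_dash.encode("utf-8")       # b'\xe2\x80\x94'

# Reading those bytes as Windows-1252 yields three characters,
# the first of which is a-circumflex.
misread = utf8_bytes.decode("cp1252")
print(repr(misread))

# The damage is reversible as long as the bytes were not altered further.
repaired = misread.encode("cp1252").decode("utf-8")
print(repaired == em_dash)
```

If the stored value begins with an a-circumflex followed by other odd characters, a mismatch between the form's encoding and the encoding assumed by the application or connection is a reasonable first thing to check.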
| |||
| On Thu, Mar 16, 2006 at 03:11:27AM -0800, CSN wrote: > I tried changing my database to UTF8 and then > importing the dump (even tried iconv). It choked (on > an accented e). Then somehow the database got created > as LATIN9, and I was able to import successfully. I > guess if it works, I'll be leaving it alone for the > time being. Note, when you create a dump, pg_dump adds a "set client_encoding" at the top of the dump. If you change the encoding using iconv without changing that line, you'll get problems in the import. In theory dumping from a LATIN9 database into a UTF8 one should Just Work(tm) because PostgreSQL will convert the data while loading. > I still have problems when emdashes are stored in the > database as HTML entities, but they're displayed as > emdashes in a web form, but then get stored back in > the database wrong when edited (an accented A IIRC). I > dunno - maybe it's a browser or Rails thing. I think the stuff submitted by the browser has a given encoding and won't be encoded with HTML entities. Converting Unicode to HTML entities has to be done somewhere there... Hope this helps, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ |
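The pitfall Martijn describes, re-encoding a dump while leaving its client_encoding line untouched, can be avoided by changing both together. The Python sketch below is a rough illustration, assuming the dump is small enough to hold in memory, was written as LATIN9, and contains a line of the form `SET client_encoding = '...';`. The function name `latin9_dump_to_utf8` is hypothetical.

```python
import re

# Matches the encoding declaration at the start of a line.
_CLIENT_ENCODING = re.compile(r"(?im)^(SET\s+client_encoding\s*=\s*)'[^']*'")

def latin9_dump_to_utf8(dump: bytes) -> bytes:
    """Re-encode a LATIN9 dump as UTF-8 and update its client_encoding line."""
    text = dump.decode("iso8859_15")
    text, count = _CLIENT_ENCODING.subn(lambda m: m.group(1) + "'UTF8'", text)
    if count == 0:
        raise ValueError("no SET client_encoding line found in dump")
    return text.encode("utf-8")

# Tiny made-up dump for demonstration.
sample = (
    "SET client_encoding = 'LATIN9';\n"
    "INSERT INTO t VALUES ('caf\u00e9');\n"
).encode("iso8859_15")

print(latin9_dump_to_utf8(sample).decode("utf-8"))
```

As Martijn notes, the simpler route is usually to leave the dump alone: with the original client_encoding line intact, PostgreSQL converts the data itself while loading into a UTF8 database.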
| |||
| On Mar 16, 2006, at 3:36 AM, Martijn van Oosterhout wrote: > Umm, you should choose an encoding supported by your platform and the > locales you use. For example, UTF-8 is a bad choice on *BSD because > there is no collation support for UTF-8 on those platforms. On > Linux/Glibc UTF-8 is well supported but you need to make sure the Shouldn't postgres be providing the collating routines for UTF8 anyhow? How else can we guarantee identical behavior across platforms? |
| ||||
| On Mar 20, 2006, at 6:04 PM, Peter Eisentraut wrote: > Vivek Khera wrote: >> Shouldn't postgres be providing the collating routines for UTF8 >> anyhow? > > Start typing ... So, if I use a UTF8 encoded DB on FreeBSD, all hell will break loose or what? Will things not compare correctly? Where does the code that does the collating come from, then? |
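To show what "not comparing correctly" could look like, the Python sketch below contrasts a plain code-point sort, roughly what you get without locale-aware collation, with a crude accent- and case-folding key. It is only an illustration of why collation support matters; the folding key is a stand-in for real locale rules and is not what PostgreSQL or any platform library does. The word list is made up.

```python
import unicodedata

words = ["zebra", "\u00e9clair", "apple", "Eiffel"]

# Plain code-point order: uppercase before lowercase, accented letters last.
by_code_point = sorted(words)
print(by_code_point)

def fold(s: str) -> str:
    """Crude sort key: strip combining accents, then ignore case."""
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()

# Closer to what a reader of a European language would expect.
by_folded = sorted(words, key=fold)
print(by_folded)
```

Without locale-aware comparison the data is still stored and retrieved intact. What changes is the order produced by sorting and the results of range comparisons on text.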