Unix Technical Forum

What's a good default encoding?

This is a discussion on "What's a good default encoding?" within the Pgsql General forums, part of the PostgreSQL category.


  #1 (permalink)  
Old 04-09-2008, 09:45 AM
CSN
 

If you're going to be putting emdashes, letters with
lines and circles above them, and similar stuff that's
mostly European and American in a database, what's a
good default encoding to use - UTF-8?

CSN


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

  #2 (permalink)  
Old 04-09-2008, 09:45 AM
John DeSoi
 


On Mar 14, 2006, at 3:10 PM, CSN wrote:

> If you're going to be putting emdashes, letters with
> lines and circles above them, and similar stuff that's
> mostly European and American in a database, what's a
> good default encoding to use - UTF-8?



Yes, UTF-8 is good because it can represent every Unicode character
and is efficient for Latin character sets.
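
To make the efficiency point concrete, here is a minimal Python sketch that compares byte counts in UTF-8 and LATIN9 (ISO-8859-15). The sample characters and the helper name encoded_size are illustrative assumptions, not something taken from the thread.

```python
# Compare how many bytes a few characters need in UTF-8 versus a
# single-byte Latin encoding. The samples are illustrative only.
samples = {
    "plain ASCII letter": "e",
    "e with acute accent": "\u00e9",
    "a with ring above": "\u00e5",
    "em dash": "\u2014",
}


def encoded_size(text, encoding):
    """Return the byte length of text in encoding, or None if it cannot be encoded."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


for label, ch in samples.items():
    utf8 = encoded_size(ch, "utf-8")
    latin9 = encoded_size(ch, "iso-8859-15")
    print(f"{label}: UTF-8={utf8} bytes, LATIN9={latin9}")
```

ASCII text costs one byte per character either way, accented Latin letters cost two bytes in UTF-8, and the em dash has no LATIN9 representation at all, which is one reason it tends to end up as an HTML entity in single-byte databases.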




John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL


  #3 (permalink)  
Old 04-09-2008, 09:46 AM
Michael Schmidt
 

Perhaps others can comment on encoding versus type of data. I would add that the manner in which data are accessed may also be a consideration. Specifically, UTF-8 is a good choice if one is going to use JDBC.

Michael Schmidt
  #4 (permalink)  
Old 04-09-2008, 09:47 AM
Junaili Lie
 

I am wondering if somebody here can tell me the difference between UTF-8 and
SQL_ASCII, and whether there are any benefits to converting from SQL_ASCII to
UTF-8? If so, under what circumstances would we want to convert to UTF-8?
Thanks,


On 3/15/06, Michael Schmidt <michaelmschmidt@msn.com> wrote:
>
> Perhaps others can comment on encoding versus type of data. I would add
> that the manner in which data are accessed may also be a consideration.
> Specifically, UTF-8 is a good choice if one is going to use JDBC.
>
> Michael Schmidt
>


  #5 (permalink)  
Old 04-09-2008, 09:47 AM
Tom Lane
 

"Junaili Lie" <junaili@gmail.com> writes:
> I am wondering if somebody here can tell me the difference between
> UTF-8 and SQL_ASCII, whether there are any benefits of converting
> SQL_ASCII to UTF-8?


SQL_ASCII isn't really an encoding; it's more like a declaration of
ignorance. If the encoding is set to SQL_ASCII, the backend will store
any high-bit-on data you send it, and return it without any sort of
conversion.

If you are dealing with data beyond the 7-bit ASCII set, it's probably a
really bad idea to be using the SQL_ASCII setting, because the database
won't give you any help at all in checking for bad data or converting
between the encodings wanted by different client programs. There are a
few situations where this is what you want, but I think most people are
better off picking a specific encoding.
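
As a rough illustration of why an unchecked setting is risky, the following Python sketch imitates the two behaviours with plain functions. It does not talk to PostgreSQL; store_unchecked, store_as_utf8 and accepts are hypothetical stand-ins, and the two "clients" are assumptions made up for the example.

```python
def store_unchecked(raw: bytes) -> bytes:
    """Imitate an unvalidated store: bytes go in and come back untouched."""
    return raw


def store_as_utf8(raw: bytes) -> bytes:
    """Imitate a store that declares UTF-8: reject bytes that are not valid UTF-8."""
    raw.decode("utf-8")  # raises UnicodeDecodeError on invalid input
    return raw


def accepts(raw: bytes) -> bool:
    """Report whether the validating store would take these bytes."""
    try:
        store_as_utf8(raw)
        return True
    except UnicodeDecodeError:
        return False


# Two hypothetical clients send the same word in different encodings.
latin1_client = "caf\u00e9".encode("iso-8859-1")
utf8_client = "caf\u00e9".encode("utf-8")

# The unchecked store takes both, so one column now mixes two encodings.
rows = [store_unchecked(latin1_client), store_unchecked(utf8_client)]

# A reader that assumes UTF-8 only finds out about the problem at read time.
for raw in rows:
    try:
        print(raw.decode("utf-8"))
    except UnicodeDecodeError:
        print("not valid UTF-8:", raw)

print("validating store accepts Latin-1 bytes:", accepts(latin1_client))
print("validating store accepts UTF-8 bytes:", accepts(utf8_client))
```

With a declared encoding the bad input is refused when it is written, rather than surfacing later when some other client tries to read it.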

regards, tom lane

  #6 (permalink)  
Old 04-09-2008, 09:47 AM
Martijn van Oosterhout
 

On Thu, Mar 16, 2006 at 07:30:36AM +0100, Harald Armin Massa wrote:
> Good default encoding:
>
> does somebody NOT agree that UTF8 is quite a recommendation, at least for
> all the people without Korean, Japanese and Chinese Chars? I know, that's at
> maximum 2/3 of our potential user base, but better than nothing.


Umm, you should choose an encoding supported by your platform and the
locales you use. For example, UTF-8 is a bad choice on *BSD because
there is no collation support for UTF-8 on those platforms. On
Linux/Glibc UTF-8 is well supported but you need to make sure the
locale you initdb with is a UTF-8 locale. By and large postgres
correctly autodetects the encoding from the locale.

> Maybe we could even "suggest" UTF8 in the "getting started" (i.e. the
> Windows installer initdb screen, or other default installations). Something
> like "if you do not know better, take utf8"


UTF-8 on Windows works pretty well.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


  #7 (permalink)  
Old 04-09-2008, 09:47 AM
CSN
 

I tried changing my database to UTF8 and then
importing the dump (even tried iconv). It choked (on
an accented e). Then somehow the database got created
as LATIN9, and I was able to import successfully. I
guess if it works, I'll be leaving it alone for the
time being.

I still have problems with em dashes: they are stored in the
database as HTML entities and displayed as em dashes in a
web form, but they get stored back in the database wrong
when edited (as an accented A, IIRC). I dunno - maybe it's
a browser or Rails thing.

CSN

  #8 (permalink)  
Old 04-09-2008, 09:47 AM
Martijn van Oosterhout
 

On Thu, Mar 16, 2006 at 03:11:27AM -0800, CSN wrote:
> I tried changing my database to UTF8 and then
> importing the dump (even tried iconv). It choked (on
> an accented e). Then somehow the database got created
> as LATIN9, and I was able to import successfully. I
> guess if it works, I'll be leaving it alone for the
> time being.


Note, when you create a dump, pg_dump adds a "set client_encoding" at
the top of the dump. If you change the encoding using iconv without
changing that line, you'll get problems in the import. In theory
dumping from a latin9 database into a utf8 one should Just Work(tm)
because PostgreSQL will convert the data while loading.
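
The mismatch described above can be sketched in Python. This assumes the dump declares its encoding with a line of the form SET client_encoding = 'LATIN9'; and uses a made-up two-line dump with a hypothetical table t. The function reencode_dump is an illustrative helper, not part of any PostgreSQL tool.

```python
import re


def reencode_dump(dump_bytes: bytes, old: str, new: str, new_pg_name: str) -> bytes:
    """Convert dump text from `old` to `new` and update the declared client encoding."""
    text = dump_bytes.decode(old)
    text = re.sub(
        r"(?im)^(SET\s+client_encoding\s*=\s*)'[^']*'",
        lambda m: f"{m.group(1)}'{new_pg_name}'",
        text,
    )
    return text.encode(new)


# A tiny stand-in for a LATIN9 dump (the table name t is hypothetical).
sample = (
    "SET client_encoding = 'LATIN9';\n"
    "INSERT INTO t VALUES ('caf\u00e9');\n"
).encode("iso-8859-15")

# What a bare iconv run does: the bytes change, the declaration does not.
iconv_only = sample.decode("iso-8859-15").encode("utf-8")

# The loader trusts the declaration and reads the UTF-8 bytes as LATIN9,
# so the accented e turns into two unrelated characters.
misread = iconv_only.decode("iso-8859-15")
print(misread)

# Converting the bytes and the declaration together keeps them consistent.
fixed = reencode_dump(sample, "iso-8859-15", "utf-8", "UTF8")
print(fixed.decode("utf-8"))
```

The simpler route, as the post says, is to leave the dump file and its declaration alone and let the server convert on load.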

> I still have problems when emdashes are stored in the
> database as HTML entities, but they're displayed as
> emdashes in a web form, but then get stored back in
> the database wrong when edited (an accented A IIRC). I
> dunno - maybe it's a browser or Rails thing.


I think the text submitted by the browser arrives in a particular
character encoding and is not encoded as HTML entities. Any conversion
from Unicode to HTML entities has to be done somewhere along that path...
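
A small Python sketch of one way such a round trip can go wrong. The cause in the original report is not known; this assumes, purely for illustration, that the browser submits UTF-8 and the application decodes it as Windows-1252.

```python
import html

# The value as stored: the em dash is an HTML entity.
stored = "before &mdash; after"

# The browser renders the entity, so the form field holds a real em dash.
shown = html.unescape(stored)

# On submit the browser sends bytes in the page's encoding (UTF-8 assumed here).
submitted = shown.encode("utf-8")

# Decoding those bytes with the wrong single-byte encoding yields an
# accented "a" followed by two other characters instead of one em dash.
wrong = submitted.decode("cp1252")

# Decoding with the encoding the browser actually used recovers the text.
right = submitted.decode("utf-8")

# If entities are wanted in storage, the application must produce them itself.
as_entities = right.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(repr(wrong))
print(repr(right))
print(as_entities)
```

The numeric reference produced in the last step is equivalent to the named entity; either way the conversion is an explicit step in the application, not something the browser does on submit.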

Hope this helps,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


  #9 (permalink)  
Old 04-09-2008, 09:51 AM
Vivek Khera
 


On Mar 16, 2006, at 3:36 AM, Martijn van Oosterhout wrote:

> Umm, you should choose an encoding supported by your platform and the
> locales you use. For example, UTF-8 is a bad choice on *BSD because
> there is no collation support for UTF-8 on those platforms. On
> Linux/Glibc UTF-8 is well supported but you need to make sure the


Shouldn't postgres be providing the collating routines for UTF8
anyhow? How else can we guarantee identical behavior across platforms?


  #10 (permalink)  
Old 04-09-2008, 09:51 AM
Vivek Khera
 


On Mar 20, 2006, at 6:04 PM, Peter Eisentraut wrote:

> Vivek Khera wrote:
>> Shouldn't postgres be providing the collating routines for UTF8
>> anyhow?

>
> Start typing ...


So, if I use a UTF8 encoded DB on FreeBSD, all hell will break loose
or what? Will things not compare correctly? Where does the
collation code come from, then?
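
For readers wondering what "not comparing correctly" could look like, here is a hedged Python sketch. It assumes that without locale-aware collation, strings are compared by their raw bytes; it says nothing about what PostgreSQL actually does on FreeBSD. The crude_key function is a rough stand-in for dictionary-style ordering, not a real collation algorithm, and the word list is made up.

```python
import unicodedata

words = ["zebra", "\u00e9tude", "apple", "Eclair", "echo"]

# Plain byte order on the UTF-8 form: uppercase sorts before lowercase,
# and accented letters sort after every unaccented ASCII letter.
by_bytes = sorted(words, key=lambda w: w.encode("utf-8"))


def crude_key(word: str) -> str:
    """Approximate dictionary order by dropping accents and ignoring case."""
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


by_crude = sorted(words, key=crude_key)

print("byte order:      ", by_bytes)
print("dictionary-like: ", by_crude)
```

Byte order is still deterministic and consistent across platforms; it just does not match what a human reader of a given language would expect from ORDER BY.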

