
Re: FW: Win32 unicode vs ICU

04-18-2008, 01:43 AM
Bruce Momjian
 


Added to TODO.detail.

---------------------------------------------------------------------------

Magnus Hagander wrote:
> I just realised this mail didn't go through. Probably because it was too
> large for -hackers. So: repost to -patches. Sorry about that. If it's a
> duplicate, even more sorry, but I couldn't find it in the archives.
>
> (This may explain why nobody answered me :P)
>
> //Magnus
>
>
> > -----Original Message-----
> > From: Magnus Hagander
> > Sent: Sunday, July 31, 2005 2:09 PM
> > To: PostgreSQL-development
> > Cc: pgsql-hackers-win32@postgresql.org
> > Subject: Win32 unicode vs ICU
> >
> > Hi!
> >
> > I've been working with Palle's ICU patch to make it work on
> > win32, and I believe I have it done. While doing it I noticed
> > that ICU basically converts to UTF-16 and back - I previously
> > thought it worked on UTF-8 strings. Based on this I also tried
> > out an implementation for the win32-unicode problem that does
> > *not* require ICU. It uses the win32 native functions to map
> > to UTF-16 and back, and processes the text there. I got
> > through with much less code than the ICU version, while
> > doing the same thing.
> >
> > I am unsure of how to proceed. As I see it there are three paths:
> > 1) Use native win32 functionality only on win32
> > 2) Use ICU functionality only on win32
> > 3) Allow both ICU and native functionality, selected by the
> > compile-time switch --with-icu (same as unix with the ICU patch)
> >
> >
> > The main downsides of ICU vs the native ones are:
> > * ICU does not accept win32 locale names. When doing
> > setlocale("sv_se"), for example, win32 will return this in
> > later calls as "Swedish_Sweden.1252". To get around this in
> > the ICU patch, I had to implement a lookup map that converts
> > it back to sv_se for ICU.
> >
> > * ICU is yet another build and runtime dependency, and a
> > large one (it comes in at 11 MB for the DLL files alone in the
> > win32 download)
> >
> >
> > I guess that the main upside of it is that we'd get
> > consistent behaviour - in case there are issues with either
> > ICU or win32 native they'd otherwise differ. And only one new
> > codepath. But we already live with the platform-inconsistency today...
> >
> > Another upside is that ICU handles more encodings - my
> > native implementation does *only* UTF-8 and relies on existing
> > functionality to deal with other encodings. It could of
> > course be extended if necessary, but from what I can tell
> > UTF-8 is the big one.
> >
> >
> >
> > I have attached both patches. For the native version, only
> > win32_utf8.patch is required. For the ICU version,
> > icu_win32.patch is needed and also the files
> > localemap.c, localemap.pl, iso639 and iso3166 need to go in
> > src/backend/port/win32. (the localemap needs to be updated to
> > do a better-than-linear search, but I wanted to include an example)
> >
> >
> > Thoughts on the options?
> >
> >
> > And another question - my native patch touches the same
> > functions as the ICU patch. Can somebody who knows the
> > internals confirm or deny that these are all the required
> > locations, or do we need to modify more?
> >
> > (I have run simple tests in a Swedish locale and both behave
> > the same and correctly, but I'm unsure of exactly how much
> > would be affected)
> >
> > Finally, the win32 patch also changes the normal path to use
> > strncoll(). The comment above the function states that we'd
> > like to use strncoll but it's not available. Well, on win32
> > it is, so it should provide a speedup on win32. It is
> > currently not included in the ICU patch, but should probably
> > be included whichever path we choose.
> >
> >
> > //Magnus
> >
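
The strncoll() remark above can be illustrated with a minimal sketch in portable C. It assumes only the standard strcoll(); the helper name bounded_coll is hypothetical and is not taken from either patch. Because strcoll() requires NUL-terminated input, length-counted strings have to be copied first, and that copy is the overhead a length-aware call would avoid.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Compare two length-counted (not necessarily NUL-terminated) byte strings
 * using the current LC_COLLATE rules.
 *
 * Hypothetical helper for illustration only, not code from either patch.
 * strcoll() needs NUL-terminated arguments, so both inputs are copied into
 * temporary buffers before the comparison.
 */
int
bounded_coll(const char *a, size_t lena, const char *b, size_t lenb)
{
    char *abuf;
    char *bbuf;
    int   result;

    assert(a != NULL && b != NULL);

    abuf = malloc(lena + 1);
    bbuf = malloc(lenb + 1);
    if (abuf == NULL || bbuf == NULL)
    {
        free(abuf);
        free(bbuf);
        abort();            /* a real caller would report the error instead */
    }

    /* Copy exactly the counted bytes and terminate each buffer. */
    memcpy(abuf, a, lena);
    abuf[lena] = '\0';
    memcpy(bbuf, b, lenb);
    bbuf[lenb] = '\0';

    result = strcoll(abuf, bbuf);

    free(abuf);
    free(bbuf);
    return result;
}
```

For example, bounded_coll("apple pie", 5, "apricot", 7) compares only "apple" against "apricot"; in the default "C" locale the result is negative.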


Content-Description: win32_utf8.patch

[ Attachment, skipping... ]

Content-Description: icu_win32.patch

[ Attachment, skipping... ]

Content-Description: localemap.pl

[ Attachment, skipping... ]

Content-Description: localemap.c

[ Attachment, skipping... ]
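
The better-than-linear lookup mentioned for the localemap could look roughly like the following sketch. It is not the attached localemap.c: apart from the Swedish_Sweden.1252 / sv_se pair taken from the mail, the table entries, the type name locale_map_entry and the function win32_locale_to_icu are assumptions made for illustration. The table is kept sorted by win32 name so that the standard bsearch() can be used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One mapping from a win32 locale name to the name handed to ICU. */
typedef struct
{
    const char *win32_name;
    const char *icu_name;
} locale_map_entry;

/*
 * Illustrative table only. It MUST stay sorted by win32_name in strcmp()
 * order, otherwise bsearch() gives wrong answers.
 */
static const locale_map_entry locale_map[] = {
    {"English_United States.1252", "en_us"},
    {"German_Germany.1252", "de_de"},
    {"Swedish_Sweden.1252", "sv_se"},
};

/* bsearch() callback: compare the search key with one table entry. */
static int
cmp_entry(const void *key, const void *elem)
{
    const locale_map_entry *entry = (const locale_map_entry *) elem;

    return strcmp((const char *) key, entry->win32_name);
}

/*
 * Return the ICU-style name for a win32 locale name, or NULL when the
 * name is not in the table. The lookup is O(log n) instead of linear.
 */
const char *
win32_locale_to_icu(const char *win32_name)
{
    const locale_map_entry *hit;

    assert(win32_name != NULL);

    hit = bsearch(win32_name, locale_map,
                  sizeof(locale_map) / sizeof(locale_map[0]),
                  sizeof(locale_map[0]), cmp_entry);

    return hit != NULL ? hit->icu_name : NULL;
}
```

With this table, win32_locale_to_icu("Swedish_Sweden.1252") yields "sv_se", and an unknown name yields NULL so the caller can decide how to fall back.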

>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org


--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

