vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hello, I would like to remove diacritical marks from a string in a SQL query. I tried to convert a UTF8 string to ASCII but it doesn't work for me. SELECT convert('','UTF8','SQL_ASCII') array(1) { ["convert"]=> string(18) "" } Thanks for any advice, J.N. ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org/ |
| |||
| On Fri, Jan 12, 2007 at 10:16:22PM +0100, Ji? N?mec wrote: > Hello, > > I would like to remove diacritical marks from a string in a SQL query. > I tried to convert a UTF8 string to ASCII but it doesn't work for me. > > SELECT convert('?????','UTF8','SQL_ASCII') I don't think postgres has any stuff builtin for that, but other languages (like perl) have modules to do this kind of thing. The method is to decompose the string to normal form D, strip the diacritics, and recompose what's left. Hope this helps, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQFFqASMIB7bNG8LQkwRAhS5AKCLES3OBZXlsJpohe4uB4 GOTO9WjQCeOZhI aa9Am5Klzi0jcYlwweXRDjg= =xzoa -----END PGP SIGNATURE----- |
| ||||
| On Fri, Jan 12, 2007 at 10:58:36PM +0100, Martijn van Oosterhout wrote: > On Fri, Jan 12, 2007 at 10:16:22PM +0100, Jiř* Němec wrote: > > I would like to remove diacritical marks from a string in a SQL query. > > I tried to convert a UTF8 string to ASCII but it doesn't work for me. > > > > SELECT convert('ěščřžýá*é','UTF8','SQL_ASCII') > > I don't think postgres has any stuff builtin for that, but other > languages (like perl) have modules to do this kind of thing. The method > is to decompose the string to normal form D, strip the diacritics, and > recompose what's left. A technique that's been posted before might work: SELECT to_ascii(convert('ěščřžýá*é', 'LATIN2'), 'LATIN2'); to_ascii ----------- escrzyaie (1 row) to_ascii() supports only LATIN1, LATIN2, LATIN9, and WIN1250 so you have to convert to one of those encodings first. As Martijn suggested, you could use Perl. Here's an example with Text::Unaccent (you'll need to use encoding names that iconv recognizes): CREATE FUNCTION unaccent(charset text, string text) RETURNS text AS $$ use Text::Unaccent; return unac_string($_[0], $_[1]); $$ LANGUAGE plperlu IMMUTABLE STRICT; SELECT unaccent('UTF-8', 'ěščřžýá*é'); unaccent ----------- escrzyaie (1 row) Here's an example that uses Unicode::Normalize to strip non-spacing and enclosing marks: CREATE FUNCTION unaccent(string text) RETURNS text AS $$ use Unicode::Normalize; my $nfd_string = NFD($_[0]); $nfd_string =~ s/[\p{Mn}\p{Me}]//g; return NFC($nfd_string); $$ LANGUAGE plperlu IMMUTABLE STRICT; SELECT unaccent('ěščřžýá*é'); unaccent ----------- escrzyaie (1 row) -- Michael Fuhr ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |