The attached patch is intended to ensure that chr() does not produce invalidly encoded data, as recently discussed on -hackers. For UTF8, we treat its argument as a Unicode code point; for all other multi-byte encodings, we raise an error on any argument greater than 127. For all encodings we raise an error if the argument is 0 (we don't allow null bytes in text data). The ascii() function is adjusted so that it remains the inverse of chr(), i.e. for UTF8 it returns the Unicode code point, and for any other multi-byte encoding it raises an error if the argument is outside the ASCII range. I have tested this inverse property across the entire Unicode code point range, 0x01 .. 0x1ffff.

TODO: alter docs to reflect changes.

cheers

andrew

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org
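The rules in the patch description can be modelled in a few lines. The following is a minimal Python sketch, not the patch itself: `pg_chr` and `pg_ascii` are hypothetical names, the set of multi-byte encoding names is an illustrative sample, and the treatment of single-byte encodings (accepting 1..255) and of empty input to `pg_ascii` (returning 0) are assumptions. It also rejects UTF-16 surrogates and values above U+10FFFF, which have no valid UTF-8 encoding, so the round-trip loop at the end skips the surrogate block.

```python
# Hypothetical Python model of the chr()/ascii() rules described in the patch.
# Encoding names and the single-byte behaviour are illustrative assumptions.

# Assumed sample of multi-byte server encodings other than UTF8.
MULTIBYTE = {"EUC_JP", "EUC_KR", "EUC_CN", "EUC_TW"}


def pg_chr(code: int, encoding: str) -> bytes:
    """Return the encoded character for `code`, or raise ValueError."""
    if code == 0:
        # Null bytes are never allowed in text data, whatever the encoding.
        raise ValueError("null character not permitted")
    if encoding == "UTF8":
        # The argument is a Unicode code point. Surrogates and values above
        # U+10FFFF have no valid UTF-8 encoding, so reject them too.
        if code < 0 or code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            raise ValueError("requested character is not a valid code point")
        return chr(code).encode("utf-8")
    if encoding in MULTIBYTE:
        # Any other multi-byte encoding: only the ASCII range is accepted.
        if not 0 < code <= 127:
            raise ValueError("requested character too large for encoding")
        return bytes([code])
    # Single-byte encodings (assumption): the argument is the byte value.
    if not 0 < code <= 255:
        raise ValueError("requested character too large")
    return bytes([code])


def pg_ascii(text: bytes, encoding: str) -> int:
    """Inverse of pg_chr: return the code of the first character of `text`."""
    if not text:
        return 0  # assumption: empty input yields 0
    if encoding == "UTF8":
        # Return the Unicode code point of the leading character.
        return ord(text.decode("utf-8")[0])
    first = text[0]
    if encoding in MULTIBYTE and first > 127:
        # Outside ASCII, a single byte does not identify the character.
        raise ValueError("requested character too large for encoding")
    return first


# Round-trip check over the range mentioned in the patch (0x01 .. 0x1ffff).
for cp in range(0x01, 0x20000):
    if 0xD800 <= cp <= 0xDFFF:
        continue  # surrogates are rejected by pg_chr in this sketch
    assert pg_ascii(pg_chr(cp, "UTF8"), "UTF8") == cp
```

With EUC_JP, for example, `pg_chr(200, "EUC_JP")` raises instead of emitting a lone byte 0xC8, which is not a valid EUC_JP character on its own.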
Andrew Dunstan wrote:
>
> The attached patch is intended to ensure that chr() does not produce
> invalidly encoded data, as recently discussed on -hackers. For UTF8, we
> treat its argument as a Unicode code point; for all other multi-byte
> encodings, we raise an error on any argument greater than 127. For all
> encodings we raise an error if the argument is 0 (we don't allow null bytes
> in text data). The ascii() function is adjusted so that it remains the
> inverse of chr(), i.e. for UTF8 it returns the Unicode code point, and for
> any other multi-byte encoding it raises an error if the argument is
> outside the ASCII range. I have tested this inverse property across the
> entire Unicode code point range, 0x01 .. 0x1ffff.

Hmm, is this what we had agreed? I'm not sure I like it; if I'm using chr() to produce characters, then the application is going to have to worry about server_encoding in order to find the correct parameter to pass to chr().

I thought the idea was that chr() always takes a Unicode code point and converts the character to the server_encoding. If the character cannot be converted, then it raises an error.

--
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
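Alvaro's alternative, where the argument is always a Unicode code point and the result is converted to the server encoding, might look like the following Python sketch. `unicode_chr` is a hypothetical name here, the mapping from PostgreSQL encoding names to Python codec names covers only a few assumed examples, and Python's codecs stand in for the server's own conversion routines.

```python
# Sketch of "chr() takes a Unicode code point and converts it to the
# server encoding". Names and the encoding table are illustrative only.

# Assumed mapping from a few PostgreSQL encoding names to Python codecs.
PG_TO_PY_CODEC = {
    "UTF8": "utf-8",
    "LATIN1": "latin-1",
    "LATIN2": "iso8859_2",
    "EUC_JP": "euc_jp",
}


def unicode_chr(code_point: int, server_encoding: str) -> bytes:
    """Encode `code_point` in `server_encoding`, raising if impossible."""
    if not 0 < code_point <= 0x10FFFF:
        # Zero (a null byte) and out-of-range values are rejected outright.
        raise ValueError(f"invalid code point {code_point:#x}")
    codec = PG_TO_PY_CODEC[server_encoding]
    try:
        return chr(code_point).encode(codec)
    except UnicodeEncodeError as exc:
        # The character has no representation in the server encoding.
        raise ValueError(
            f"U+{code_point:04X} cannot be represented in {server_encoding}"
        ) from exc
```

The same argument yields different bytes depending on the server encoding: U+00E9 becomes `b"\xe9"` under LATIN1 and `b"\xc3\xa9"` under UTF8, while U+20AC (the euro sign) raises under LATIN1 because ISO 8859-1 has no euro sign. That is what would free the application from having to know `server_encoding`.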
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Hmm, is this what we had agreed? I'm not sure I like it; if I'm using
> chr() to produce characters, then the application is going to have to
> worry about server_encoding in order to find the correct parameter to
> pass to chr().

That's always been the case.

> I thought the idea was that chr() always takes a Unicode code point
> and converts the character to the server_encoding.

I think that would be too big a break from past practice; the operative word is "break", because in LATINn character sets chr/ascii work just fine. I wouldn't object to introducing some sort of unicode_chr/unicode_ascii function pair that translates to/from Unicode code points regardless of the DB encoding. But that smells way more like a new feature than plugging a hole, so I suggest it should wait for 8.4.

			regards, tom lane
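Tom's objection can be made concrete. In a single-byte encoding the existing chr() treats its argument as the byte value, which coincides with the Unicode code point only in LATIN1. The Python sketch below compares that byte semantics with a code-point-based `unicode_chr`/`unicode_ascii` pair; the names come from the suggestion above, but their behaviour here is an assumption, and Python's `latin-1` and `iso8859_2` codecs stand in for the LATIN1 and LATIN2 database encodings.

```python
# Compare byte-value chr() semantics with code-point semantics in
# single-byte encodings. Function behaviour here is hypothetical.


def legacy_chr(code: int) -> bytes:
    """Single-byte encodings: the argument is simply the byte value."""
    if not 0 < code <= 255:
        raise ValueError("argument out of range")
    return bytes([code])


def unicode_chr(code_point: int, codec: str) -> bytes:
    """Code-point semantics: encode U+code_point in the database encoding."""
    if code_point <= 0:
        raise ValueError("null or negative code point not permitted")
    return chr(code_point).encode(codec)  # UnicodeEncodeError if unmappable


def unicode_ascii(text: bytes, codec: str) -> int:
    """Inverse of unicode_chr: code point of the first character."""
    return ord(text.decode(codec)[0])


# In LATIN1 the byte value equals the code point, so nothing changes.
assert all(legacy_chr(n) == unicode_chr(n, "latin-1") for n in range(1, 256))

# In LATIN2, byte 0xA1 is U+0104 (A with ogonek), while U+00A1 (inverted
# exclamation mark) is not in LATIN2 at all, so chr(161) would change meaning.
assert legacy_chr(0xA1) == b"\xa1"
assert unicode_ascii(b"\xa1", "iso8859_2") == 0x104
try:
    unicode_chr(0xA1, "iso8859_2")
except UnicodeEncodeError:
    print("chr(161) would fail under code-point semantics in LATIN2")
```

A LATIN2 application that calls chr(161) to get the A-with-ogonek character would, under code-point semantics, have to call chr(260) instead; keeping the new behaviour in separate functions avoids that kind of silent change.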