This is a discussion on \x96 in column value? within the MySQL General forum forums, part of the MySQL category; --> Why is the character \x96 not allowed as a value in an INSERT query? I've SET NAMES utf8, the ...
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Why is the character \x96 not allowed as a value in an INSERT query? I've SET NAMES utf8, the table collation is utf8_general_ci, the default character set for the table is utf8. I get this error message from my application: ERROR: INSERT INTO prod (prod_title,prod_published,prod_pub_prod_id,prod_c ontent_info,prod_samp_doc,prod_toc_doc,prod_type,p rod_ready,prod_discont,prod_info_type,prod_id,prod _num,pub_id,prod_added,prod_updated,prod_export,pr od_changed) VALUES ('Disposable Paper Products ? Uruguay','2007-12-01','DPP2007UY','Pages: 44','\\\\xa20\\d\\publishers\\eo\\fullreports\\DPP 07_UY.pdf','\\\\xa20\\d\\publishers\\eo\\content\\ DPP07_UY-LOC.doc','S',1,0,'E','Y8F1kqasoQVE90F',65642,'BG3g 33kv5CDkim7',NOW(),NOW(),1,1); Incorrect string value: '\x96 Urug...' for column 'prod_title' at row 1 That bizarre character in the middle of 'Disposable Paper Products ? Uruguay' is, as you can see, \x96. MySQL certainly doesn't seem to like it. The data is being read from a file. Curiously, if I copy the line directly from the input file it looks like this: Disposable Paper Products - Uruguay You may not be able to see it, but that is actually an n-dash (\x96). Regards, Jerry Schwartz The Infoshop by Global Information Incorporated 195 Farmington Ave. Farmington, CT 06032 860.674.8796 / FAX: 860.674.8341 www.the-infoshop.com www.giiexpress.com www.etudes-marche.com |
| |||
| I'm running afoul of the UTF8 character set somehow: mysql> select convert(char(0x96) using utf8); +----------------------------------+ | convert(char(0x96) using utf8) | +----------------------------------+ | NULL | +----------------------------------+ 1 row in set, 1 warning (0.00 sec) mysql> show warnings; +-------+------+-------------------------------------+ | Level | Code | Message | +-------+------+-------------------------------------+ | Error | 1300 | Invalid utf8 character string: '96' | +-------+------+-------------------------------------+ 1 row in set (0.00 sec) On top of my other problems, I've discovered that pasting the UTF8 character represented by 0x96 into the MySQL CLI (Windows) somehow converts the character to 0x2D (a normal dash); so a lot of my testing has been wasted. Pasting it into a Windows-based editor preserves the character as 0x96. Regards, Jerry Schwartz The Infoshop by Global Information Incorporated 195 Farmington Ave. Farmington, CT 06032 860.674.8796 / FAX: 860.674.8341 www.the-infoshop.com www.giiexpress.com www.etudes-marche.com >-----Original Message----- >From: Jerry Schwartz [mailto:jschwartz@the-infoshop.com] >Sent: Wednesday, April 16, 2008 12:49 PM >To: 'Mysql' >Subject: \x96 in column value? > >Why is the character \x96 not allowed as a value in an INSERT query? >I've SET >NAMES utf8, the table collation is utf8_general_ci, the default >character set >for the table is utf8. I get this error message from my application: > >ERROR: INSERT INTO prod >(prod_title,prod_published,prod_pub_prod_id,prod_ content_info,prod_samp_ >doc,prod_toc_doc,prod_type,prod_ready,prod_discon t,prod_info_type,prod_i >d,prod_num,pub_id,prod_added,prod_updated,prod_ex port,prod_changed) >VALUES ('Disposable Paper Products ? Uruguay','2007-12- >01','DPP2007UY','Pages: >44','\\\\xa20\\d\\publishers\\eo\\fullreports\\DP P07_UY.pdf','\\\\xa20\\ >d\\publishers\\eo\\content\\DPP07_UY- >LOC.doc','S',1,0,'E','Y8F1kqasoQVE90F',65642,'BG3 g33kv5CDkim7',NOW(),NOW >(),1,1); >Incorrect string value: '\x96 Urug...' for column 'prod_title' at row 1 > >That bizarre character in the middle of 'Disposable Paper Products ? >Uruguay' >is, as you can see, \x96. MySQL certainly doesn't seem to like it. The >data is >being read from a file. Curiously, if I copy the line directly from the >input >file it looks like this: > >Disposable Paper Products - Uruguay > >You may not be able to see it, but that is actually an n-dash (\x96). > >Regards, > >Jerry Schwartz >The Infoshop by Global Information Incorporated >195 Farmington Ave. >Farmington, CT 06032 > >860.674.8796 / FAX: 860.674.8341 > >www.the-infoshop.com >www.giiexpress.com >www.etudes-marche.com > > > > > >-- >MySQL General Mailing List >For list archives: http://lists.mysql.com/mysql >To unsubscribe: http://lists.mysql.com/mysql?unsub=jschwartz@the- >infoshop.com |
| |||
| On Wed, 16 Apr 2008, Jerry Schwartz <jschwartz@the-infoshop.com> wrote: > I'm running afoul of the UTF8 character set somehow: > > mysql> select convert(char(0x96) using utf8); > +----------------------------------+ > | convert(char(0x96) using utf8) | > +----------------------------------+ > | NULL | > +----------------------------------+ > 1 row in set, 1 warning (0.00 sec) > > mysql> show warnings; > +-------+------+-------------------------------------+ > | Level | Code | Message | > +-------+------+-------------------------------------+ > | Error | 1300 | Invalid utf8 character string: '96' | > +-------+------+-------------------------------------+ > 1 row in set (0.00 sec) > > On top of my other problems, I've discovered that pasting the UTF8 > character represented by 0x96 into the MySQL CLI (Windows) somehow > converts the character to 0x2D (a normal dash); so a lot of my > testing has been wasted. Pasting it into a Windows-based editor > preserves the character as 0x96. In an earlier note, he wrote > You may not be able to see it, but that is actually an n-dash > (\x96). Actually, \x96 is not an en-dash. <http://www.unicode.org/charts/PDF/U0080.pdf> says that it's "START OF GUARDED AREA". x96 is in the middle of a block of control characters from the unnamed control character at \x80 through APPLICATION PROGRAM COMMAND at \x9F (or arguably NO-BREAK SPACE at \xa0). Microsoft, in some of their Windows code pages, assigned meanings to those values that differ from the Unicode and ISO-8859-1 standards (quelle suprise), assigning many of them uses as printable characters. I think it's the Windows 1250 code page, at <http://www.microsoft.com/globaldev/reference/sbcs/1250.mspx>. As that page and <http://www.microsoft.com/typography/developers/fdsspec/punc2.htm> note, the Unicode standard value for an en-dash is U+2013 (which appears to be in hex). As to whether this affects the problem I don't know. Since x96 is a valid character, whether Microsoft or real Unicode, I would not expect it to be a problem per se. I just wanted to point out what it might not mean. -- Tim McDaniel, tmcd@panix.com |
| ||||
| >-----Original Message----- >From: Tim McDaniel [mailto:tmcd@panix.com] >Sent: Wednesday, April 16, 2008 2:32 PM >Cc: 'Mysql' >Subject: RE: \x96 in column value? > >On Wed, 16 Apr 2008, Jerry Schwartz <jschwartz@the-infoshop.com> wrote: >> I'm running afoul of the UTF8 character set somehow: >> >> mysql> select convert(char(0x96) using utf8); >> +----------------------------------+ >> | convert(char(0x96) using utf8) | >> +----------------------------------+ >> | NULL | >> +----------------------------------+ >> 1 row in set, 1 warning (0.00 sec) >> >> mysql> show warnings; >> +-------+------+-------------------------------------+ >> | Level | Code | Message | >> +-------+------+-------------------------------------+ >> | Error | 1300 | Invalid utf8 character string: '96' | >> +-------+------+-------------------------------------+ >> 1 row in set (0.00 sec) >> >> On top of my other problems, I've discovered that pasting the UTF8 >> character represented by 0x96 into the MySQL CLI (Windows) somehow >> converts the character to 0x2D (a normal dash); so a lot of my >> testing has been wasted. Pasting it into a Windows-based editor >> preserves the character as 0x96. > >In an earlier note, he wrote >> You may not be able to see it, but that is actually an n-dash >> (\x96). > >Actually, \x96 is not an en-dash. ><http://www.unicode.org/charts/PDF/U0080.pdf> says that it's >"START OF GUARDED AREA". x96 is in the middle of a block of control >characters from the unnamed control character at \x80 through >APPLICATION PROGRAM COMMAND at \x9F (or arguably NO-BREAK SPACE at >\xa0). [JS] Right you are. This whole business gives me an extreme headache. When working in PHP, I assume my Windows-generated input is cp1252 and I convert that to UTF-8. Aside from that, we always work in UTF-8 (database and web) because I have to handle Chinese. (I have no idea if I'm doing that right, I can't read the results. ;<) In Microsoft's code page 1252, 0x96 is indeed an n-dash. I think this might be my clue. Although our web pages specify UTF-8, I found an article in MSDN that seems to say that IE interprets UTF-8 pages using a code page in the cp1200 "family", whatever that means. That must be why our data looks correct going end-to-end. I also found http://effbot.org/zone/unicode-gremlins.htm, which gives a bit of Python code to translate some cp1252 bits to their Unicode equivalents. It also give you a nice list of the problem characters. There are also examples in the PHP documentation of the iconv() function, but there is also a comment that 0x96 breaks iconv. I need to chew on this some more. PHP doesn't really handle multi-byte characters until 6.x. > >Microsoft, in some of their Windows code pages, assigned meanings to >those values that differ from the Unicode and ISO-8859-1 standards >(quelle suprise), assigning many of them uses as printable characters. >I think it's the Windows 1250 code page, at ><http://www.microsoft.com/globaldev/reference/sbcs/1250.mspx>. >As that page and ><http://www.microsoft.com/typography/developers/fdsspec/punc2.htm> >note, the Unicode standard value for an en-dash is U+2013 (which >appears to be in hex). > >As to whether this affects the problem I don't know. Since x96 is a >valid character, whether Microsoft or real Unicode, I would not expect >it to be a problem per se. I just wanted to point out what it might >not mean. > >-- >Tim McDaniel, tmcd@panix.com > >-- >MySQL General Mailing List >For list archives: http://lists.mysql.com/mysql >To unsubscribe: http://lists.mysql.com/mysql?unsub=jschwartz@the- >infoshop.com |