I'm trying to store the symbol (R) (the registered trademark symbol) in my database, but I get a weird Ctrl-A (^A) character whenever I try. At first, I thought it was because I was calling htmlentities without passing in "UTF-8" as the last argument, but fixing that only solved one of my problems. Then I spent some time looking at encodings, and I'm trying to figure out if the fact that the charset is set to latin1 is the reason why.

Assuming it is, is there anything I can do to avoid having to dump the database and recreate it with the other encoding? I've spent some time tonight looking on the web and at MySQL's documentation on charsets, and these are the options I've come up with:

1. alter table TABLE_NAME convert to character set utf8;
   I'm assuming this is just a regular alter table, which means I'm going to have to take down my server for the duration of the change, which can take a long time.

2. http://www.oreillynet.com/onlamp/blo..._latin1_t.html
   This is essentially a dump and recreate of the db.

3. iconv
   This was mentioned somewhere, but no one had a concrete implementation.

Also, are there any gotchas in doing this? I assume I should check if my mysql has support for UTF-8, that I need to issue SET NAMES 'utf8'; or put it into my.cnf, and that my PHP code needs to output the headers with UTF-8 as well.

Thanks,
Waynn
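Since the charset argument to htmlentities came up, here is a minimal Python sketch of why it matters. It is a rough analogue built on the standard library's html.entities table, not PHP's implementation, and htmlentities_like is a hypothetical name. The same two UTF-8 bytes for (R) become one entity when decoded as UTF-8, but two entities, &Acirc;&reg;, when decoded as ISO-8859-1, which was PHP's default charset for htmlentities before 5.4. It does not reproduce the ^A itself; it only shows how a wrong charset argument mangles the output.

```python
from html.entities import codepoint2name


def htmlentities_like(raw: bytes, charset: str) -> str:
    """Rough Python analogue of PHP's htmlentities() (hypothetical helper).

    Decodes raw with the given charset, then replaces every character that
    has a named HTML 4 entity (including &, <, > and ") with &name;.
    """
    text = raw.decode(charset)
    return "".join(
        f"&{codepoint2name[ord(ch)]};" if ord(ch) in codepoint2name else ch
        for ch in text
    )


# The form submits the (R) sign as UTF-8: the two bytes 0xC2 0xAE.
posted = "\u00ae".encode("utf-8")

print(htmlentities_like(posted, "utf-8"))       # &reg;
print(htmlentities_like(posted, "iso-8859-1"))  # &Acirc;&reg;
```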
Waynn Lue wrote:
> I'm trying to figure out if the fact that the charset
> is set to latin1 is the reason why.

It shouldn't be. The registered trademark symbol is code point 0xAE in ISO 8859-1, according to Wikipedia:

http://en.wikipedia.org/wiki/ISO_8859-1

So it would seem that your data source isn't giving you 8859-1. GIGO (garbage in, garbage out).

> iconv
> This was mentioned somewhere, but no one had a concrete implementation.

There's a command-line tool by that name that converts text between character sets, but I don't see how that applies here. You could use it to convert a dump file, but you're already on record as not wanting to do that, so...

> I assume I should check if my mysql has support for UTF-8,

I believe it just has to be 4.1 or newer. And, that's only necessary so you can get UTF-8 aware sorting and such. You don't need any special support to just _store_ UTF-8 data.
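To make the byte values concrete, here is a minimal Python sketch (Python is chosen only for illustration; nothing in the thread uses it). It shows that U+00AE is the single byte 0xAE in ISO 8859-1 but the two bytes 0xC2 0xAE in UTF-8, and what an iconv-style conversion amounts to: decode from the source charset, then re-encode in the target charset, roughly what iconv -f ISO-8859-1 -t UTF-8 does to a dump file. The convert helper is hypothetical. One assumption worth flagging: MySQL's latin1 is really closer to Windows cp1252 in the 0x80-0x9F range, so "cp1252" may be the safer source codec for real dump data; for 0xAE the two agree.

```python
REG = "\u00ae"  # the registered trademark sign, U+00AE

# One byte in ISO 8859-1 (latin1), two bytes in UTF-8.
latin1_bytes = REG.encode("latin-1")  # b'\xae'
utf8_bytes = REG.encode("utf-8")      # b'\xc2\xae'


def convert(data: bytes, src: str, dst: str) -> bytes:
    """iconv-style conversion (hypothetical helper): decode from src,
    then re-encode as dst. May raise UnicodeError on invalid input."""
    return data.decode(src).encode(dst)


# Converting latin1 bytes (e.g. from a dump file) to UTF-8.
assert convert(latin1_bytes, "latin-1", "utf-8") == utf8_bytes

# Reading UTF-8 bytes as if they were latin1 gives the classic mojibake.
print(utf8_bytes.decode("latin-1"))  # Â®
```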
Are you using mysqli, PDO, or mysql? There are differences in the way the character set is determined, as I discovered after much the same experience as yours.

Regards,

Jerry Schwartz
The Infoshop by Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341

www.the-infoshop.com
www.giiexpress.com
www.etudes-marche.com

>-----Original Message-----
>From: Waynn Lue [mailto:waynnlue@gmail.com]
>Sent: Saturday, May 10, 2008 7:37 AM
>To: mysql@lists.mysql.com
>Subject: latin1 vs UTF-8
>
>--
>MySQL General Mailing List
>For list archives: http://lists.mysql.com/mysql
>To unsubscribe: http://lists.mysql.com/mysql?unsub=jschwartz@the-infoshop.com
> > I assume I should check if my mysql has support for UTF-8,
>
> I believe it just has to be 4.1 or newer. And, that's only necessary so
> you can get UTF-8 aware sorting and such. You don't need any special
> support to just _store_ UTF-8 data.

Ah, that's actually the critical part. I'm generating the data myself through PHP, but I'm getting a weird ^A character when I try to print it out in a textarea field. I'm trying to figure out if there's some weird interaction with htmlentities that's causing it to be displayed strangely. Can I trust that mysql is displaying the text correctly on the command line tool if I have 4.1, even if the charset is set to latin1? Are there any caveats to using htmlentities that I'm missing?

Essentially, I'm creating a form with a text area that lets people enter values, and they can later reload the form with the text area pre-filled with whatever they stored for that id.

Waynn
Waynn Lue wrote:
> I'm getting a weird ^A character when I
> try to print it out in a textarea field.

In that case, what character set does the browser think it should be using for the page? If you don't explicitly declare it, the browser has to guess, and you know what happens when you rely on a stupid computer to do thinking a human should have done instead.

You can declare it for all pages on a site in your web server configuration (it gets sent with the HTTP headers), in the equivalent <meta> tag, or in the <?xml ?> declaration if you're using XHTML.

> I'm trying to figure out if
> there's some weird interaction with htmlentities that's causing it
> to be displayed strangely.

To debug problems like this, I recommend studying hex dumps of the relevant data at every stage along the path:

1. echo 'query' | mysql --user=fred --password=barney mydb | hexdump
2. write a hex dump of the query results to the PHP debug log
3. take a packet capture of the HTTP reply containing the finished page
4. in the browser, save the web page to disk and run it through a hexdump tool

You'll find that a) the data isn't stored correctly in the database; b) it's being translated to another character set along the way (it happens!); or c) the browser is misinterpreting it because you didn't tell it what it needs to know to display it correctly.

> Can I trust that mysql is displaying the
> text correctly on the command line tool if I have 4.1, even if the
> charset is set to latin1?

Unless you're on a very old or strangely configured system, your terminal is probably configured for UTF-8. Since your DB is in Latin-1, there's a character set translation in there, and I can't confidently predict what will happen.

In this modern world, it's best to use some form of Unicode everywhere possible. Then the worst you have to deal with is conversion among encodings, which is annoying but lossless.
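Once the bytes from each stage of that path are in hand, reading the hex dumps can be partly automated. The Python sketch below is a hypothetical helper (hexdump and diagnose are illustrative names, and the stage label is an assumption): it prints the bytes in hex and looks for the three forms (R) typically takes, the single latin1 byte ae, the UTF-8 pair c2 ae, and the double-encoded c3 82 c2 ae you get when UTF-8 bytes are read as latin1 and converted to UTF-8 again. It is only a heuristic: a lone ae can also be the second byte of some other UTF-8 character.

```python
REG = "\u00ae"  # the registered trademark sign, U+00AE
UTF8 = REG.encode("utf-8")

# Byte patterns (R) commonly shows up as. "double-encoded" is what you get
# when UTF-8 bytes are read as latin1 and then encoded to UTF-8 again.
PATTERNS = {
    "latin1": REG.encode("latin-1"),                                   # ae
    "utf-8": UTF8,                                                     # c2 ae
    "double-encoded utf-8": UTF8.decode("latin-1").encode("utf-8"),    # c3 82 c2 ae
}


def hexdump(data: bytes) -> str:
    """Lowercase hex, one space-separated pair per byte."""
    return " ".join(f"{b:02x}" for b in data)


def diagnose(stage: str, data: bytes) -> str:
    """Say which (R) pattern, if any, appears in the bytes from one stage."""
    # Longest pattern first: c3 82 c2 ae contains c2 ae, which contains ae.
    for label, pattern in sorted(PATTERNS.items(), key=lambda kv: -len(kv[1])):
        if pattern in data:
            return f"{stage}: {hexdump(data)} -> (R) as {label}"
    return f"{stage}: {hexdump(data)} -> no (R) found"


# Hypothetical bytes captured from the mysql command line (stage 1).
print(diagnose("mysql output", b"Acme\xc2\xae"))
# mysql output: 41 63 6d 65 c2 ae -> (R) as utf-8
```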
I used not only the PHP functions but also the native MySQL HEX() function to see exactly what was going into my database. You can also set always_populate_raw_post_data = On in php.ini and examine $HTTP_RAW_POST_DATA to see exactly what the web server is seeing.

Regards,

Jerry Schwartz

>-----Original Message-----
>From: Warren Young [mailto:warren@etr-usa.com]
>Sent: Tuesday, May 13, 2008 8:53 AM
>To: MySQL List
>Subject: Re: latin1 vs UTF-8
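Examining the raw POST data helps because a browser normally submits form fields in the character set it believes the page uses. The Python sketch below (raw_form_fields is a hypothetical name, and the field name note is made up) splits an application/x-www-form-urlencoded body and percent-decodes each value to bytes without interpreting them, so a page treated as UTF-8 shows (R) as c2 ae and one treated as ISO-8859-1 shows it as ae. In current PHP the raw body is read from php://input instead, since $HTTP_RAW_POST_DATA and always_populate_raw_post_data were removed in PHP 7.0.

```python
from urllib.parse import unquote_to_bytes


def raw_form_fields(body: str) -> dict[str, bytes]:
    """Split an x-www-form-urlencoded body into name -> raw value bytes.

    Values are percent-decoded to bytes but NOT decoded to text, so the
    charset the browser used stays visible. Names are assumed to be ASCII.
    """
    fields: dict[str, bytes] = {}
    for pair in body.split("&"):
        if not pair:
            continue  # tolerate empty segments such as "a=1&&b=2"
        name, _, value = pair.partition("=")
        # '+' encodes a space in form data; unquote_to_bytes leaves it alone.
        fields[name] = unquote_to_bytes(value.replace("+", " "))
    return fields


# Same field submitted from a page the browser treated as UTF-8 ...
print(raw_form_fields("note=Acme%C2%AE+Widgets")["note"])  # b'Acme\xc2\xae Widgets'
# ... and from a page it treated as ISO-8859-1.
print(raw_form_fields("note=Acme%AE+Widgets")["note"])     # b'Acme\xae Widgets'
```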