Japanese PHP/MySQL development tips?
August 28, 2006 6:27 AM
I'm about to create a PHP/MySQL CMS and web site for a client, entirely in Japanese. What do I need to know?
The site is essentially a Japanese version of an existing ecommerce site with the shopping removed (it's a product catalog without any ecom). Due to circumstances beyond my control, I can't reuse the same code the ecom site uses.
I do not speak Japanese, but the client will be providing a doc with English and Japanese for everything on the site, and will be entering the product info themselves using the CMS.
I usually roll my own CMS for sites of this size, but I'm considering some templating systems or maybe one of those systems that enforces MVC that the kids are so crazy about nowadays (Symfony looks interesting). I don't know that that makes a difference with my question, but maybe there's something I'm not considering.
I've been doing some reading, and I'm pretty overwhelmed by all the character set discussion. I hadn't really expected there to be more than half a dozen options. Performance is not really a concern since this site is going to be small and low traffic; the primary concerns are ease of development and that it works consistently.
So, long preamble done, my questions:
1. From what I've found so far, it sounds like UTF-8 is the character set I should go for. Is this correct? Should I look into other encodings?
2. MySQL. According to the docs, if I'm using MySQL 4.1 or greater, I can simply set a field to UTF-8 encoding like so: ALTER TABLE myTable MODIFY myColumn VARCHAR(255) CHARACTER SET utf8;. Anything else I need to do on the MySQL end?
3. PHP & HTML. I'm less clear how to get the data from a form field into UTF-8 and send it off to MySQL. On a whim, I did a test, and noticed by default IE and Firefox already do a different encoding (the data from FF ending up in the database looking like this -- & #12506; & #12540; (w/o spaces), and IE's looked like this -- ラ). Presumably I need to set the headers? Is there something I need to put in the FORM tag (does it need to be multipart?). When dealing with the submitted data can I safely just grab the $_REQUEST value and send it off to the database, or is there some transformation I need to do? Similarly, is there anything I need to do with data I have retrieved from the database before displaying it?
Thanks in advance for any advice.
posted by malphigian to computers & internet (7 comments total)
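2) While it's best to set the fields to UTF-8, it's not strictly required, because UTF-8 bytes can be stored in any single-byte, ASCII-compatible character set (e.g. latin1, the MySQL default) without data loss, provided nothing converts character sets between PHP and the table. As long as you treat it as UTF-8 on the presentation end, how you store it just needs to be ASCII-compatible. The trade-off is that MySQL will then sort, compare, and measure column lengths in bytes rather than in characters.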
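3) Setting your HTML charset to UTF-8 in a HEAD META tag will, in practice, get form data sent from browser to server as UTF-8, since browsers submit forms in the page's encoding. If your server also sends a charset in the Content-Type HTTP header, make sure it says UTF-8 too, because the header takes precedence over the META tag. You don't need anything special in form elements. After it gets to the server, PHP treats every string as a plain sequence of bytes, which, as I said above, is a safe way to handle (though not display or transform) UTF-8.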
4) If you're doing any manipulation of Japanese text in UTF-8, you may find an article I wrote on converting UTF-8 to arrays of Unicode code points and back useful.
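To make the byte-versus-character point concrete, here is a minimal sketch of the conversion described in 4). It is not the code from the article; the function names utf8_to_codepoints and codepoints_to_utf8 are made up for illustration, and it assumes the input is already well-formed UTF-8 (malformed or truncated sequences are not detected). The sample string is the pair of code points from the Firefox test in the question, 12506 and 12540.

```php
<?php
// Hypothetical helpers for illustration only. Assumes well-formed UTF-8 input.

// Decode a UTF-8 byte string into an array of Unicode code points.
function utf8_to_codepoints($str) {
    $points = array();
    $len = strlen($str);                      // strlen() counts bytes, not characters
    for ($i = 0; $i < $len; ) {
        $byte = ord($str[$i]);
        if ($byte < 0x80) {                   // 0xxxxxxx: 1-byte sequence (ASCII)
            $cp = $byte;        $extra = 0;
        } elseif (($byte & 0xE0) === 0xC0) {  // 110xxxxx: 2-byte sequence
            $cp = $byte & 0x1F; $extra = 1;
        } elseif (($byte & 0xF0) === 0xE0) {  // 1110xxxx: 3-byte sequence
            $cp = $byte & 0x0F; $extra = 2;
        } else {                              // 11110xxx assumed: 4-byte sequence
            $cp = $byte & 0x07; $extra = 3;
        }
        // Each continuation byte (10xxxxxx) contributes its low 6 bits.
        for ($j = 1; $j <= $extra; $j++) {
            $cp = ($cp << 6) | (ord($str[$i + $j]) & 0x3F);
        }
        $points[] = $cp;
        $i += $extra + 1;
    }
    return $points;
}

// Encode an array of Unicode code points back into a UTF-8 byte string.
function codepoints_to_utf8($points) {
    $out = '';
    foreach ($points as $cp) {
        if ($cp < 0x80) {
            $out .= chr($cp);
        } elseif ($cp < 0x800) {
            $out .= chr(0xC0 | ($cp >> 6))
                  . chr(0x80 | ($cp & 0x3F));
        } elseif ($cp < 0x10000) {
            $out .= chr(0xE0 | ($cp >> 12))
                  . chr(0x80 | (($cp >> 6) & 0x3F))
                  . chr(0x80 | ($cp & 0x3F));
        } else {
            $out .= chr(0xF0 | ($cp >> 18))
                  . chr(0x80 | (($cp >> 12) & 0x3F))
                  . chr(0x80 | (($cp >> 6) & 0x3F))
                  . chr(0x80 | ($cp & 0x3F));
        }
    }
    return $out;
}

// UTF-8 bytes for the two katakana characters U+30DA and U+30FC.
$text   = "\xE3\x83\x9A\xE3\x83\xBC";
$points = utf8_to_codepoints($text);          // array(12506, 12540)
$back   = codepoints_to_utf8($points);        // same 6 bytes as $text
```

Note that strlen($text) is 6 here even though the string holds two characters, which is why byte-oriented functions are fine for passing UTF-8 through untouched but not for cutting or measuring it.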
posted by scottreynen at 6:46 AM on August 28, 2006