UTF-8 kanji in Wordpress 2.1
April 14, 2007 5:08 PM

How do you make Wordpress 2.1 handle UTF-8 encoded kanji and hiragana correctly?

A friend of mine recently upgraded his blog to Wordpress 2.1, and now anything he posts in UTF-8 encoded Japanese comes out garbled. An older post, created with an earlier version of Wordpress, used to display kanji properly but doesn't any more, and every attempt to place kanji in new blog posts or new comments results instead in three garbled characters per kanji. Does anyone know what can be done to fix this? You can see an example here.

(Using ampersand-pound-number-semicolon encoding does work, but that's painful to use and not really an acceptable solution. We'd really much rather know how to make UTF-8 encoding work properly.)
posted by Steven C. Den Beste to computers & internet (10 comments total)
I've never had a problem using kanji in any version of WordPress (1.5-2.2), and since it worked before your friend upgraded, something unusual may be going on.

But here's the standard answer:

In your admin section go to Options > Reading. At the very bottom is an option to select your character encoding; enter utf-8 and you should be good. If it's already utf-8 then I don't know.
posted by any portmanteau in a storm at 5:26 PM on April 14, 2007


He says, "it is already utf-8, that was the first thing i tried :("
posted by Steven C. Den Beste at 5:39 PM on April 14, 2007


I'm guessing that the database encoding on the columns is latin1 and that MySQL plus WordPress is double-encoding the characters.

Here's a guy who had a similar experience when movie hosts. If that is the problem, then good luck. Hopefully your friend doesn't have too many posts.
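
To illustrate the double-encoding guess above, here is a minimal Python sketch. It is not WordPress or MySQL code, just a simulation of what happens when UTF-8 bytes are treated as a single-byte charset and then encoded again. Python's "latin-1" codec stands in for the database charset as an assumption; MySQL's latin1 is closer to Windows-1252, so real data may differ for some bytes. The variable names are made up for the example.

```python
# Simulation only: UTF-8 bytes misread as a single-byte charset, then re-encoded.
text = "\u6f22"  # a single kanji (U+6F22)

utf8_bytes = text.encode("utf-8")         # three bytes: e6 bc a2
misread = utf8_bytes.decode("latin-1")    # each byte becomes its own character
double_encoded = misread.encode("utf-8")  # what gets stored if encoded a second time

# Undoing the damage, assuming nothing else touched the bytes in between.
repaired = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")

print(len(misread))       # three garbled characters for one kanji
print(repaired == text)   # the round trip recovers the original
```

If this is what happened, it would match the "three garble characters per kanji" symptom, since each kanji in this range takes three bytes in UTF-8.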
posted by sbutler at 5:47 PM on April 14, 2007


(good Lord... what happened to my English there? Anyway, you know what I meant)
posted by sbutler at 5:47 PM on April 14, 2007


I never had trouble using Japanese characters in WordPress. One thing you should do, though, is in your main theme file (the one with <html></html> in it) you should have:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

This declares, inside the page itself, the same thing a Content-Type HTTP header would: that the content on the page is encoded in UTF-8. (I think there's a place in WP config to set this, but I don't remember ... I've since switched to Typo).
posted by DrSkrud at 5:52 PM on April 14, 2007


He already has a correct "Content-type" declaration in his header.
posted by Steven C. Den Beste at 6:01 PM on April 14, 2007


OK, here's another suggestion that probably won't help:
Disable all plugins and see if the problem goes away

And if he backed up his database prior to upgrading then here are three more:
Do a fresh install of 2.1.3 and restore the database
Upgrade WordPress to 2.2
If the previous two fail then install the previous version of wordpress and restore the database
posted by any portmanteau in a storm at 6:14 PM on April 14, 2007


If you look at the source, you'll see those characters are being published as HTML entities, not UTF-8, so the character encoding settings are irrelevant. Whatever is converting the raw text to HTML entities is the problem, as it's wrongly interpreting multi-byte UTF-8 as individual single-byte characters.
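
A rough Python sketch of the difference described above, assuming the converter emits decimal numeric entities. The two function names are hypothetical and are not taken from WordPress; the first mimics the suspected bug (one entity per UTF-8 byte), the second shows the intended result (one entity per code point).

```python
def entities_per_byte(text: str) -> str:
    """Suspected buggy behaviour: one numeric entity per UTF-8 byte."""
    return "".join(f"&#{b};" for b in text.encode("utf-8"))


def entities_per_code_point(text: str) -> str:
    """Intended behaviour: one numeric entity per Unicode code point."""
    return "".join(f"&#{ord(ch)};" for ch in text)


kanji = "\u6f22"  # a single kanji (U+6F22)

print(entities_per_byte(kanji))        # &#230;&#188;&#162;
print(entities_per_code_point(kanji))  # &#28450;
```

The first output renders as three unrelated Latin characters, while the second renders as the kanji regardless of the page's declared charset.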
posted by scottreynen at 6:59 PM on April 14, 2007


You're right, Steven ... I didn't even look ...*shame*
posted by DrSkrud at 7:37 PM on April 14, 2007


S'okay. I don't flog people for being wrong. (And I appreciate the willingness to help.)
posted by Steven C. Den Beste at 7:51 PM on April 14, 2007

