I have an install of SMF that has its database set to UTF-8, and works for *most* posts. I recently migrated the tables out of one database, and into another, to separate it out from a Joomla install (They had both been in the same DB due to the limitations of the bridging component, but a newer bridge (JFusion) allows me to separate them, so I did for safety and security).
It appears that during the copy from one DB to the other, some of the characters in the posts became corrupt, or simply aren't showing up properly. After working with some developers, we've come to the conclusion that a handful of characters in the body of messages are actually NOT in UTF-8 format, but most likely in ISO-8859-1. You can see a sample post here:
http://www.knightrealms.com/forums/index.php?topic=50.0
Originally, the body of the post wasn't showing at all, but we were able to get it to show with an error (the [?] ) instead. I can confirm other posts, and other fields (such as signatures) are having the same issue, and when I look in the DB at them, they have characters such as curved quotes, Spanish characters, etc.
Is there a way to "clean" my DB to convert the wayward characters into UTF-8? I did a search on
Metafilter and found this, but the links are all long gone. I did some
googling and found this, but my PHP and mySQL-fu is not good enough to actually implement it - I can set up SMF and Joomla, but I'm not a developer / coder yet.
I tired running SMF's "Convert to UTF-8" function, and while this fixes the existing issues, it then introduces additional issues in the posts that were once fine. The issue appears to be record-specific.
Any help that can point me to a step-by-step process to clean / convert the problem characters would be appreciated!
What I've done in the past, is to use the power of the free MySQl Administrator software while importing/exporting the database. It has an option to choose the alphabet things have to be exported in. And a bit of fiddling with that always gave me a new database without those coding errors.
posted by ijsbrand at 10:24 AM on October 9, 2009