I have Unicode text which I need to import into a Movable Type blog, which requires ASCII imports. What can I do?
ASCII/Unicode conversions/importing into MT problems...[mi]

So I'm trying to import a blog with a lot of non-ASCII characters (Chinese saved in Unicode in my case) into Movable Type 3. I ran into some problems, checked the MT support fora, and found that you have to import in ASCII. The problem for me is that simply saving as ASCII would turn all that Chinese into garbled nonsense.

When I asked, it was suggested that the main problem with importing non-ASCII text is the type of line breaks. MT needs Unix-style line breaks (LF), while my Unicode plain-text file was saved with Windows-style line breaks (CR/LF).

Would there be some way to take my Unicode-formatted export file and convert it to ASCII while preserving the Chinese? For instance, by converting the non-ASCII characters to their hexadecimal codes and the line breaks from CR/LF to LF. Would this do the trick? I saw some Java converters online, but those would hardly work with my 2MB export file.
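
For what it's worth, the conversion described above could be sketched in a few lines of Python. This is only a rough illustration, not something confirmed to satisfy MT's importer. It assumes the export was saved as UTF-16 (what Windows editors usually mean by "Unicode"), and the function names and file paths are made up for the example. Note that it writes decimal numeric references such as &#20013; rather than hexadecimal ones.

```python
def to_ascii_with_entities(text: str) -> bytes:
    """Return pure-ASCII bytes: LF line breaks, non-ASCII as &#NNNN; references."""
    # Normalize Windows (CR/LF) and old Mac (CR) line breaks to Unix (LF).
    unix_text = text.replace("\r\n", "\n").replace("\r", "\n")
    # "xmlcharrefreplace" swaps each non-ASCII character for a decimal
    # numeric character reference instead of raising an error.
    return unix_text.encode("ascii", errors="xmlcharrefreplace")


def convert_file(src_path: str, dst_path: str, src_encoding: str = "utf-16") -> None:
    """Convert a whole export file. The default encoding is an assumption."""
    # newline="" keeps the original line breaks so we normalize them ourselves.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    # Binary mode so Python does not translate LF back to CR/LF on Windows.
    with open(dst_path, "wb") as dst:
        dst.write(to_ascii_with_entities(text))
```

Calling `convert_file("export.txt", "export-ascii.txt")` (hypothetical file names) reads the whole file into memory, which should be no trouble for a 2MB export.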

Does anyone else have any experience importing with non-ASCII characters into MT?
...and on a similar tack, does anyone know of a Windows utility that will let me paste in foreign-language text from the clipboard (selected from a PDF) and convert it into the necessary escape sequences?
posted by Pericles at 4:08 AM on June 30, 2004

I took a look at the file. A few comments:

1. I think it's just a matter of the LF/CR conflict--you don't need to be in ASCII per se.

2. BBEdit can convert between the different line-ending formats.

3. Almost all of the Chinese text in there is actually encoded as escaped numeric Unicode entities, so (apart from one or two spots I noticed) you actually *could* upload this as ASCII. The problem with these numeric entities is that, although they display fine on-screen, they're useless for editing. You probably failed to set the "NoHTMLEntities" flag to 1 in your mt.cfg file in your old blog.
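
If the numeric entities ever need to be turned back into editable characters, something along these lines might work. It is a minimal sketch in Python, assuming the entities are standard decimal (&#20013;) or hexadecimal (&#x4E2D;) references; the function name is invented for the example. It deliberately leaves named entities like &amp; and ASCII-range references alone so that any HTML markup in the entries is not disturbed.

```python
import re

# Matches decimal (&#20013;) and hexadecimal (&#x4E2D;) character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def decode_numeric_entities(text: str) -> str:
    """Replace non-ASCII numeric character references with the real characters."""

    def replace(match: "re.Match[str]") -> str:
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Keep ASCII references (e.g. &#38; for "&"), surrogates, and
        # out-of-range values exactly as they were written.
        if code_point < 128 or code_point > 0x10FFFF:
            return match.group(0)
        if 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        return chr(code_point)

    return _NUMERIC_REF.sub(replace, text)
```

The result is a Unicode string, so it would need to be saved in an encoding such as UTF-8 rather than ASCII.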
posted by adamrice at 8:02 AM on June 30, 2004

