UTF-8 headaches
June 13, 2006 9:53 AM

Why can't IE display the same characters as charmap, for the same char-set/font combination?

I'm doing some work on UTF-8 encoded data, and have noticed that in IE, many of the characters on this page show up as blank boxes. Yet if I bring up the same font (Times New Roman) in Windows' charmap utility, it correctly shows all the characters. Since we will be using an IE-based browser app to edit this data, I'm concerned.

Is the problem with IE, the webpage, webserver, or my pc? And is it fixable?
posted by nomisxid to Computers & Internet (6 answers total)
 
It would probably help if you installed the appropriate language extensions in IE.

Tools -> Internet Options -> General, then hit the "Languages" button.
posted by Steven C. Den Beste at 10:28 AM on June 13, 2006


The page you've linked to shows a flat listing of every HTML entity from 31 to 1000 for the Times font. However, if you look in the charmap utility, the Times New Roman font has character codes that go much higher than 1000; in fact they go all the way up to FEFA (hex). Even in charmap, though, Times does not have a character for every code from 31 to 1000.

In other words, the Times font (on Windows) has lots and lots of Unicode characters. I wouldn't worry about that; you just need to note the character code in charmap and use the right HTML entity code to render that character. For example: † (U+2020 in charmap, &#8224; as an HTML entity) is a dagger.
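The charmap-to-entity step can be sketched in a few lines of Python. This is only an illustration: `to_numeric_entity` is a made-up helper name, and it assumes the code is copied from charmap as hex, with or without the "U+" prefix.

```python
import html


def to_numeric_entity(hex_code: str) -> str:
    """Turn a charmap-style hex code ('U+2020' or '2020') into a decimal HTML entity."""
    cleaned = hex_code.strip().upper()
    if cleaned.startswith("U+"):
        cleaned = cleaned[2:]
    # charmap shows hex; numeric entities of the form &#NNNN; are decimal
    return f"&#{int(cleaned, 16)};"


entity = to_numeric_entity("U+2020")
print(entity)                 # the entity text to paste into the HTML
print(html.unescape(entity))  # the character a browser should render for it
```

Whether the browser can actually draw the character still depends on the font it picks having a glyph for it.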
posted by StarForce5 at 10:29 AM on June 13, 2006


Response by poster: Steven, I'm pretty sure IE's languages setting is just that, languages, not charsets/fonts. It definitely doesn't make any difference.

StarForce, the issue is that people have to be able to see the correct character through their browsers in order to QA the data.

Looking closer at charmap, I see now that the issue is, as you noted, that the webpage is blindly listing all possible UTF-8 codes for the numerical range, whereas charmap simply jumps from U+017F directly to the next actually defined char, U+018F. The webpage chars that appear as blocks look bogus because they are bogus values.
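One way to find the codes a listing page should skip is to compute the gaps in the font's coverage. A rough Python sketch, assuming you already have the list of code points the font covers (for example, copied out of charmap). The sample list is hypothetical apart from the U+017F to U+018F jump mentioned above, and `coverage_gaps` is just an illustrative name.

```python
from typing import Iterable, List, Tuple


def coverage_gaps(covered: Iterable[int]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) ranges missing between covered code points."""
    points = sorted(set(covered))
    gaps = []
    for prev, nxt in zip(points, points[1:]):
        if nxt - prev > 1:
            gaps.append((prev + 1, nxt - 1))
    return gaps


# Hypothetical coverage list; only the 0x017F -> 0x018F jump comes from the thread.
sample_coverage = [0x017D, 0x017E, 0x017F, 0x018F]

for start, end in coverage_gaps(sample_coverage):
    print(f"no glyph for U+{start:04X}..U+{end:04X}")
```

Any code inside a reported range would show up as a box when that font is the only one available.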
posted by nomisxid at 10:48 AM on June 13, 2006


Here's an interesting resource on Unicode in HTML, if you haven't seen it already:

Alan Wood’s Unicode Resources
posted by VulcanMike at 12:03 PM on June 13, 2006


Installing a language pack also downloads special fonts. For instance, if you install the "Traditional Chinese" language pack, the download is about 15 megabytes, most of which is for special font files.
posted by Steven C. Den Beste at 1:55 PM on June 13, 2006


Despite your title, this has nothing to do with UTF-8. You're not encoding the characters as UTF-8; you're encoding them as entities. IE translates those entities into characters from a specific font. Not every font has every character. You probably need to specify more fonts or font families in your CSS, so IE knows what to use when a character is missing from Times New Roman.
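The difference between the two encodings is easy to see in a short Python sketch. It uses the dagger (U+2020) purely as a sample character and is only meant to illustrate the distinction, not to describe how the linked page was generated.

```python
import html

dagger = "\u2020"  # the dagger character, used here only as a sample

# Encoded as UTF-8, the character becomes three non-ASCII bytes: E2 80 A0.
utf8_bytes = dagger.encode("utf-8")

# Written as a numeric entity, it is seven plain ASCII bytes in the file.
entity_bytes = "&#8224;".encode("ascii")

print(utf8_bytes)
print(entity_bytes)

# Both forms resolve to the same character once the HTML is parsed.
assert html.unescape(entity_bytes.decode("ascii")) == utf8_bytes.decode("utf-8")
```

Either way the browser ends up with the same character, and whether it displays correctly comes down to the fonts available for it.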
posted by scottreynen at 2:33 PM on June 13, 2006

