How do you deal with Chinese characters that can't be represented in 16 bits?
April 4, 2008 8:48 AM
How are people dealing with Unicode code points that don't fit in 16 bits? Specifically, in languages like Java, C# and C++, which assume 16-bit characters (I believe), how are you supporting GB 18030? I would suspect that the various languages' methods like substring(), charAt(), operator[], etc. can't be safely used in China. If your wstring, say, contains a Chinese string, then .size() doesn't tell you how many characters are in it, right?
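To make the worry concrete, here is a minimal Java sketch of the mismatch between 16-bit chars and actual characters. The class name SurrogateDemo and its two helper methods are made up for illustration; the String and Character calls are standard since Java 5. It assumes the string holds U+4E2D followed by U+20000 (a CJK Extension B ideograph), written as escapes so the source file's encoding doesn't matter.

```java
// Hypothetical demo class; the name and helpers are illustrative only.
final class SurrogateDemo {
    // Number of Unicode code points (not UTF-16 code units) in s.
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    // Substring addressed by code point indices rather than char indices,
    // so a surrogate pair is never cut in half.
    static String substringByCodePoints(String s, int begin, int end) {
        int from = s.offsetByCodePoints(0, begin);
        int to = s.offsetByCodePoints(from, end - begin);
        return s.substring(from, to);
    }

    public static void main(String[] args) {
        // U+4E2D (inside the 16-bit range) followed by U+20000 (outside it,
        // stored as the surrogate pair D840 DC00).
        String s = "\u4E2D\uD840\uDC00";

        System.out.println(s.length());                              // 3 UTF-16 code units
        System.out.println(codePointLength(s));                      // 2 code points
        System.out.println(Character.isHighSurrogate(s.charAt(1)));  // true: charAt sees half a character
        System.out.println(Integer.toHexString(s.codePointAt(1)));   // 20000
        System.out.println(substringByCodePoints(s, 1, 2).length()); // 2: one character, two chars
    }
}
```

So length() and charAt() count 16-bit units, not characters, and a plain substring(1, 2) on this string would return a lone high surrogate. Whether C#'s string and a 16-bit wchar_t wstring behave the same way is my assumption, not something this sketch shows.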
On a related note, what interesting Chinese characters require more than 16 bits? I'm thinking about making a short presentation for my co-workers on this subject and I'd like to have some interesting examples.
(Oh, and I'm going to run any examples by my Chinese colleagues first, so don't bother trying to make me say "penis" or something in front of my co-workers :-))
posted by bonecrusher to computers & internet (6 comments total)
posted by tachikaze at 9:05 AM on April 4, 2008