Multilingual font translation issues
July 6, 2010 6:31 PM

Multilingual (non-)Unicode translation issues inside, from someone who doesn't know nearly as much about Unicode as she should.

I'm producing a set of brochures in a number of languages including Punjabi and Tamil, and was planning on using the Mac fonts Gurmukhi and InaiMathi to typeset them in InDesign. The content I received from the client was all typed in non-Unicode .ttfs (ChattrikKalmi and Bamini), which I also have loaded on my system. They view fine in Word, but won't copy-paste to InDesign. Is there any way I can get the Basic Latin content I have to properly translate into the right Unicode characters? If I can't do it on my end, how do I explain what I need the client to do in the clearest and simplest possible terms?
posted by avocet to Computers & Internet (3 answers total)
 
In Armenian we encounter this a lot. We have translit.am to do the conversion.
posted by k8t at 8:02 PM on July 6, 2010


Best answer: http://www.islamkalvi.com/web/bamini2unicode.htm
http://www.suratha.com/reader.htm
Looking at my locale stuff on Linux, it looks like the likely encodings, if they're not Unicode, would be GB18030 and TSCII, the only ones that have matches for 'tamil' and/or 'gurmukhi'. Under Linux with plain text files I would try using "iconv -f TSCII -t UTF-8 -o out.txt in.txt" and cross my fingers.
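Before reaching for iconv, it may be worth checking what the text actually contains. The question says the content is Basic Latin shown through a special font, in which case a byte-encoding converter would probably leave it unchanged. A minimal Python sketch of that check, assuming you can get the text out as a plain string (the function name script_profile is made up for illustration):

```python
from collections import Counter

# Unicode block ranges (inclusive) for the scripts discussed in this thread.
BLOCKS = {
    "basic_latin": (0x0000, 0x007F),
    "gurmukhi": (0x0A00, 0x0A7F),
    "tamil": (0x0B80, 0x0BFF),
}

def script_profile(text):
    """Count non-whitespace characters per Unicode block (hypothetical helper)."""
    counts = Counter()
    for ch in text:
        if ch.isspace():
            continue
        cp = ord(ch)
        for name, (lo, hi) in BLOCKS.items():
            if lo <= cp <= hi:
                counts[name] += 1
                break
        else:
            # Anything outside the three blocks above.
            counts["other"] += 1
    return counts

if __name__ == "__main__":
    print(script_profile("nklh"))                      # Latin letters only
    print(script_profile("\u0bae\u0bc6\u0b9f\u0bbe"))  # real Tamil code points
```

If nearly everything lands in basic_latin, the "Tamil" or "Punjabi" is most likely a font trick, and a font-specific converter like the ones linked above is the thing to try.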

I did get good results just googling "tamil to unicode converter" type queries for various permutations of (tamil, tscii, gb18030, bamini, etc.), but since I can't read them, I don't know if they'll help any. Good luck. I've spent days mucking about with encoding issues before; it can be a pain in the butt.
posted by zengargoyle at 8:14 PM on July 6, 2010


Best answer: Here's the long-form answer:

If you're dealing with Indic characters, just let it be known that Unicode is an encoding for _letters_ (अ, आ, क etc. in Devanagari), but not _glyphs_ (का, की etc.). Indic languages effectively deal with glyphs, "conjunct consonants" ("aadha akshar" in Hindi) that are formed by combining consonants and vowels and so on.
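To make the letters-versus-glyphs point concrete, here is a small Python sketch that lists the Unicode code points behind a single rendered glyph. It uses only the standard unicodedata module; the helper name code_points is mine, not anything official.

```python
import unicodedata

def code_points(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

if __name__ == "__main__":
    # The glyph "kaa" is stored as two code points: a consonant and a vowel sign.
    for cp, name in code_points("\u0915\u093e"):
        print(cp, name)
```

The one glyph on screen is two encoded letters underneath; putting them together visually is left to the font and the rendering engine.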

So metafilter wouldn't necessarily be a letter-for-letter transliteration of m.e.t.a.f.i.l.t.e.r in Gurmukhi, like it would be for European scripts like Cyrillic or Greek; it would be me. Ta. fil. Tar., which would roughly be ਮੇਟਾਭਿਲ੍ਟਰ if I've got my Gurmukhi right [1]. Or மெடாபில்டர் in Tamil.

What this means is this: while Unicode has a standard for all the 48+16 letters in Brahmi-based scripts (all of which have roughly the same set, save a letter here or there), it doesn't have a standard for all the (48+16) × (48+16) combinations[2]. Having a standard for all glyphs is crucial in two respects: first, your fonts depend on it, and second, your OS needs it to render the glyphs correctly.

The closest we have to a standard on this is OpenType; it's supported fully by both Windows and Linux. Windows has a rendering engine called Uniscribe that does this right (as you can see in the page though, not all versions support all languages; support has been progressive and could involve Service Packs etc, rather than being out-of-the-box)

If you're on a Mac, you should know that Apple doesn't use OpenType at all; it uses AAT instead. While it's possible to have a font that's both OpenType- and AAT-compatible, some AAT fonts may not be compatible with OpenType.

Now for the bad news. First, a quick googling tells me that Bamini[3] doesn't use Unicode at all (here's the right InScript keyboard layout, considerably different from the earlier link); like many Indian fonts, the creator used her own non-standard scheme to come up with the keyboard layout. It's possible that the font reuses the code points reserved for ASCII characters to render Tamil characters. Zengargoyle's first link converts Bamini to Unicode, but the presupposition here is that you already have a Unicode-friendly OS / font installed; at best, it can help you convert Bamini text into something a Unicode font such as InaiMathi can display.
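For a feel of what such a converter does under the hood, here is a rough Python sketch. The four-entry table is a toy for illustration only and I have not verified it against the real Bamini layout; a real converter needs the complete mapping for the font in question. The sketch also shows one wrinkle: these fonts typically have you type a left-side vowel sign before its consonant (the order you see), whereas Unicode stores it after the consonant.

```python
# Toy mapping from Latin keystrokes to Tamil code points.
# ASSUMPTION: illustrative only, not a verified Bamini table.
TOY_MAP = {
    "k": "\u0bae",  # TAMIL LETTER MA
    "l": "\u0b9f",  # TAMIL LETTER TTA
    "h": "\u0bbe",  # TAMIL VOWEL SIGN AA (drawn to the right)
    "n": "\u0bc6",  # TAMIL VOWEL SIGN E (drawn to the left)
}

# Vowel signs drawn left of the consonant but stored after it in Unicode.
PREBASE_SIGNS = {"\u0bc6"}

def convert(text, table=TOY_MAP):
    """Map each character, moving a left-side vowel sign after the next character."""
    out = []
    pending = None  # a left-side vowel sign waiting for its consonant
    for ch in text:
        mapped = table.get(ch, ch)  # unmapped characters pass through unchanged
        if mapped in PREBASE_SIGNS:
            pending = mapped
            continue
        out.append(mapped)
        if pending is not None:
            out.append(pending)
            pending = None
    if pending is not None:
        out.append(pending)  # dangling sign at end of input
    return "".join(out)

if __name__ == "__main__":
    print(convert("nklh"))
```

With this toy table, the Latin string "nklh" comes out as the four code points MA, VOWEL SIGN E, TTA, VOWEL SIGN AA, i.e. the first two syllables of the Tamil "metafilter" above. The online converters do the same kind of thing with a full table and more reordering rules.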

The problem is that this won't be enough for you. It seems that InDesign (whether on Mac or Windows) doesn't officially support Indic character rendering; it's quite possible that you'll have problems despite the aforementioned conversion. In that case, the workaround would be to take a screen capture in Word or PDF and then use that as an image in InDesign. This would suck _a lot_ for typesetting / formatting etc., but that's the best you can do here.

How you'd tell the client: at its simplest, the problem you'll have is that InDesign possibly doesn't support Punjabi or Tamil. That's the core of what you need to communicate to the client :)

In order to buy some credibility though, you may want to talk about the rest of the stuff I mentioned here, skimmed appropriately.

(I'm not up-to-scratch on Mac compatibility, but feel free to sound off any questions on linguistic matters)


[1] - Gurmukhi is Punjabi's script; Devanagari is Hindi's script
[2] - Actually, the total number of combinations could be much more than that; conjunct consonants can be formed by combining more than two consonants as well.
[3] - A beautiful name for a font; "Bhamini" literally is a damsel, usually used figuratively to refer to the god Krishna's consort, Satyabhama, beautifully captured in a Kuchipudi opera, Bhamaa kaLaapam.

posted by the cydonian at 12:21 AM on July 7, 2010 [1 favorite]

