HTML Language Support
June 29, 2004 1:21 AM

I need to make a web page that has 12 different languages on one page (including some right-to-left languages). It links to customer guides (PDFs) in the different languages, and rather than list them in English (Arabic, Bengali, Hindi, Punjabi...) I'd like each language name to be written in its own script. Can anyone point me to a list of language names as text, not images, and give me a heads-up on the HTML code required?
posted by Pericles to Technology (10 answers total)
I might be wrong but I think you'll need to use frames or iframes pulling in individual documents as each language will need a different charset meta tag in order to display properly.

BBC World Service might be useful. The only time they display the languages together they use images.
posted by echelon at 5:10 AM on June 29, 2004


Well, at the bottom of Debian's home page, they have a list of 20-something languages, some of them using different scripts, in text (including Arabic).

Also, Wikipedia has a long list of languages, with even more right-to-left ones, near the bottom of its homepage. I don't know the HTML required; it seems that both of them use numbered entities (probably Unicode, I don't know). But I wouldn't be surprised if the W3C has at least some info.
posted by skynxnex at 5:40 AM on June 29, 2004


I've discovered the names of the languages through Omniglot - for example, Greek is Ελληνικά - but some languages (Hindi, Bengali, Arabic, Chinese, Gujerati) are images rather than HTML escape sequences. Any one know? (Languagehat?)
posted by Pericles at 5:41 AM on June 29, 2004


Multiple languages can be used on the same page if you use a Unicode encoding like UTF-8, which supports all (or most) characters in most languages. Readers of your page would still need the proper fonts installed to see the characters properly. I am not so sure about switching reading order midstream (i.e. left-to-right, then right-to-left), but I've got to think that the Unicode Consortium considered this one of its use cases.
posted by mmascolino at 5:42 AM on June 29, 2004


Use Unicode. Save the page in a UTF encoding (usually UTF-8) and make sure the page is served with the right Content-Type charset parameter (or use a meta tag if you don't have access to the server).

All the time, I mean, regardless of whether you actually need multi-language text or not. Non-UTF encodings are so last century.

If you are unfortunate enough to use a text editor that is Unicode-ignorant, you can of course use character references (e.g. &#x1234;) to put the relevant numbered characters in place in an otherwise plain-ASCII document (see http://www.unicode.org/charts/ for the mind-numbingly full list of characters). This is a good solution if you just need a few words of text in each different script, but not so good for editing an entire page of Japanese.

Unicode will automatically deal with the situation where you have some left-to-right text embedded in a right-to-left sentence (and vice versa), but for Arabic and Hebrew you have to explicitly tell the browser which elements to start off in right-to-left text, right-aligned mode, for some reason. This is done using the dir="rtl" attribute.
posted by BobInce at 6:05 AM on June 29, 2004
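
As a rough illustration of the two points in the answer above, here is a minimal sketch of a link list in an otherwise plain-ASCII page. The PDF file names are hypothetical. The decimal character references spell out the Greek and Arabic names for those languages, and dir="rtl" is set only on the Arabic item.

```html
<!-- Hypothetical file names; only the character references and dir="rtl" matter here. -->
<ul>
  <!-- Greek, written as decimal character references -->
  <li><a href="guide-el.pdf">&#917;&#955;&#955;&#951;&#957;&#953;&#954;&#940;</a></li>
  <!-- Arabic, same technique, with this one item set right-to-left -->
  <li dir="rtl"><a href="guide-ar.pdf">&#1575;&#1604;&#1593;&#1585;&#1576;&#1610;&#1577;</a></li>
</ul>
```

Putting dir="rtl" on the single list item rather than on the whole page leaves the other entries left-aligned. Whether the Arabic entry should be right-aligned within a mostly left-to-right list is a layout choice.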


Thanks everyone! I eventually followed Skynxnex's advice and discovered that searching Wikipedia for a language name would show that language's name in its own script somewhere on the page; viewing source and copying the escape characters does the trick, except for those for which I don't have the font.

BobInce - what do you mean by making sure that "the page is served with the right Content-Type charset parameter"? Do you mean I should put span lang="xx" around the escaped characters?
posted by Pericles at 6:30 AM on June 29, 2004


Charset and lang are two different things. Charset is, well, the characters. French, German, and English are all written with ISO-8859-1, but they're different languages. In any case, you'll be using "charset=utf-8" because it is the Right Thing To Do.

Wrapping text in tags that carry a 'lang="xx"' attribute is a separate issue. It's semantically nice, and not a bad idea (and remember, you don't need to nest a span inside another element if the only thing in that element is the text in that language), but there's not a whole lot of technology these days that takes advantage of it.
posted by adamrice at 8:28 AM on June 29, 2004
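
To make the distinction concrete, here is a small sketch, assuming the page is saved and served as UTF-8 and using hypothetical file names. The lang attribute goes directly on the element that holds the text, so no extra span is needed. The values hi, bn and pa are the standard language codes for Hindi, Bengali and Punjabi.

```html
<!-- charset covers how the bytes are decoded; lang only labels the language of the text -->
<ul>
  <li><a href="guide-hi.pdf" lang="hi">हिन्दी</a></li>
  <li><a href="guide-bn.pdf" lang="bn">বাংলা</a></li>
  <li><a href="guide-pa.pdf" lang="pa">ਪੰਜਾਬੀ</a></li>
</ul>
```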


Oh yeah, sorry: The charset is declared in your page's HEAD element as

[meta http-equiv="Content-Type" content="text/html; charset=utf-8" /]

(changing square brackets to angle brackets, of course). Or it is sent by the server as something you declare in your .htaccess file.
posted by adamrice at 8:31 AM on June 29, 2004
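
For context, here is a minimal sketch of where that declaration sits in a complete page. The title, file name and link are placeholders, and the short doctype is an assumption rather than anything specified in the thread. The file itself must actually be saved as UTF-8 for the declaration to be true.

```html
<!DOCTYPE html>
<html>
<head>
  <!-- Declares the encoding; place it before any non-ASCII text in the head -->
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>Customer guides</title>
</head>
<body>
  <ul>
    <!-- Placeholder entry; one list item per language -->
    <li><a href="guide-el.pdf" lang="el">Ελληνικά</a></li>
  </ul>
</body>
</html>
```

If the server also sends a charset in its Content-Type header, browsers give that header precedence over the meta tag, so the two should agree.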


Any one know? (Languagehat?)

Sorry -- languages I know; coding is beyond me. (Languagehat, the blog, exists thanks to the kindness of techie friends.) But I wish you all the best!
posted by languagehat at 8:42 AM on June 29, 2004


appreciate it, everyone
posted by Pericles at 12:09 PM on June 29, 2004

