What is the best way to implement multi-lingual capability on a site?
May 25, 2004 12:28 AM

I'm looking for advice on creating and maintaining a large multi-lingual website--or rather, a large English-only site that's going multi-lingual soon, once we start capturing our users' language preferences via a self-registration form and putting that into a cookie. Or is it better to detect the browser's settings? Or something else? Any book suggestions? First-person accounts? Suggestions of great websites that implement multi-lingual capability in novel or especially nice ways?

FYI, we use a proprietary CMS to output a giant XML tree, which we manipulate via XSLT stylesheets, then pull them into a JSP and/or build the whole thing with various modules in Vignette. (Don't ask.) But my point is that getting the data into XML form at the outset is not a problem, if that helps answer the question any. Also, our users are all registered; you have to log in to use the site. Which is why I'm thinking profile-based/cookie-based language choices, rather than browser-based.
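
The profile, cookie, and browser options do not have to be exclusive; they can be checked in priority order. A minimal Python sketch of that idea, where the `SUPPORTED` list, the `DEFAULT` value, and the function name are all hypothetical, and the header handling is deliberately crude (it ignores q-values):

```python
SUPPORTED = ("en", "fr", "es")  # hypothetical list of languages the site offers
DEFAULT = "en"                  # assumed site default


def resolve_language(profile_lang=None, cookie_lang=None, accept_language=None):
    """Pick a language: saved profile first, then cookie, then browser header."""
    # An explicit choice by the user always wins over anything detected.
    for candidate in (profile_lang, cookie_lang):
        if candidate and candidate.lower() in SUPPORTED:
            return candidate.lower()
    if accept_language:
        # Take tags in the order listed; "fr-CA" is reduced to "fr".
        for part in accept_language.split(","):
            tag = part.split(";")[0].strip().lower()
            primary = tag.split("-")[0]
            if primary in SUPPORTED:
                return primary
    return DEFAULT
```

With this ordering the browser setting only matters until the user has registered a preference.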
posted by Asparagirl to Computers & Internet (10 answers total)
Have you looked into Drupal? It is all Unicode/UTF-8 compliant, so it will support all languages natively. I think the developers are Belgian university students, but it is all GPLed, so you can tweak it if need be.
posted by gen at 12:44 AM on May 25, 2004

Btw, don't waste your time with phpnuke or postnuke. Big headaches if you want to support multiple languages.

One of the coolest sites that I know that does support many languages is Tony Laszlo's Issho.org (which looks to be having a few issues on the home page, but in general is a very accessible site in many languages.) Watch it change languages when you select a different "language interface." You can also set it to show you only the articles in the languages you can read, etc.
posted by gen at 12:47 AM on May 25, 2004

i spent a horrible year working on a multi-lingual site. it was an interface to a database, and allowed interactive editing of information within the database. everything was given a code and then a lookup table was used to convert that code to a given text, depending on language (and a whole pile of other things, like the companies involved, the product in question etc). pages were constructed dynamically from the database to show the information required, based on templates in jsp.

the main problem was efficiency/generality - we needed to cache the translations for common combinations of parameters because the system was way too general (which made it slow and supported millions of combinations of parameters that were never used).
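
a rough sketch of the code-to-text lookup with a cache in front of it, in python rather than the jsp we actually used. the `TRANSLATIONS` table, the message codes and the english fallback are all made up for illustration, and the real system keyed on many more parameters than language:

```python
from functools import lru_cache

# Hypothetical lookup table: (message code, language) -> display text.
TRANSLATIONS = {
    ("greeting", "en"): "Welcome",
    ("greeting", "es"): "Bienvenido",
    ("logout", "en"): "Log out",
}


@lru_cache(maxsize=1024)
def translate(code, lang, fallback="en"):
    """Return the text for a code, trying the fallback language, then the code itself."""
    text = TRANSLATIONS.get((code, lang))
    if text is None:
        # No translation yet: use the fallback language, or show the raw code.
        text = TRANSLATIONS.get((code, fallback), code)
    return text
```

the cache only helps when the same combinations keep coming back, which was exactly the case for us.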

i believe the general approach was sound (it turned out we had some competitors who were a much bigger company, apparently, that had managed much less, so it seemed like we were doing things right). with more experience, or initial requirements that matched what the client wanted, it could have been simplified to a reliable system. but when i left, it was way too complex, and the client was still changing requirements while screaming about it not being ready yesterday.
posted by andrew cooke at 8:02 AM on May 25, 2004

oh, to answer the question, the user selected language at the start and we stored it in the session context.
posted by andrew cooke at 8:16 AM on May 25, 2004

There's actually an HTTP header, Accept-Language, which specifies which language the user would prefer, and what order of preference other languages should have should the first choice not be available. All modern browsers send this header and it's easy to configure. Still, hardly any sites actually use it.
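
For illustration, a header such as `da, en-gb;q=0.8, en;q=0.7` lists language tags with optional quality weights. A minimal Python sketch of turning that into an ordered preference list; the function name is hypothetical and it does not handle every edge case in the HTTP specification:

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, most preferred first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        pieces = part.strip().split(";")
        tag = pieces[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # q defaults to 1 when it is omitted
        for param in pieces[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not acceptable"
        if quality > 0:  # q=0 means the user does not want this language
            weighted.append((-quality, position, tag))
    # Sort by descending quality; ties keep the order the browser sent.
    return [tag for _, _, tag in sorted(weighted)]
```

The site would then walk this list and serve the first language it actually has.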

(I wish google would switch user interface language based on what language the user requests instead of the guessed location of the IP the request comes from)
posted by fvw at 8:16 AM on May 25, 2004

It's a minor point, but make sure you have a link to switch languages on every page of the site. It'll help people who come into the site via deep links, and you'll be thankful when you're checking the translation.
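
A sketch of generating those links in Python, assuming a URL scheme where the language code is the first path segment; the `LANGUAGES` table and the function name are hypothetical:

```python
LANGUAGES = {"en": "English", "es": "Español"}  # hypothetical site languages


def language_switch_links(path, languages=LANGUAGES):
    """Return (label, url) pairs for the same page in every other language."""
    segments = [s for s in path.strip("/").split("/") if s]
    # The first segment is the current language only if it is a known code.
    current = segments[0] if segments and segments[0] in languages else None
    rest = segments[1:] if current else segments
    links = []
    for code, label in languages.items():
        if code != current:
            links.append((label, "/" + "/".join([code] + rest)))
    return links
```

Called from the page template, this gives a deep-linked visitor a way to the same page in another language rather than just the home page.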
posted by fuzz at 10:40 AM on May 25, 2004

Thanks for the help and suggestions so far. I'm just really surprised there isn't more information out there on best practices for this sort of thing.

Oh, and while doing some related web searches at work today, I came across the Library of Congress' list of official two-letter and three-letter language codes. It seems that there is actually a language called Mongo. And its language code is LOL. I am not making this up.
posted by Asparagirl at 12:18 PM on May 25, 2004

A standard book on the topic is Beyond Borders: Web Globalization Strategies. Much of its advice is correct, nearly all of it is Windows-specific, and when it's incorrect, it's flagrantly and imaginatively so (as in its oft-stated claim that all Canadian Web sites must be bilingual by law). I have pages and pages of notes and corrections on this book, which I decided were not worth the time (which I would never get back) to actually post.

Read it, but expect it to be wrong in significant respects.
posted by joeclark at 2:34 PM on May 25, 2004

I was a programmer on a government portal which had to be bi-lingual (English and Māori). We used Apache Cocoon which is a brilliant architecture in XML, XSLT, and Java so it might suit your JSP. Cocoon is the best software I've used in the last few years. It makes XSLT so much easier. But anyway,

For designing our language URIs we used domain.com/ for English and domain.com/mi/ for Māori, but in hindsight it would have been far simpler to have /en and /mi and use the root of the site for a redirect based on the Accept-Language header with an English default.
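
The root redirect we should have built is only a few lines. A Python sketch rather than our Cocoon setup, with a hypothetical function name, and simplified to take the header's languages in the order listed instead of sorting by q-value:

```python
SITE_LANGUAGES = ("en", "mi")
DEFAULT_LANGUAGE = "en"  # English default, as described above


def root_redirect_target(accept_language):
    """Choose the path that a request for the site root should redirect to."""
    for part in (accept_language or "").split(","):
        fields = [f.strip().lower() for f in part.split(";")]
        if "q=0" in fields[1:]:
            continue  # explicitly rejected by the browser
        primary = fields[0].split("-")[0]  # "en-NZ" -> "en"
        if primary in SITE_LANGUAGES:
            return "/" + primary + "/"
    return "/" + DEFAULT_LANGUAGE + "/"
```

A temporary redirect is probably the safer choice here, since the answer depends on who is asking.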

The longest discussions we had though were about equivalence between the languages. When you have /mi/trains/ the equivalent English content would naturally be at /en/trains/. But if you had /mi/configure-browser-to-display-maori then there may be no English equivalent to link to (no reason for the content to exist, English characters aren't a problem). And when you've got content that's not yet in Māori do you just not link it on the Māori site, even though the reader may be perfectly able to read the English version if linked and may want the page? Having mirror sites in English and Māori was a bad idea, but it was better than maintaining two completely separate sites. They didn't have enough Māori content to warrant a Māori site (just menus and a few "about" pages) but they wanted to be able to add Māori as they went.

So what we did was to make /mi/ only a preference for content, which meant you'd often see English on the Māori site and rarely Māori on the English. The pages were arranged hierarchically into boxes and each box could be in any language (so the news would be English, the menus and tabs in Māori).
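
Roughly, the per-box choice worked like this. It's a Python sketch, not our actual Cocoon pipeline; the function names, the box structure, and the placeholder texts are invented for illustration:

```python
def pick_version(versions, preferred, fallback="en"):
    """Return (language, text) for one box: preferred, else fallback, else anything."""
    for lang in (preferred, fallback):
        if lang in versions:
            return lang, versions[lang]
    # Neither exists, so show whichever language the box does have.
    lang = next(iter(versions))
    return lang, versions[lang]


def render_page(boxes, preferred):
    """Resolve every box on a page, skipping boxes that have no content at all."""
    rendered = []
    for name, versions in boxes.items():
        if versions:
            lang, text = pick_version(versions, preferred)
            rendered.append((name, lang, text))
    return rendered


# Hypothetical page: the menu is translated, the news exists only in English.
page = {
    "menu": {"en": "<English menu>", "mi": "<Māori menu>"},
    "news": {"en": "<English news>"},
}
```

Here `render_page(page, "mi")` uses the Māori menu and the English news, which is the mixed page described above.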

Surprisingly there have been few complaints.
posted by holloway at 3:05 PM on May 25, 2004

Google.com looks at Accept-Language if you've got no cookies (works for Spanish, anyway).
posted by holloway at 3:21 PM on May 25, 2004
