It's a small world
October 8, 2009 4:36 PM

Automatic language translators: which is the best, for now? What efforts are underway to improve them? How much progress have we seen, on what sort of time scale? My interest pertains to Latin script languages, but I'm curious about the matter generally.
posted by woodway to Writing & Language (7 answers total) 1 user marked this as a favorite
 
I got good answers to a similar question some months back, but most of the responses focused on how poorly these things translate!
posted by Pomo at 4:47 PM on October 8, 2009


Dealing with different scripts is really the very smallest problem here. If you're working with handwritten input, that can be tricky, but modern computers can deal with Arabic or Cyrillic or CJK characters or what have you just as easily as they can with the Latin script.

If you ask me, the biggest problem is the amount of real-world knowledge that would be required to do machine translation right. Natural language is deeply, deeply ambiguous. Most of the time we don't notice the ambiguity, because of all the meanings a sentence could have, only one of them is consistent with common sense. But a computer doesn't have common sense; it doesn't know which interpretations of a sentence are plausible and which ones aren't. All it knows is that you've given it a sentence that could mean seventeen different things, each of those meanings would be expressed a different way in Chinese, and it has to pick one. Unsurprisingly, it often picks wrong.

(There are a lot of efforts to compile the sort of real-world knowledge that a computer would need to do this stuff right. They've achieved some impressive partial successes. They're also still nowhere near good enough to cope with, say, the amount of common sense knowledge assumed by an ordinary newspaper. We've got our work cut out for us.)

In general, anything being done in computational linguistics can be used for machine translation. Still, very few computational linguists will actually come out and say they're working on machine translation. This is partly for historical reasons (researchers predicted back in the 1950s that they'd have it within ten years; they failed, their funding got cut, and everyone was sort of humiliated by the whole thing) and partly due to simple modesty (telling people you're working on machine translation is like saying you've decided to cure cancer). But look at it this way: before we can get machine translation right, we have to solve just about every other problem in comp ling first.
posted by nebulawindphone at 5:52 PM on October 8, 2009


There are some significant problems with automatic translators, despite the huge amount of work being done and advances made. We still can't fully account for the rules of English syntax. Folding in meaning just presents more problems. For example, the following two sentences are grammatical:

The horse ran past the barn fell.
Colorless green ideas sleep furiously.

Then there are all the sentences, phrases, and speech we use every day... most of it ungrammatical, but also completely understandable.

These two things – grammaticality and semantic content – present a couple of the many problems with automatic parsers/translators. We've got a long way to go. Then again, it's absolutely AMAZING how far we've come.
posted by iamkimiam at 6:11 PM on October 8, 2009


Best answer: The European Commission has to do a lot of translation work to support its 23 official languages. Some of the work is automated by machine translation and then completed by human translators. The EC mostly uses SYSTRAN.

The EC has a great page explaining the workflows and tools it uses. Hilariously, "these articles are only available in their original language."
posted by jedicus at 8:57 PM on October 8, 2009


Response by poster: That is funny. Yes, I was wondering about programs under the umbrella of the UN, African Union, APEC, ASEAN, EC, OAS et cetera. I'm also curious about AI research, and studies of second-language learners. I know very little about the subject, but I assume that R&D efforts are being devoted to improving automatic translation. Private industry has a strong incentive for commercial success, of course. I thought MeFites might be involved one way or another. I appreciate your insights.
posted by woodway at 5:28 AM on October 9, 2009


"The horse ran past the barn fell."

Don't you mean "The horse raced past the barn fell"? With "ran" it is ungrammatical (should be "run" in that case).
posted by kosmonaut at 7:28 AM on October 9, 2009


I'm a translator. Mostly we point and laugh at machine translation. That said, there are some narrow areas where it is very successful. Apparently Canada has excellent French<>English machine translation for weather reports. A big part of the success there is that they're using a tightly controlled vocabulary: there's really no room for ambiguity, and every possible statement is known in advance, so it's largely a matter of looking up corresponding phrases.

Most translation isn't like that, of course. There are two general approaches to MT: corpus-based and algorithmic. Corpus-based MT relies on having a huge dictionary of phrase equivalents, so it's like that Canadian weather-report translator, on a larger scale. Algorithmic is what it sounds like: trying to diagram sentences, doing dictionary work on the elements, trying to find a corresponding sentence structure in the target language, and putting the pieces back together. Most MT these days uses some mix of the two, I think.
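
As a rough illustration of the phrase-lookup idea described above, here is a minimal Python sketch. The phrase table, its French glosses, and the function name `translate` are all made up for the example; real corpus-based systems build their tables from large parallel corpora and weigh competing candidates, which this toy does not attempt.

```python
# Hypothetical toy phrase table: English word sequences -> French phrases.
# The entries are illustrative assumptions, not data from any real system.
PHRASE_TABLE = {
    ("sunny",): "ensoleillé",
    ("with",): "avec",
    ("cloudy", "periods"): "passages nuageux",
    ("chance", "of", "showers"): "risque d'averses",
}


def translate(sentence, table=PHRASE_TABLE):
    """Greedy longest-match lookup against a fixed phrase table."""
    words = sentence.lower().split()
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(words):
        # Try the longest phrase starting at position i first.
        for n in range(min(max_len, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in table:
                out.append(table[key])
                i += n
                break
        else:
            # Nothing in the table covers this word: mark it and move on.
            out.append(f"<{words[i]}>")
            i += 1
    return " ".join(out)


print(translate("Sunny with cloudy periods"))
print(translate("Chance of showers tonight"))
```

Inside the controlled vocabulary every word is covered; the moment a word falls outside the table ("tonight" here), the lookup has nothing to offer, which is roughly the wall that uncontrolled text runs into.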

I recently ran across a web page that was originally written in Swedish and had been run through Google Translate. I didn't realize this at first. The first sentence sounded fine. The second one was clearly off, and the third one didn't make any sense at all. That's when I noticed it was the product of MT. When you venture outside of controlled language, you get problems of ungrammatical writing, inventive writing, slang, euphemisms, hints, unmentioned factors that the reader and writer both understand, and context. Consider the phrase "confirmed bachelor" when it appears in an obituary. Indeed, obituaries have a vocabulary all their own. In order for MT to do justice to one, it would need to recognize "oh, I'm working on an obituary. Get out the obituary phrasebook." These are the sorts of things that human translators struggle with (not that there's a lot of work in obituary translation). I would not be sanguine about MT catching up.

Consider the two following sentences:
The pen is in the box.
The box is in the pen.
These are both completely grammatical, meaningful sentences. But in order to make sense of them, you need to be able to apply a level of analysis to them that no computer can yet. There's been a "common sense" AI project called Cyc running for decades that would, in theory, imbue a computer with the ability to understand the two different meanings of "pen" in the examples above (as would a project discussed in this recent MeFi post), but as should be obvious, teaching computers common sense is a herculean undertaking.
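
To make the "pen" problem concrete, here is a small Python sketch under stated assumptions: the sense inventory `SENSES` and the function `candidate_readings` are hypothetical and exist only for this illustration. It lists the sense combinations a plain dictionary lookup would produce for each sentence, without any way of choosing among them.

```python
from itertools import product

# Hypothetical sense inventory, invented for this example.
SENSES = {
    "pen": ["writing instrument", "animal enclosure"],
    "box": ["container"],
}


def candidate_readings(sentence):
    """Return every combination of senses for the listed words in a sentence."""
    words = sentence.lower().rstrip(".").split()
    ambiguous = [w for w in words if w in SENSES]
    # One reading per element of the cross product of the sense lists.
    combos = product(*(SENSES[w] for w in ambiguous))
    return [dict(zip(ambiguous, combo)) for combo in combos]


for s in ("The pen is in the box.", "The box is in the pen."):
    print(s, candidate_readings(s))
```

Both sentences come back with the same two candidate readings. Nothing in the lookup says that a writing instrument fits inside a box while a box fits inside an enclosure; that ranking is exactly the common-sense knowledge the paragraph above is talking about.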

For the most part, MT these days is useful for letting you know what a text is about in a general way. It's not so useful for letting you know what the text actually says.
posted by adamrice at 8:46 AM on October 9, 2009 [1 favorite]

