How can I learn to recognise more languages?
April 12, 2007 9:07 PM Subscribe
I'm a literate English-speaker who knows more than a smattering of French and I can usually guess the language of texts written in most Western European/SE Asian/Programming scripts, but I'd like to improve. I recently received an email in Turkish and I had no idea what language it was, which made it difficult to find a machine translation. How can I train myself to recognise more languages and hence find resources for translation?
You can browse the selection of translations of the Lord's Prayer to get an example of what an actual text in different languages and writing systems looks like.
posted by greatgefilte at 9:25 PM on April 12, 2007 [1 favorite]
Read any and all multilingual instructions when you buy appliances or use facilities with international signage. Observe the family resemblances in the Scandinavian and Romance languages. Compare German and Dutch.
Learn the Greek and Cyrillic alphabets.
Hit your public library and read all the popular linguistics books.
posted by i_am_joe's_spleen at 9:27 PM on April 12, 2007
As a more practical exercise, you would also do well to get a book on writing systems, such as this fine volume.
posted by greatgefilte at 9:28 PM on April 12, 2007
Oh, learn to distinguish the various Asian writing systems from one another. Only Korean uses Hangul, only Japanese uses hiragana and katakana, only the Indian languages use Devanagari, only Thai uses Thai script, and so on. There will be books in your public library on alphabets and writing systems.
posted by i_am_joe's_spleen at 9:29 PM on April 12, 2007
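If the text is on a computer rather than on paper, the "which script is this?" step can be roughed out in a few lines. This is only a sketch: it assumes that the first word of a character's Unicode name (HANGUL, HIRAGANA, THAI, CYRILLIC and so on) is a good enough stand-in for its script, and the function names `script_counts` and `dominant_script` are made up for the example.

```python
import unicodedata
from collections import Counter


def script_counts(text):
    """Count letters by the first word of their Unicode character name.

    Assumption: that first word (e.g. HANGUL, THAI, GREEK) roughly
    identifies the script. It is a heuristic, not the official Unicode
    Script property.
    """
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, spaces, punctuation, combining marks
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    return counts


def dominant_script(text):
    """Return the most common script label in text, or None if no letters."""
    counts = script_counts(text)
    return counts.most_common(1)[0][0] if counts else None


if __name__ == "__main__":
    for sample in ["한국어", "ひらがな", "ภาษาไทย", "Русский язык", "Ελληνικά"]:
        print(sample, dominant_script(sample))
```

This only tells you the script, not the language. Japanese text also mixes kana with Chinese characters (which come out labelled CJK here), so it is safer to look at everything `script_counts` returns than to trust the single top label.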
Ha! Good one, greatgefilte.
posted by i_am_joe's_spleen at 9:29 PM on April 12, 2007
Exposure and practice. Getting the script is relatively easy. Very few exist in mainstream use. Then you're stuck.
Worth noting from Benjamin Nushmutt's comment that listening to a language will teach you nothing intrinsic about how a language is written - that's an arbitrary system with no first principle basis. (It may sound like another similar one which may indicate a similar script.)
posted by TrashyRambo at 9:30 PM on April 12, 2007
iajs, the section on constructed languages is quite interesting.
posted by greatgefilte at 9:37 PM on April 12, 2007
True, TrashyRambo. I was thinking about hearing a language + seeing the album packaging and liner notes (and kinda neglected to mention that detail).
posted by Benjamin Nushmutt at 9:39 PM on April 12, 2007
only the Indian languages use Devanagari
I'm reasonably sure that only Hindi uses Devanagari. The other 13 or so major Indian languages each have their own scripts (although Urdu & Kashmiri use the Arabic).
/pedantic
posted by UbuRoivas at 11:15 PM on April 12, 2007
Tibetan is based on Devanagari too, which might throw you.
posted by Abiezer at 11:59 PM on April 12, 2007
(yeh, based on, but clearly different. until you get down into the dravidian language areas, most of the scripts look pretty similar to devanagari. gujarati is one example that springs to mind. but i was just nitpicking)
posted by UbuRoivas at 12:13 AM on April 13, 2007
UbuRoivas:
“Devanāgarī (देवनागरी) is an abugida script used to write, either along with other scripts, or exclusively, several Indian languages, including Sanskrit, Hindi, Marathi, Sindhi, Bihari, Bhili, Marwari, Konkani, Bhojpuri, Nepali, Nepal Bhasa from Nepal and sometimes Kashmiri and Romani.” Though of course there’s a lot of language-or-dialect to and fro over some of those. (Not Sanskrit, though, ha!)
posted by Aidan Kehoe at 2:25 AM on April 13, 2007
If you're looking at Unicode, one shortcut is to learn which Latin Extended characters come from which language. Ğ is a tipoff for Turkish or a related language, ł and ż suggest Polish, ő and ű are dead giveaways for Hungarian, ể can't be anything but Vietnamese, and so on.
Sometimes you will find a certain character is used throughout a region — č is found in a whole cluster of languages from Eastern Europe, for instance. More annoyingly, some characters are reused in totally unrelated languages — ł can be a tipoff for Navaho or a Pacific Coast language as well as for Polish.
The first section of the Wikipedia list seems to be based on this trick, and it's probably a good place to look for "giveaway" characters like these.
posted by nebulawindphone at 5:49 AM on April 13, 2007
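The giveaway-character trick above is easy to turn into a lookup if you want a first guess from pasted text. A minimal sketch, assuming a hint table built only from the characters mentioned in that comment (so it is nowhere near complete); the names `GIVEAWAYS` and `language_hints` are invented for the example.

```python
import unicodedata

# Hint table taken only from the examples in the comment above.
# Incomplete by design; the labels are loose hints, not identifications.
GIVEAWAYS = {
    "ğ": ["Turkish or a related language"],
    "ł": ["Polish", "Navaho or a Pacific Coast language"],
    "ż": ["Polish"],
    "ő": ["Hungarian"],
    "ű": ["Hungarian"],
    "ể": ["Vietnamese"],
    "č": ["an Eastern European language"],
}


def language_hints(text):
    """Map each hinted language to the giveaway characters found in text."""
    # Normalise to NFC so a base letter plus combining accents is
    # compared as the single precomposed character used in the table.
    normalised = unicodedata.normalize("NFC", text).lower()
    hints = {}
    for ch in normalised:
        for language in GIVEAWAYS.get(ch, []):
            hints.setdefault(language, set()).add(ch)
    return hints


if __name__ == "__main__":
    print(language_hints("Doğru mu?"))
    print(language_hints("Łódź"))
```

A character like ł comes back with more than one hint, which is the "reused in totally unrelated languages" problem in action: treat the result as a shortlist to check against a reference list, not as an answer.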
I like to think I'm decent with language recognition on a non-essential basis, and I picked most of it up per IAJS's first suggestion, as a kid. I'd like to thank the snow-tube people for having 20 languages on their product. Electronics often have the same thing, usually with a two-letter country code preceding the text.
On a more delicious basis, weird imported foods often have ingredient lists on them in the same way. I've bought chocolate that had seven or eight languages on it and I've got some yoghurt in the fridge labeled in Turkish, Arabic, Farsi, and Armenian.
posted by cobaltnine at 6:00 AM on April 13, 2007
Another good resource, albeit for scripts more than languages, is Omniglot.
posted by Schismatic at 7:27 AM on April 13, 2007
Try this language identification doohickey I built:
http://ruphus.com/identify
posted by snifty at 5:35 PM on July 18, 2007
This thread is closed to new comments.
posted by Benjamin Nushmutt at 9:23 PM on April 12, 2007