Identifying English-as-a-second-language speakers' first language?
October 27, 2015 6:28 AM

With a long-enough sample of written or spoken* English from someone whose first language is not English, can linguists identify what that person's first language is? If so, what are the tell-tale signs for various languages?

* Ignoring any accent, as if the linguist were working with a transcript.
posted by alby to Science & Nature (31 answers total) 10 users marked this as a favorite
Learner English might be useful for trying to do this:

"This updated edition is a practical reference guide which compares the relevant features of a student's own language with English, helping teachers to predict and understand the problems their students have. Learner English has chapters focusing on major problems of pronunciation, grammar, vocabulary and other errors as well as new chapters covering Korean, Malay/Indonesian and Polish language backgrounds."

Actual linguists might know more advanced techniques.
posted by curious_yellow at 6:36 AM on October 27, 2015 [2 favorites]

Anecdotally (sorry), Russians omit "the", and Spanish speakers (and perhaps speakers of other languages with Latin roots?) get possessive pronouns wrong ("his" instead of "hers").

Poking around on Google turned up this paper, whose abstract I'll give below (slow PDF download):
Attempts to profile authors based on their characteristics, including native language, have drawn attention in recent years, via several approaches using machine learning with simple features. In this paper we investigate the potential usefulness to this task of contrastive analysis from second language acquisition research, which postulates that the (syntactic) errors in a text are influenced by an author’s native language. We explore this, first, by conducting an analysis of three syntactic error types, through hypothesis testing and machine learning; and second, through adding in these errors as features to the replication of a previous machine learning approach. This preliminary study provides some support for the use of this kind of syntactic error as a clue to identifying the native language of an author.
posted by andrewcooke at 6:44 AM on October 27, 2015 [1 favorite]

My own observations, which are at least 50% likely to have something to them:

Putting a comma after "I think" or "I believe" (e.g. "I think, that the Mets are going to win the World Series") is a good sign that the writer's first language is Germanic.

Russians tend to leave off articles.
posted by dfan at 6:45 AM on October 27, 2015 [2 favorites]

can linguists identify what that person's first language is?

I am a linguist and I could not do this. (I'm just sticking this in here to point out that I think there may be some confusion about what it is that linguists do.)

If so, what are the tell-tale signs for various languages?

I'm not really sure there are "tell-tale" signs, especially if you're eliminating phonology/phonetics from the picture. But there are systematic patterns of errors that can be seen in aggregate when looking across speakers. I think the most relevant kind go under the heading of L2 transfer errors, which amount to pretty much what you'd expect: using grammar from your L1 in the L2. In English, for example, one place this manifests is preposition choice (esp. for particle verbs), where L2 learners just have to memorize a mess of idiosyncratic stuff that differs from their native language if they want to get it right. Another case is the choice in English between the simple past and the present perfect, which works differently in many other languages. So in a pinch they fall back on a direct translation of what their native language would do, which is probably wrong.

There are also computational models that attempt to do this.
posted by advil at 6:46 AM on October 27, 2015 [7 favorites]

Anecdotally, Russians mix up "that" and "what," or sort of use them interchangeably. In Russian it's the same word, "что."

It's one of the few Russian words I know. I'm learning!
posted by millipede at 7:00 AM on October 27, 2015

If the speaker is making really obvious mistakes with the gender of pronouns, I would assume they speak a language that doesn't use gendered pronouns. I would probably guess Hungarian since my grandma used to do this all the time with my brother and me, but other languages have a similar system. [I'm not a linguist.]
posted by andoatnp at 7:01 AM on October 27, 2015

There's apparently a database of language properties you can explore:
The World Atlas of Language Structures (WALS) is a large database of structural (phonological, grammatical, lexical) properties of languages gathered from descriptive materials (such as reference grammars) by a team of 55 authors.
posted by stefanie at 7:06 AM on October 27, 2015

With a long-enough sample of written or spoken* English from someone whose first language is not English, can linguists identify what that person's first language is?

Only if the text excerpt 1) displays actual grammatical residues from that language, or
2) displays cultural quirks (as, in this very example, the Germans' love for "1) 2)").

The most conspicuous tell-tale signs of the first kind for German in relation to English are: overly long sentences with too many asides or digressions; a tendency to write "is doing" instead of "does" (etc.); capitalisation of "you" in emails and letters; and weird placement of verbs.

The clearest German tell-tale sign of the second kind is a lack of logical connective tissue between consecutive sentences. Is a follow-up sentence an illustration of the previous one in a single logical flow? Does it show a different facet of the same thing? Is it a contrasting statement? Is it not connected at all? German writers have a fantastic ability to just omit the words that ought to show these connections. German readers are trained to guess at these connections. German editors or translators don't even... in other words, it drives me bananas.
posted by Namlit at 7:12 AM on October 27, 2015 [1 favorite]

I'm an ESL teacher. I don't think I've ever ID'd someone's native language by reading their writing before, but I don't often read the writing of people whose native language I don't already know.

People of all language backgrounds make pronoun mistakes--in many languages the pronoun is not required because the verb gives the person and number (unlike in English), so speakers of those languages may omit pronouns entirely. Languages like Chinese and Japanese don't have plural nouns (you don't just add something to a noun to make it plural, as I understand it--I don't speak either) so those speakers often make mistakes there.

Directly translated idioms might be a pretty obvious indicator of a specific language.
posted by chaiminda at 7:14 AM on October 27, 2015 [1 favorite]

Previously on Ask MetaFilter: "Two shirt for twenty dollar"

1. What is it that makes speakers of particular languages make particular errors? I'm guessing in the above examples that it's something to do with the way Russian handles articles and the way Chinese forms plurals (though the lack of a plural "s" doesn't seem to affect Italians, for example).

2. More examples, please.

posted by andoatnp at 7:23 AM on October 27, 2015 [1 favorite]

What level of non-proficiency are you thinking of? Probably 3/4 of my work interactions are with folks for whom English is not a first language (this is in the US). Some of them can school me on proper grammar. Others come to me for help.

In addition to Russian speakers, Korean speakers also have trouble with articles in a way that speakers of Romance, Germanic, Arabic, and South Asian languages don't seem to. I bet there are a bunch of other languages for which that happens. If the person uses UK English rather than US English, they usually learned it in India (though I know that encompasses a bunch of languages); that's more cultural than linguistic, though. I bet you could do some analysis by word roots and find that Romance language speakers use a higher percentage of Latin-root English words, since I know I do the same with French or Italian.
posted by tchemgrrl at 7:40 AM on October 27, 2015

Response by poster: What level of non-proficiency are you thinking of?

Difficult to quantify, really. Someone whose speech/writing is immediately identifiable as ESL, but not to the extent that it's difficult to understand the meaning behind it.
posted by alby at 7:44 AM on October 27, 2015

I have occasionally identified francophones writing in English by their misuse of false friends, like one person who wrote “an inconvenient” when they meant “a disadvantage.” Sentences with lots of dependent clauses and comma splices also give me a strong suspicion that the writer is a native French speaker. Other common errors involve prepositional phrases, like “I’ve been here since a while” instead of “I’ve been here for a while.”
posted by mbrubeck at 8:20 AM on October 27, 2015 [2 favorites]

An easy way to identify native German speakers speaking English is the loan/borrow conundrum. For some reason nearly every German I know seems to get these two words mixed up.
posted by Gungho at 8:24 AM on October 27, 2015 [2 favorites]

You can narrow down to continent, at least, based on the way they write their numbers. Especially the numeral 1, which Europeans like to write in a way that looks almost like a 7.
posted by dis_integration at 8:33 AM on October 27, 2015

Learner English is just what I was going to suggest. About every 2 months, I actually assess essays by English learners, and I often try to guess their language background before I check their names. I find I can identify some groups fairly easily and consistently; for example, French speakers tend to have different handwriting, use more Latinate academic words, and use certain Latinate words incorrectly. But for languages with similar grammar patterns, it can be impossible (I can't tell different Eastern European languages apart, and even guessing between Korean and Japanese can be difficult unless they have some pretty specific errors or romanization issues). Obviously, the more advanced a learner is, the more difficult it is.

Here are some common giveaways, at least in handwritten essays:

- Arabic speakers often have phonetic spelling for difficult words.
- Phrases like "I very not like" usually indicate Chinese speakers. There are also likely to be a lot of set phrases/cliches like "In this fast-paced and modern world" or "Every coin has two sides."
- Japanese-English words like "illust" are a dead giveaway for Japanese speakers, but Korean shares some of these words. There are also mistakes like "most of things are cheap" and "almost people like it." Misuse of passive is also pretty common, e.g. "He was died."

However, a lot of mistakes occur in multiple (unrelated) languages, including preposition errors ("I got married with him"), article errors ("This is book I borrowed"), pronoun errors ("This is my sister. He lives in LA."), word form errors ("The train is convenience"), adjective switches ("I was really boring while I watched that movie"), and even the lend/borrow thing.

Most of these errors occur for pretty understandable reasons: a language doesn't use personal pronouns much, a language marks the part of speech in every word, a language only has one word for lend/borrow, etc. So if you're intimately familiar with the speaker's language of origin, you can tease out what's happening--but that requires some thought and analysis if you're not used to doing it.

Anyway, I think it's plausible that someone very familiar with non-native writing could identify the origin of a writer, but that person would probably be an ESL/EFL instructor or an applied linguist working in English language acquisition. There's plenty of margin of error, though!

(I apologize to anyone reading this if you're a non-native English speaker and this makes you feel uncomfortable! These are just patterns, not rules, and any individual learner may avoid all of them. I could make a laundry list of my own typical errors in speaking other languages, from the stereotypical overuse of "I" in Japanese on down.)
posted by wintersweet at 8:35 AM on October 27, 2015 [9 favorites]

I'm going to nth that there is no reliable way to identify the origin of most non-native English speakers based on their English grammar mistakes in transcript form, without the added hints supplied by accents. There are simply too many variables depending on the amount and quality of English instruction and practice they have had, and, for example, whether they are spending most of their time among fully proficient English speakers, or among others speaking English as a second language. (For example, a native speaker of French who spends most of his time among native speakers of Spanish might tend to pick up the typical mistakes of the Spanish speakers rather than ones typical of French speakers.)

That said, there appear to be quite a few compendiums of the typical English grammar mistakes made by native speakers of various languages. Here are a few (for more, try Googling: common english grammar mistakes of [language] speakers):
(Note: some of these include pronunciation errors as well as grammar errors.)


For each of these and others, if you use the Google search term suggested above, you'll get additional links to pages of "most common mistake" lists. From language to language, the lists do tend to differ considerably. So I suppose that if you compiled all of them and used the resulting database as a tool to analyze transcripts, you might be able to improve the accuracy of your guesses.
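The "compile the lists into a database and scan transcripts with it" idea could be sketched, very roughly, as a pattern-counting script. This is a toy illustration under stated assumptions, not an established method: MARKERS, score, and guess are hypothetical names, and the handful of patterns are simply examples posted in this thread. As several answers point out, many errors occur across unrelated languages, so a real tool (like the machine-learning approaches mentioned upthread) would need far more features and data.

```python
import re
from collections import Counter
from typing import Optional

# HYPOTHETICAL marker table: a few error patterns mentioned in this thread,
# keyed by the first language posters associated them with. A real database
# would be far larger, and many errors are shared across unrelated languages,
# so these are weak, overlapping signals at best.
MARKERS = {
    "French": [r"\bsince a while\b", r"\ban inconvenient\b"],
    "German (or another Germanic language)": [r"\bI (?:think|believe), that\b"],
    "Chinese": [r"\bI very not\b", r"\bevery coin has two sides\b"],
    "Japanese": [r"\balmost people\b", r"\bmost of things\b", r"\billust\b"],
    "Spanish": [r"\bdon't have nothing\b", r"\bnever said nothing\b"],
}


def score(text: str) -> Counter:
    """Count case-insensitive matches of each language's marker patterns."""
    counts = Counter()
    for lang, patterns in MARKERS.items():
        for pattern in patterns:
            counts[lang] += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def guess(text: str) -> Optional[str]:
    """Return the language with the most marker hits, or None if nothing matched."""
    lang, hits = score(text).most_common(1)[0]
    return lang if hits > 0 else None


if __name__ == "__main__":
    sample = "I think, that the Mets are going to win the World Series."
    print(score(sample))
    print(guess(sample))
```

Returning None when nothing matches, rather than forcing a guess, reflects the thread's caution that most individual errors are not tell-tale on their own.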
posted by beagle at 9:03 AM on October 27, 2015 [6 favorites]

Regarding clues in written English, note that the OP specifies the linguist would be working with transcripts [of spoken English] not written text. So handwriting clues are out, just like accent clues. [On edit: whoops, I see that the question does mention written samples.]
posted by beagle at 9:07 AM on October 27, 2015

Look up "shibboleth."
posted by entropone at 9:31 AM on October 27, 2015 [2 favorites]

Response by poster: Regarding clues in written English, note that the OP specifies the linguist would be working with transcripts [of spoken English] not written text.

Actually it could be either. The note about transcripts was just to eliminate recognising an accent as a clue.
posted by alby at 10:07 AM on October 27, 2015

I've noticed that (some) people from India pronounce (some) English words with a ghost of an R in many of the vowels.
posted by Bruce H. at 11:04 AM on October 27, 2015

German capitalizes all nouns, so I assume there may be a "tell" if someone can't distinguish between nouns that are considered proper (and capitalized) and common (not capitalized) in English.
posted by _DB_ at 11:08 AM on October 27, 2015 [2 favorites]

Like mbrubeck, I also recognize native French speakers' writing if they directly translate from French or misuse words.

"A film that we had realized during our vacation" (rather than "made")
"And you, the house you buy, it is good?" (sentence inversion)
"She passes four days with us" (rather than "spends")
"We are working actually and we will eat later" (misusing "actually" as "currently")
"My younger sister is at college in our home town." ("college" in French meaning "high school")

I noticed that many Europeans commonly used the word "touristic" (as in touristy).
posted by amicamentis at 11:26 AM on October 27, 2015 [1 favorite]

I used to have a job on a TV show with a largish number of Chinese-American characters. The show has a huge cult following in China, and we got a lot of Chinese fan letters in English. Most of them had very similar "styles" of English writing. That said, we were not getting as many letters from other non-Anglophone countries, and there are a LOT of context clues in something like opening a letter. So while I could probably guess the country based on an overall impression of the letter we received, I might not do as well with only the text of the letters typed out neutrally, minus other clues.

(I'm not a linguist.)

Also, re this specific example:

"She passes four days with us" (rather than "spends")

It's also worth noting that a LOT of non-English usages have passed into American regional English due to immigration patterns. Since I grew up in southern Louisiana, passing time rather than spending it sounds completely normal to me. It sounds "country", or maybe specifically Cajun, but I wouldn't necessarily pick out someone who used that as an ESL speaker. The same goes for a lot of the Germanisms mentioned upthread. I wouldn't be surprised to find native speakers of English using them in parts of the US that have historic ties to the German-speaking world.
posted by Sara C. at 12:42 PM on October 27, 2015

Chinese speakers tend to omit prepositions and indeed any other parts of speech they can, e.g. "What time bus?".

Speakers of romance languages often apply a gender to inanimate objects.

Indians have odd signature phrases like "do the needful".
posted by w0mbat at 2:02 PM on October 27, 2015

Like wintersweet, I used to assess language learners' essays (sometimes without any identifying factors like names), and after a while it was pretty easy to figure out the writer's first language. Often the rhetorical style and verbosity (or lack thereof) are identifying factors. For an old-school discussion of this, google Kaplan's article "Cultural Thought Patterns in Inter-cultural Education." (Sorry for the lack of link but the first hit on Google is a PDF of the article; I couldn't figure out how to link a PDF without dumping it into Dropbox.)
posted by bluebelle at 6:56 PM on October 27, 2015

Just from personal observation:
Spanish allows for double negatives, so sentences like "I don't have nothing" and "I never said nothing" are direct translations from Spanish. Spanish speakers also often say something like "when so-and-so born" instead of "when so-and-so WAS born." I don't know how to describe that in grammar terms, but they leave the "was" verb out.

Chinese (Mandarin) speakers often mix up he and she. I've always guessed that their language doesn't distinguish the two, but I'm not sure that's the reason.

If written language is allowed for this question, then I would say a lot of native Russian writers have very similar handwriting (see example here). So when someone writes English in cursive and forms some of the common letters the same way they would in Russian, I often guess correctly that they are Russian/Ukrainian. If you want to be a handwriting language detective, letters whose shapes Russian and English share include the following: a, g (which is a d in Russian), p (which is an r in Russian), m (which is a t in Russian), c (s), e, n (p), y (an "oo" sound), and x (h).

And yes, Russian speakers often skip "the" and "a" because articles aren't used in Russian, or sometimes they confuse them, since it's not always intuitive which one to use when learning the language. Russian people often call jokes "anecdotes," because the Russian word for a joke is 'anecdot.' And when common Russian jokes are translated into English, they totally get lost in translation and usually leave the listener not knowing what to do other than nod politely.

This might be obvious, but using words like 'lift' for elevator, or other British words, would be a sign of a country that uses British English or was colonized by Britain until relatively recent times. This might help you narrow down countries.
posted by at 8:22 PM on October 27, 2015

I taught ESL writing in Japanese schools for roughly ten years all told. I'm reasonably sure I could pick out an essay (with a good number of mistakes) as written by a native Japanese speaker. Article mistakes are too common, I think, to pin down to a single country, but in a language like Japanese, where the subject of a sentence is very often omitted (among other things), there are some recurring quirks that pop up.
posted by Ghidorah at 8:25 PM on October 27, 2015

Germans often use "also" where native speakers would use "too".
posted by kjs4 at 9:48 PM on October 27, 2015

Spanish speakers also often say something like "when so-and-so born" instead of "when so-and-so WAS born." I don't know how to describe that in grammar terms, but they leave the "was" verb out.

This is not a feature of Spanish. Pronoun subjects in a sentence can often be omitted because the verb conjugation provides the context ("I eat meat" vs. "Como carne", where "como" means "I eat" all on its own), but you really do need the verb even in cases where you're using a being verb like "was", "am" etc.

Everyone answering this question should be careful not to generalize American dialect and slang for common writing errors of ESL speakers, which are two very different things.
posted by Sara C. at 10:55 PM on October 27, 2015

Spanish speakers also often say something like "when so-and-so born" instead of "when so-and-so WAS born." I don't know how to describe that in grammar terms, but they leave the "was" verb out.

This is not a feature of Spanish. Pronoun subjects in a sentence can often be omitted because the verb conjugation provides the context ("I eat meat" vs. "Como carne", where "como" means "I eat" all on its own), but you really do need the verb even in cases where you're using a being verb like "was", "am" etc.

I'm pretty sure the issue in this specific case is that "nacer" is a grammatically active verb in Spanish and a passive verb in English: "Cuando Fulano nació..." is the common and natural way of rendering this. (Indeed, the only natural-sounding use for "era nacido" that comes to mind immediately would be populations: "1% de la población de San Martín era nacido en Rusia.")

Short version is that "to be born" is a passive verb in English that's active in Spanish, and the OP was pointing this out as a characteristic error, not an error about dropping a grammatical subject.
posted by migrantology at 4:03 AM on October 28, 2015 [1 favorite]
