Practicing Mandarin pronunciation with speech-to-text
March 6, 2021 1:47 AM

I'm learning Mandarin, and I'm trying to practice pronunciation, so that I don't need to think hard in order to read pinyin and say the correct sounds. I've been practicing a little using speech-to-text software. Is this a reasonable idea? Are there downsides I'm not aware of? What's the best speech-to-text software for this?

Clearly this shouldn't be the only way I practice (I'm planning to spend time with a tutor who can help as well), but I'm wondering if it's a reasonable part of practicing, at least at first. I've tried it a bit, and it seems quite effective: I can take a word that I can't yet pronounce in a way the speech-to-text understands, practice for a little while, and by the end pronounce it so that the speech-to-text recognizes it correctly. I'm doing this with the Pleco voice search, which uses the Google/Android "cmn (Traditional Han, Taiwan)" speech-to-text system. Since it's inputting into Pleco, it shows me the pinyin, which means I don't have to worry about homophones. I like that this lets me practice pronunciation with nearly instantaneous feedback for a little bit whenever I feel like it, rather than only in large prescheduled blocks with a native speaker, but I'm worried there may be a downside that I don't understand (since I've never seen this technique recommended anywhere). Specific things I'm worried about:
  • Will this give me a weird accent? I don't care much about getting the "standard" newscaster accent, but I do want to be easily understandable, so people don't have to think hard to understand me.
  • Are there ways in which speech-to-text is idiosyncratic such that most humans would think I'm saying one thing, but the speech-to-text would think another?
  • Is there anything else I don't understand about this? Why isn't this more generally recommended to people without easy access to native speakers?
Are any of these a problem? Also, what is the best speech-to-text system for this? I've used the Google one, and a friend recommended the WeChat one. Are there any others I should check out?

I'm particularly interested in hearing from fluent/native speakers who have used speech-to-text software at least a little bit.
posted by wesleyac to Writing & Language (6 answers total) 1 user marked this as a favorite
 
My first thought: that sounds pretty clever. From your description, it sounds like you're refining your pronunciation.

Now, I learned Mandarin from tapes, plus some conversations, and there was a jarring transition from single words to entire sentences. So I hope you're not just doing words.

Does the input system check tones? I hope so, because getting the tones wrong will lead to having a bad accent, among other problems.

Can you have a session with the tutor to see how you're doing, and make sure you're not reinforcing some bad habits?
posted by zompist at 2:43 AM on March 6, 2021


Meh. If you're a very basic beginner, it might be somewhat helpful for practicing some unusual (to a native English speaker) sounds, like the difference between lù and lǜ or shi and xi. It might be helpful for practicing tones as well, but I have a hunch it could reinforce bad pronunciation.

My main concern would be that you would focus so much on the pronunciation of each individual character, bobbing your head around as you over-enunciate your tones, that you wouldn't be practicing prosody, i.e. the natural rhythms of sentences in normal everyday speech.

If you don't have much time with a tutor, I would focus more on your passive listening, reading, and writing characters when you're on your own.
posted by alidarbac at 5:57 AM on March 6, 2021 [2 favorites]


I think speech-to-text can be a bit too forgiving in terms of pronunciation - so that it can capture different accents, for example. It’s also targeted at understanding what you said, rather than correcting pronunciation. I would also be concerned that it can reinforce mispronunciations. (I mean, I just tried Pleco and it recognized almost everything I said to it, and I know my Mandarin pronunciation is very approximate at best.)
posted by scorbet at 6:30 AM on March 6, 2021


Best answer: "Does the input system check tones? I hope so, because getting the tones wrong will lead to having a bad accent, among other problems."

"I think speech-to-text can be a bit too forgiving in terms of pronunciation - so that it can capture different accents, for example."


I'm a native speaker of Mandarin who also has the Pleco app (but is using it on an iPhone, so YMMV) and I'd also be pretty worried about this.

I just now tried saying things like zhōngguō, dázhěn, or fǎngkài which are deliberate mispronunciations, tone-wise, of the common Mandarin words zhōngguó, dǎzhēn, fàngkāi (中國 'China', 打針 'to inject', 放開 'to put aside') and Pleco gave me the common words above -- e.g., even though I deliberately said fǎngkài (3rd-4th tone) it produced fàngkāi (4th-1st tone).

The dictation is also very forgiving in not differentiating z/zh, c/ch, s/sh and n/ng -- e.g. if I deliberately say dǎzhēng or cīfàn, it still yields dǎzhēn 打針 or chīfàn 吃飯. I'm less "concerned" about this because there are native-speaker varieties of Mandarin that don't always systematically differentiate these in informal speech (such as the one I speak, that of Taiwan), but I'm not sure these are features that I'd counsel a Mandarin learner to adopt, particularly if you are aiming for any variety from the north of China (i.e. the "standard" prestigious variety of Mandarin in China).

This forgiving nature makes sense for a native or fluent speaker (you might have misheard the word initially) but I think it's going to be counterproductive for a learner, in particular with the tones -- you don't want to get into bad tone habits or lock them in.
posted by andrewesque at 7:18 AM on March 6, 2021 [5 favorites]


I agree with everyone else: it's not going to help much. Speech-to-text is very tolerant of mistakes, so unless your pronunciation is so far off that saying Coca Cola gets you "bite the wax tadpole" (apocryphal advertising myth), I'd say it isn't giving you useful feedback.
posted by kschang at 9:59 AM on March 6, 2021


What everyone else said, plus that as with any body-skill you're learning, practice makes permanent, and perfect practice makes perfect (permanent). Most of the people in my college 500-level Chinese class (that I struggled through as a 'legacy learner') who had really bad pronunciation were the ones unlucky enough to have a (watered-down, inadequate feedback) Chinese class in high school, whereas the people who started in college generally did not have problems.

You're much better off practicing segmenting sentences into words, which you can also use machines to help with.
posted by batter_my_heart at 3:55 PM on March 6, 2021

