Could I give myself a Scottish accent through the magic of software?
March 4, 2013 6:44 PM

How feasible would it be to write an app or piece of software that could take a recording of speech and change the accent of the speaker? What challenges would there be given the current level of speech recognition and audio processing technology and linguistic knowledge?

Various threads and conversations on pronunciation and accents today have got me thinking about this question. Given the current state of speech recognition technology, audio processing technology and linguistic knowledge of various accents, would it be possible to process an audio file containing (clear, non-noisy) speech and change the accent? (Think of it as Melodyne for speech.)

I understand that speech recognition software such as Dragon has settings for detecting various accents, so there must be some linguistic knowledge applied in the detection. But how good is this technology? What are the current challenges in speech recognition and accent detection, and what would it take to improve them to the point where this software would be possible? What are the difficulties in actually processing the audio to change the speech to the new accent?

This is all idle curiosity and I'd be interested to hear about any specific research or applications that attempt the various steps that such software would require, or if any speech recognition/voice processing experts would care to throw in their two cents.
posted by TwoWordReview to Technology (4 answers total) 7 users marked this as a favorite
Well, if you use Festival, one of its default voices is Alan, based on the author's accent. But that would only make you sound sort of generically Edinburgh-ish, which is only one of hundreds of Scottish accents. I mean, who wouldn't want to speak Doric?
posted by scruss at 7:15 PM on March 4, 2013 [1 favorite]

I know very little about the software side of this. I can offer a bit of linguistic comment though. Vowels would surely be easier to deal with than consonants. Vowels are all more or less the same acoustically, except with different formants. It's relatively easy to move the formants around artificially. Here's a little tutorial on how people already use this technology to produce experimental stimuli; notice that the first part involves making a recording sound more like it comes from a particular dialect area. You'd probably also need to massage the vowel durations in slightly complicated ways to make it sound really natural. Still, this is a task that I can imagine would be within reach, and could be used to convert between accents that are mainly differentiated by their vowels -- Toronto to Chicago, for example.
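To make the point about formants concrete, here is a minimal Python sketch of the source-filter idea: a buzz (impulse train) passed through two resonators, one per formant. It synthesizes a vowel from scratch rather than modifying a recording, so it is not the method from the tutorial mentioned above, only an illustration of why moving formants changes the vowel. The function names, the formant frequencies and the bandwidths are all illustrative assumptions, not measurements of any particular accent.

```python
import math


def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Two-pole digital resonator: boosts energy around freq_hz (one formant)."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = (2.0 * math.exp(-math.pi * bandwidth_hz * t)
         * math.cos(2.0 * math.pi * freq_hz * t))
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synth_vowel(formants, f0_hz=100.0, duration_s=0.2, sample_rate=16000):
    """Impulse-train 'voice source' filtered by one resonator per formant.

    formants is a list of (center_frequency_hz, bandwidth_hz) pairs.
    """
    n = int(duration_s * sample_rate)
    period = int(round(sample_rate / f0_hz))
    out = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq_hz, bandwidth_hz in formants:
        out = resonator(out, freq_hz, bandwidth_hz, sample_rate)
    return out


def band_energy(signal, freq_hz, sample_rate=16000):
    """Squared magnitude of the signal's spectrum at a single frequency."""
    re = im = 0.0
    for i, x in enumerate(signal):
        angle = 2.0 * math.pi * freq_hz * i / sample_rate
        re += x * math.cos(angle)
        im -= x * math.sin(angle)
    return re * re + im * im


# Rough, assumed formant values: an "ee"-like and an "ah"-like vowel.
EE_FORMANTS = [(300.0, 60.0), (2300.0, 100.0)]
AH_FORMANTS = [(700.0, 90.0), (1100.0, 90.0)]

if __name__ == "__main__":
    ee = synth_vowel(EE_FORMANTS)
    ah = synth_vowel(AH_FORMANTS)
    for name, vowel in (("ee", ee), ("ah", ah)):
        print(name, band_energy(vowel, 300.0), band_energy(vowel, 700.0))
```

The same buzz goes into both vowels; only the resonator frequencies differ. Shifting a vowel in recorded speech is harder because the formants first have to be estimated and separated from the voice source before they can be moved.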

Some of the consonants, though, would be a real mess. Honestly, the only way I'd know how to change an American approximant /r/ to a Scottish trilled /r/ would be to cut out the American one and paste a Scottish one (like, a recording of an /r/ produced by a real Scottish English speaker) in its place. And that isn't really the fault of the technology. These two sounds just have so little in common, articulatorily or acoustically, that I can't even begin to imagine the process of changing one into the other.
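The cut-and-paste approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `splice_segment` is a hypothetical helper, audio is a plain list of float samples, and the start and end of the /r/ are assumed to have been located already (by hand or by some alignment tool). A short linear crossfade at each joint is one simple way to avoid clicks.

```python
def splice_segment(samples, start, end, replacement, fade=64):
    """Replace samples[start:end] with replacement, crossfading at both joints.

    The replacement may be longer or shorter than the segment it replaces,
    so the returned list can differ in length from the input.
    """
    if not 0 <= start < end <= len(samples):
        raise ValueError("start/end must delimit a non-empty span inside samples")
    # Keep the two fades from overlapping or running past either segment.
    fade = min(fade, len(replacement) // 2, (end - start) // 2)
    patch = list(replacement)
    for i in range(fade):
        w = (i + 1) / (fade + 1)  # ramps from just above 0 to just below 1
        # Fade in: the original audio gives way to the pasted segment.
        patch[i] = (1.0 - w) * samples[start + i] + w * replacement[i]
        # Fade out: the pasted segment gives way to the original audio.
        j = len(patch) - fade + i
        patch[j] = (1.0 - w) * replacement[j] + w * samples[end - fade + i]
    return samples[:start] + patch + samples[end:]


if __name__ == "__main__":
    original = [0.0] * 100       # stand-in for the original recording
    donor_r = [1.0] * 40         # stand-in for a recorded donor /r/
    result = splice_segment(original, 30, 50, donor_r, fade=4)
    print(len(result), result[28:36])
```

Even with clean joints, the pasted sound still carries the donor speaker's voice quality and pitch, which is part of why this is a workaround rather than a true transformation.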
posted by ootandaboot at 7:40 PM on March 4, 2013 [4 favorites]

You can do speech recognition and then do text-to-speech in a different accent, but it wouldn't sound like you.
posted by empath at 8:51 PM on March 4, 2013

I'd also consider sort of quasi-translating to (or somehow incorporating spellings from) Scots as well. To my mind, Scots is better defined than Scottish English, so it should be a good compass to orient yourself towards the Scottish accent, so to speak.

(Completely layman POV; all anecdata from conversations in Scotland over one Hogmanay two years back)
posted by the cydonian at 3:03 AM on March 5, 2013
