Voice of Google Maps
May 17, 2020 8:55 AM

We use Google Maps as a GPS when driving. We're in Germany, so it speaks German. My 9-year-old wants to know whose voice we are hearing. Has an actual person recorded the syllables? (Who?) How do they make it sound fairly natural and not robot-like? She says "I want to know ALL about how they make the voice happen!" I promised I'd ask you!
posted by Omnomnom to Technology (8 answers total)
 
There are a lot of ways to synthesize speech (that is, to generate it digitally). Here's a great Wikipedia article on the different approaches:

https://en.wikipedia.org/wiki/Speech_synthesis

and in German: https://de.wikipedia.org/wiki/Sprachsynthese

Here's the voice of Siri giving an interview. I've always found her perspective on hearing her own voice in other places interesting.

https://www.youtube.com/watch?v=sGAYFsl7OEQ

Articles say that the system originally used recordings of her voice for hundreds of phrases, but has since been trained on her voice enough to synthesize new phrases and words.

I'm sure reading the article will help you imagine the ways the voice could be made!
posted by bbqturtle at 9:47 AM on May 17, 2020 [1 favorite]


There are completely synthetic voices but they sound very robotic. So it's always a real person, recording hours and hours of sentences with practically every phoneme combination, multiple times. It turns out (as anyone who has tried to learn a foreign language can tell you) that the transitions between various sounds can be quite tricky. There were older synthetic voices, like for corporate phone systems (IVR, Interactive Voice Response) that were created by simply stringing together individual phonemes. They sounded uncanny, since each sound was real but the combination was not.
These days you train a machine learning / artificial intelligence algorithm on all the recordings, and it produces a computational "model" that simulates the sounds it was trained on.
posted by wnissen at 9:57 AM on May 17, 2020 [1 favorite]
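
For anyone who wants to poke at the "stringing together" idea from the comment above, here is a minimal Python sketch. It is only an illustration under loud assumptions: the sine tones stand in for recorded speech snippets, and the function names, the 16 kHz sample rate, and the crossfade length are all made up for the example. It is not how Google Maps or any real product builds its voice. It just shows why the joints between sounds matter: a hard join butts two snippets together, while a crossfade blends a short overlap.

```python
import math

def fake_unit(freq_hz, n_samples, sample_rate=16000):
    """Hypothetical stand-in for one recorded snippet: a sine tone in [-1, 1]."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]

def join_units(units, crossfade=0):
    """Concatenate snippets, blending `crossfade` samples at each joint."""
    out = list(units[0])
    for unit in units[1:]:
        # Never overlap more samples than either side actually has.
        k = min(crossfade, len(out), len(unit))
        start = len(out) - k
        for i in range(k):
            w = (i + 1) / (k + 1)  # weight ramps from the old unit to the new one
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        out.extend(unit[k:])
    return out

a = fake_unit(220, 1600)  # 0.1 s of one "sound"
b = fake_unit(330, 1600)  # 0.1 s of another
hard = join_units([a, b])                   # abrupt joint
smooth = join_units([a, b], crossfade=160)  # 10 ms blended joint
print(len(hard), len(smooth))  # 3200 3040
```

The blended version is shorter because the overlapping samples are shared between the two snippets. Real speech is far harder than tones, which is part of why the trained-model approach mentioned above took over.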


Here's a great documentary about the history of speech synthesis, including fantastic samples from the early Voder instrument of the 1930s (basically, what if you could play a voice like a piano?).

I don't know how exactly the Google Maps voice was produced, but it's a safe bet that in a few years, it'll be created via deep learning. If your kid wants to know ALL about it (awesome!), you can jump forward a couple of decades and go straight to the source:

This DeepMind blog post gives a great informal technical overview of the state-of-the-art in speech synthesis.

Here's a blog post from Google AI on a more recent controllable model.

Even if the text is confusing, you can get a good sense of the model by just listening to the embedded audio in each post.
posted by Rich Text at 10:42 AM on May 17, 2020 [1 favorite]


There are completely synthetic voices but they sound very robotic. So it's always a real person

Not anymore!
posted by misterbrandt at 11:47 AM on May 17, 2020 [2 favorites]


The Android TTS system (Pico) was written by SVOX GmbH, now part of the Nuance IP-borg. The docs don't give details of how the voices were prepared.
posted by scruss at 1:34 PM on May 17, 2020


Does your daughter understand both German and English? If so, you might talk about the problem of getting an English-language satnav voice to pronounce German place names. I think of my Australian-accented Apple Maps voice, which pronounces "strada regionale" in Italy as "strada reg nail" - one of a number of confidently spoken pronunciation disasters.

This will probably change - but it is quite a tricky problem. If I am an English-speaking monoglot driving in Germany, then a native English voice will sound strange when it has to pronounce local names. Ideally I would like to have the English voice of a native German speaker who could tell me "turn right onto Karl-Liebknecht-Straße" in a credible manner - basically what I would have if I had a real-life local guide in the car with me. There is quite an acute need for voices that can do this in places like Europe, where there are many languages in a small area - but so far as I am aware nobody has solved the problem yet!
posted by rongorongo at 3:29 PM on May 17, 2020


https://en.wikipedia.org/wiki/Susan_Bennett
posted by Wild_Eep at 9:29 PM on May 17, 2020 [1 favorite]


I met a woman at a party who is a voice actress and voiced OnStar. She was happy to let me record her saying, "fuck you, you fucking fuck," in a perfect OnStar voice.
posted by bendy at 5:33 PM on May 18, 2020 [1 favorite]

