Hooooow Doooo Iiiii Eeeelooongaaaate Vooooowels?
February 12, 2010 2:12 PM

How possible is it to elongate vowels in audio processing?

Let us say I wanted a program that took a normal human speaking voice, and elongated only (or mostly) the vowel sounds, while leaving the consonants and silences about the same length. How much of a challenge would this be for someone experienced with audio processing programming? Easy peasy? Tough but doable? Impossible with the current state of the art?
posted by lore to Computers & Internet (7 answers total) 4 users marked this as a favorite
 
I would say it's possible, but not batch-programmable for an end user. If you wanted to do this with an off-the-shelf program, I'd suggest mucking around with Melodyne: just slow the BPM of the recording down and see how that works.
posted by bigmusic at 2:20 PM on February 12, 2010


Also, Record by Propellerhead may be worth a look, as its recordings can be time-stretched with little artifacting. (Scroll down to "time stretch".)
posted by bigmusic at 2:23 PM on February 12, 2010


If you know what the formants look like for various sounds and can reduce them to a set of unique features, you could probably classify them into vowel/non-vowel with SVMs, Gaussian Mixture Models or other classification approaches. I don't know anything about analyzing the human voice, though, so I have no idea what any of the complications would be.
posted by Blazecock Pileon at 2:35 PM on February 12, 2010


Best answer: Are you looking to do a ton of it? (Dozens of vowels?) More than once in your life? Praat and its scriptability may be of use to you. There are dozens, if not hundreds, of Praat scripts freely available on the web. Some of these may be of particular use.
posted by knile at 2:55 PM on February 12, 2010


To do it manually wouldn't be difficult in the slightest (it just involves highlighting that chunk of the waveform and stretching it), but going syllable by syllable would take a hell of a long time.
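
A rough sketch of what "highlight a chunk and stretch it" could look like in code, assuming the vowel regions have already been marked by hand or by a detector. The names `stretch_ola` and `stretch_regions` are made up for illustration, and plain overlap-add is the simplest stretching method rather than what any particular editor uses; real tools typically rely on something like WSOLA or a phase vocoder to avoid the phasey artifacts this approach can produce.

```python
import math


def stretch_ola(samples, factor, frame_len=512):
    """Time-stretch a list of samples by `factor` using plain overlap-add.

    Frames are read from the input every hop_out / factor samples and
    written to the output every hop_out samples (50% overlap, Hann window).
    """
    hop_out = frame_len // 2
    hop_in = hop_out / factor
    out_len = int(len(samples) * factor)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    out = [0.0] * (out_len + frame_len)
    norm = [0.0] * (out_len + frame_len)  # summed window weights per sample
    k = 0
    while True:
        start_in = int(k * hop_in)
        start_out = k * hop_out
        if start_in >= len(samples) or start_out >= out_len:
            break
        for i in range(frame_len):
            j = start_in + i
            if j >= len(samples):
                break
            out[start_out + i] += samples[j] * window[i]
            norm[start_out + i] += window[i]
        k += 1
    # Divide by the summed weights so the overall level is preserved.
    return [o / n if n > 1e-9 else 0.0
            for o, n in zip(out[:out_len], norm[:out_len])]


def stretch_regions(samples, regions, factor, frame_len=512):
    """Stretch only the (start, end) sample ranges in `regions`.

    Everything outside the regions is copied unchanged. Regions are
    assumed to be non-overlapping.
    """
    out = []
    pos = 0
    for start, end in sorted(regions):
        out.extend(samples[pos:start])
        out.extend(stretch_ola(samples[start:end], factor, frame_len))
        pos = end
    out.extend(samples[pos:])
    return out
```

Calling `stretch_regions(samples, [(200, 600)], 2.0)` would double the length of samples 200 through 599 and leave the rest alone. The tedious part is still producing the list of regions, which is where an automatic detector comes in.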
posted by Sys Rq at 3:01 PM on February 12, 2010


Best answer: Here's a Praat script that extracts the voiced sections of a speech recording.
Voiced sections can be recognized relatively easily and reliably using autocorrelation, short-time energy, and zero-crossing rate as criteria. I used a technique that combines all three for some work I did for my Master's thesis; contact me if you're interested in a description of how it works or some Matlab code that implements it.

Note that not all voiced sounds are vowels, and it might be hard to select only vowels, since some consonants are very vowel-like in their spectral properties.
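
For anyone curious what combining those three criteria might look like, here is a minimal sketch in Python. It is not the thesis method or the Praat script mentioned above; the function names and every threshold (`energy_min`, `zcr_max`, `peak_min`, and the 75 to 400 Hz pitch range) are assumptions picked for illustration and would need tuning on real recordings. Samples are assumed to be floats in roughly the range -1 to 1.

```python
def frame_features(frame, sample_rate, min_f0=75.0, max_f0=400.0):
    """Return (energy, zero_crossing_rate, autocorr_peak) for one frame."""
    n = len(frame)
    r0 = sum(x * x for x in frame)
    energy = r0 / n  # short-time energy (mean square)

    # Fraction of adjacent sample pairs whose signs differ.
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    zcr = crossings / (n - 1)

    # Largest normalized autocorrelation over lags that correspond to a
    # plausible pitch period. Periodic (voiced) frames give a high peak.
    min_lag = int(sample_rate / max_f0)
    max_lag = min(int(sample_rate / min_f0), n - 1)
    peak = 0.0
    if r0 > 0:
        for lag in range(min_lag, max_lag + 1):
            r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
            peak = max(peak, r / r0)
    return energy, zcr, peak


def is_voiced(frame, sample_rate,
              energy_min=1e-4, zcr_max=0.15, peak_min=0.3):
    """Call a frame voiced only if all three criteria agree.

    The default thresholds are illustrative guesses, not tuned values.
    """
    energy, zcr, peak = frame_features(frame, sample_rate)
    return energy >= energy_min and zcr <= zcr_max and peak >= peak_min
```

In practice this would be run over short frames (something like 20 to 40 ms each), and runs of voiced frames would be merged into segments. As noted above, those segments will include voiced consonants as well as vowels, so this only narrows the search.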
posted by snownoid at 8:46 PM on February 12, 2010


MetaSynth has very usable spectral analysis tools that focus on formants. That would be the formal treatment; you could also just slow down the relevant parts of each word very easily in any audio processing program of your choosing.
posted by supernaturelle at 10:34 PM on February 12, 2010

