Hooooow Doooo Iiiii Eeeelooongaaaate Vooooowels?
February 12, 2010 2:12 PM
How possible is it to elongate vowels in audio processing?
Let us say I wanted a program that took a normal human speaking voice, and elongated only (or mostly) the vowel sounds, while leaving the consonants and silences about the same length. How much of a challenge would this be for someone experienced with audio processing programming? Easy peasy? Tough but doable? Impossible with the current state of the art?
Also, Record by Propellerhead may be worth a look, as recordings can be time-stretched with few artifacts. (Scroll down to "time stretch".)
posted by bigmusic at 2:23 PM on February 12, 2010
If you know what the formants look like for various sounds and can reduce them to a set of unique features, you could probably classify them into vowel/non-vowel with SVMs, Gaussian Mixture Models or other classification approaches. I don't know anything about analyzing the human voice, though, so I have no idea what any of the complications would be.
posted by Blazecock Pileon at 2:35 PM on February 12, 2010
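A minimal sketch of the classification idea above, under loud assumptions: it fits one diagonal Gaussian per class (the simplest relative of a Gaussian mixture model, not a full GMM and not an SVM), and the two-number feature vectors are made-up toy values standing in for whatever formant-derived features you actually extract. The function names are hypothetical.

```python
import math

def fit_gaussian(points):
    """Fit a diagonal Gaussian (per-dimension mean and variance) to feature vectors."""
    n = len(points)
    dims = len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    # A small floor keeps the variance from collapsing to zero on tiny data sets.
    variances = [sum((p[d] - means[d]) ** 2 for p in points) / n + 1e-6
                 for d in range(dims)]
    return means, variances

def log_likelihood(x, model):
    """Log density of feature vector x under a diagonal Gaussian model."""
    means, variances = model
    return sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
               for xi, m, v in zip(x, means, variances))

def classify(x, models):
    """Return the label whose Gaussian gives x the highest log likelihood."""
    return max(models, key=lambda label: log_likelihood(x, models[label]))

# Toy training data (invented numbers, NOT measured formants).
training = {
    "vowel": [(0.80, 0.20), (0.90, 0.25), (0.70, 0.15), (0.85, 0.30)],
    "non-vowel": [(0.20, 0.70), (0.10, 0.80), (0.30, 0.60), (0.25, 0.75)],
}
models = {label: fit_gaussian(points) for label, points in training.items()}
```

Real vowels form several clusters in formant space, which is one reason a mixture with multiple components per class (or an SVM) would likely do better than this single-Gaussian version.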
Best answer: Are you looking to do a ton of it? (Dozens of vowels?) More than once in your life? Praat and its scriptability may be of use to you. There are dozens, if not hundreds, of Praat scripts freely available on the web. Some of these may be of particular use.
posted by knile at 2:55 PM on February 12, 2010
To do it manually wouldn't be difficult in the slightest (it just involves highlighting that chunk of the waveform and stretching it), but going syllable by syllable would take a hell of a long time.
posted by Sys Rq at 3:01 PM on February 12, 2010
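For anyone who wants to script the "highlight a chunk and stretch it" step, here is a rough sketch, assuming the audio is already a plain list of float samples and you know the start and end sample of the vowel. It uses plain windowed overlap-add, which is the crudest possible stretcher and will produce audible artifacts on real speech; pitch-synchronous or phase-aware methods sound much better. The function names and the `frame_len` default are assumptions for illustration.

```python
import math

def ola_stretch(samples, factor, frame_len=256):
    """Time-stretch samples by `factor` using plain windowed overlap-add."""
    n_in = len(samples)
    n_out = int(round(n_in * factor))
    if n_in <= frame_len or n_out <= frame_len:
        raise ValueError("segment too short for this frame length")
    hop = frame_len // 2
    # Hann-like window, offset by half a sample so no weight is exactly zero.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * (i + 0.5) / frame_len)
              for i in range(frame_len)]
    # Output frame starts, plus a final frame flush with the end of the output.
    starts = list(range(0, n_out - frame_len, hop)) + [n_out - frame_len]
    # Map each output frame start linearly back to an input frame start.
    scale = (n_in - frame_len) / (n_out - frame_len)
    out = [0.0] * n_out
    norm = [0.0] * n_out
    for dst in starts:
        src = int(round(dst * scale))
        for i in range(frame_len):
            out[dst + i] += window[i] * samples[src + i]
            norm[dst + i] += window[i]
    # Divide by the summed window weights so overlaps do not change the level.
    return [o / w for o, w in zip(out, norm)]

def stretch_region(samples, start, end, factor, frame_len=256):
    """Stretch only samples[start:end]; everything outside is left untouched."""
    middle = ola_stretch(samples[start:end], factor, frame_len)
    return samples[:start] + middle + samples[end:]
```

Calling `stretch_region(samples, 200, 600, 2.0)` on a 1000-sample list returns 1400 samples: the 400-sample region doubles in length and the rest is copied through unchanged.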
Best answer: Here's a Praat script that extracts the voiced sections of a speech recording.
Voiced sections can be recognized relatively easily and reliably using autocorrelation, short-time energy and zero-crossing rate as criteria. I used a technique that combines all three for some work I did for my Master's thesis; contact me if you're interested in a description of how it works or some MATLAB code that implements it.
Note that not all voiced sounds are vowels, and it might be hard to select only vowels, since some consonants (nasals and approximants such as /m/, /n/, /l/ and /w/) are very vowel-like in their spectral properties.
posted by snownoid at 8:46 PM on February 12, 2010
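The three criteria named above can be combined in a few lines. This is only a sketch and not snownoid's thesis method: the frame sizes, the 60 to 400 Hz pitch search range, and all three thresholds are assumptions picked for the synthetic demo signal, and they would need tuning on real recordings.

```python
import math
import random

def frame_features(frame, min_lag, max_lag):
    """Return (short-time energy, zero-crossing rate, peak normalized autocorrelation)."""
    n = len(frame)
    energy = sum(x * x for x in frame) / n
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (n - 1)
    peak = 0.0
    if energy > 0:
        r0 = energy * n  # autocorrelation at lag 0
        for lag in range(min_lag, max_lag + 1):
            r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
            peak = max(peak, r / r0)
    return energy, zcr, peak

def voiced_frames(signal, sample_rate, frame_ms=30, hop_ms=10,
                  energy_min=1e-3, zcr_max=0.15, acorr_min=0.5):
    """Flag each frame as voiced only if it passes all three (assumed) thresholds."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    min_lag = sample_rate // 400  # shortest pitch period searched (400 Hz)
    max_lag = sample_rate // 60   # longest pitch period searched (60 Hz)
    flags = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy, zcr, peak = frame_features(frame, min_lag, max_lag)
        flags.append(energy > energy_min and zcr < zcr_max and peak > acorr_min)
    return flags

# Synthetic demo at 8 kHz: 0.1 s of a 120 Hz tone, 0.1 s of noise, 0.1 s of silence.
rate = 8000
rng = random.Random(0)
tone = [0.5 * math.sin(2 * math.pi * 120 * i / rate) for i in range(800)]
noise = [rng.uniform(-0.5, 0.5) for _ in range(800)]
silence = [0.0] * 800
flags = voiced_frames(tone + noise + silence, rate)
```

In the demo, silence fails the energy test and noise fails the zero-crossing test, while the periodic tone passes all three. As the note above warns, this separates voiced from unvoiced sound, not vowels from vowel-like consonants.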
MetaSynth has very usable spectral analysis tools that focus on formants. That would be the formal approach; you could also just slow down the relevant parts of each word by hand, very easily, in any audio processing program of your choosing.
posted by supernaturelle at 10:34 PM on February 12, 2010
posted by bigmusic at 2:20 PM on February 12, 2010