How do I go about creating speech-synthesized audio files of a handful of made-up words?
February 5, 2010 10:34 AM

I need to create about 8 gender-neutral, speech-synthesized audio files of variants of a made-up word for a research project I'm working on. What program should I use to do this? Is there a really simple way to complete this task? What's a good resource or tutorial that I can walk myself through?

The audio samples need to be as simple and similar to each other as possible (just tweaked versions of basically the same thing, where I've swapped out one consonant or cardinal vowel sound for another). I don't have a lot of time to spend on this, but need to do what's necessary to create the most scientifically sound (ha!), unmarked (neutral/canonical) forms.

I am well versed in phonetics, phonology, morphology, speech acoustics, etc., but I haven't had the opportunity to create digital sound files from scratch before now.

Bonus if this can be done in Praat or Audacity (neither of which I'm terribly familiar with, but both of which have academic chops that will make explaining my methodology much easier).
posted by iamkimiam to Technology (11 answers total)
 
Modern Macs (OS X) have built-in text-to-speech. It's not great, but in this case that may be an advantage, since you can manipulate the text to get odd audio out of it. I think "Alex" is the highest-quality voice, but I don't recall for certain. You can capture the audio via another computer or try to hijack the audio itself; I think there's a program called Audio Hijack Pro that will do this.
posted by chairface at 11:01 AM on February 5, 2010


Once you work out the phonetics, you could get one of Festival's many voices to render it to a file.
posted by scruss at 11:02 AM on February 5, 2010


Gender neutrality can probably be accomplished with adjustments to pitch. Audacity's built-in methods tend to be pretty ham-handed, though, so your results may be rough.

Tempo / Pitch / Speed

Pitch is related to frequency: it is the high or low tone we hear in any piece. Decreasing the time between crests increases the frequency and, therefore, the pitch. On its own, doing so also increases the speed, since the time between crests is reduced. In Audacity, however, the Change Pitch effect will not affect the speed of the piece, because Audacity readjusts each track to maintain the original speed; similarly, one can increase the pace of the song without noticeably changing the pitch by using the Change Tempo effect. If you do wish to change both speed and pitch, perhaps to simulate a record winding down, you can use Audacity's Change Speed effect, which will affect both tempo and pitch. Both Speed and Tempo changes will affect the overall length of your piece.

To apply any of these effects, select the track or portion of the track you want the filter applied to, then go to Effect > Change Pitch, Change Speed, or Change Tempo. From there, you will be given many advanced options, but the most important one is the percent change field. Inputting a percentage greater than zero will speed up or pitch up your track while a negative number will do the opposite.
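
The arithmetic behind the passage above can be sketched in a few lines of Python. This is only an illustration, assuming that a Change Speed percentage p scales the playback rate by a factor of 1 + p/100; that is my reading of the effect, not something taken from Audacity's documentation. The function name change_speed is made up for the example.

```python
import math

def change_speed(percent, duration_s):
    """Return (pitch shift in semitones, new duration in seconds).

    Assumes a speed change of `percent` multiplies the playback rate
    by (1 + percent / 100), so pitch and length change together.
    """
    factor = 1 + percent / 100.0
    if factor <= 0:
        raise ValueError("percent must be greater than -100")
    # Frequency scales with playback rate; 12 semitones per doubling.
    semitones = 12 * math.log2(factor)
    # Playing faster shortens the clip by the same factor.
    new_duration = duration_s / factor
    return semitones, new_duration

if __name__ == "__main__":
    for pct in (-50, 10, 100):
        shift, length = change_speed(pct, 10.0)
        print(f"{pct:+d}%: {shift:+.2f} semitones, {length:.2f} s")
```

Under that assumption, doubling the speed raises the pitch by an octave and halves the length, which is why Change Pitch alone (length preserved) is likely the more useful effect for nudging a voice toward a gender-ambiguous range.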


Where I pulled that paragraph from
posted by edbles at 11:03 AM on February 5, 2010


Seconding Festival, though I've only ever played with it under Linux, where it's generally easy to install via packages and whatnot. Not sure of its availability on other platforms.
posted by jquinby at 11:13 AM on February 5, 2010


To follow up on chairface's suggestion, the say command in the terminal can output to an AIFF file.
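
A rough sketch of how that might be batched from Python, assuming a Mac where the "Alex" voice mentioned above is installed. The helper name say_command and the example words are hypothetical; -v (voice) and -o (output file) are the standard options of say.

```python
import subprocess

def say_command(text, outfile, voice="Alex"):
    """Build the argument list for macOS `say` writing speech to a file."""
    return ["say", "-v", voice, "-o", outfile, text]

def render_all(words, run=False):
    """Build (and optionally run) one `say` command per word."""
    commands = [say_command(w, w + ".aiff") for w in words]
    if run:
        # Only works on macOS, where `say` is available.
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands

if __name__ == "__main__":
    # Hypothetical made-up words; swap in your own spellings.
    for cmd in render_all(["bapa", "bipa", "bupa"]):
        print(" ".join(cmd))
```

Calling render_all(words, run=True) on a Mac should leave one AIFF file per word in the current directory, ready to open in Praat or Audacity.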
posted by scruss at 11:17 AM on February 5, 2010


Seconding festival - it is highly scriptable, meaning you can dynamically access and manipulate the phonetic model under the text interpretation. With any other solution I know of you would have to rely on bugs in the TTS engine or edit the source code, but with festival you can actually specify the rules for generating phonemes from text.

If you are comfortable with scripting, you may want to do this in Festival's embedded Scheme interpreter (Scheme is an excellent language); otherwise you could make a SABLE XML file specifying the utterances (the markup should be vaguely familiar if you have ever edited an HTML document).
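
For the scripting route, here is a hedged sketch in Python that writes a small Festival Scheme script which synthesizes an explicit phone list and saves it as a WAV file. It assumes Festival's Phones utterance type and utt.save.wave behave as described in the Festival manual, and that the phone symbols exist in the current voice's phoneset (the ones below are guesses for a default US English voice). The function name and file names are hypothetical, and I have not run this against a particular Festival install.

```python
def festival_phones_script(phones, wav_path):
    """Return Festival Scheme source that renders a phone list to a WAV file.

    `phones` must use symbols from the active voice's phoneset;
    the symbols used in the example below are assumptions.
    """
    lines = [
        # Build an utterance directly from phones, bypassing text rules.
        "(set! utt1 (Utterance Phones (" + " ".join(phones) + ")))",
        # Run synthesis on the utterance.
        "(utt.synth utt1)",
        # Save the waveform in RIFF (WAV) format.
        '(utt.save.wave utt1 "' + wav_path + "\" 'riff)",
    ]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    script = festival_phones_script(
        ["pau", "b", "aa", "p", "ax", "pau"], "bapa.wav"
    )
    with open("bapa.scm", "w") as f:
        f.write(script)
    print(script)
    # Then, on a machine with Festival installed: festival -b bapa.scm
```

Generating one script per variant keeps everything except the swapped phone identical, which suits the goal of minimally different stimuli.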
posted by idiopath at 11:52 AM on February 5, 2010


For clarity's sake: with the scheme interpreter built into festival, you can generate the sorts of rules that derive pronunciations from text. From the sable markup, you can specify low level details of pronunciation parameters on a word by word basis, in one of the existing sets of rules (English, Spanish, French, etc.).

By using the ability to specify pronunciations / amplitudes / speeds / pitches explicitly, and switch languages in a single document, you should be able to get a decent result even without going into scheme scripting though.
posted by idiopath at 11:58 AM on February 5, 2010


Also, I just found this: someone used Praat to generate pronunciation data for Festival (for the invented language Lojban). And from that page I found this fascinating rundown on making pronunciation rules for Festival.
posted by idiopath at 12:06 PM on February 5, 2010


I'd be very curious about making a "gender neutral" voice. I guess you can play with the pitch all you want, but someone is going to hear gender.
posted by damn dirty ape at 12:24 PM on February 5, 2010


Response by poster: Thanks for all the suggestions so far! I was worried that a bizarre question like this wouldn't generate much response, but I really should never doubt the knowledge and resourcefulness of MeFites!

As far as 'gender-neutral', I probably should have said something more like 'relatively gender ambiguous'. People will always perceive gender, but I just need something that is not overtly gender biased. I'll play with the pitch and whatnot. Definitely excited about tinkering around with Festival. Thanks again!
posted by iamkimiam at 12:33 PM on February 5, 2010


There's an AT&T speech synthesis demo that lets you try out different voices.
posted by Pronoiac at 1:34 AM on February 6, 2010

