

Automating Simple Audio Editing/Processing
January 25, 2014 10:00 AM

I'm looking for (free) tools to help with batch creating some text-to-speech audio clips and compiling those with other clips.

Specifically, I'd like to (a) generate a brief text-to-speech phrase (perhaps by using a site like this); (b) add that to another short clip; (c) repeat that text-to-speech phrase; (d) add two seconds of silence; (e) make those (a, b, c, and d) one track; and (f) repeat this process 150-200 times with unique clips.

I can handle basic copy-and-paste audio editing with, say, Audacity, but I have no experience with any sort of batch processing. Could you please point me to any simple established software that might save me some time? Thanks.
posted by glibhamdreck to Technology (3 answers total)
 
I've written scripts to do this for a one-off game which used an interactive voice response system (phone menus) as a component. I used either Cepstral or flite for the text-to-speech part (although if you have a Mac, the built-in command "say" would also work) and SoX for pretty much everything else. The scripting was done in either bash or perl.

These are all command-line utilities, though, and if you have no experience with those, it'd be a bit of a steep learning curve. It's simple, for values of "simple" that assume a fair amount of background knowledge.
posted by hades at 12:01 PM on January 25


Festival includes text2wave as a helper script:

text2wave myfile.txt -o myfile.wav

How you build the text file is up to you, as is how you concatenate the resulting files, including any files of silence you need.
I have used it to create a sudoku program for visually impaired people.
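
One possible way to build those text files in bulk, as a rough sketch only: the function name phrases_to_wavs, the one-phrase-per-line input file, the numbered output names, and the TEXT2WAVE override variable are all assumptions made for illustration, not part of Festival.

```shell
# phrases_to_wavs FILE DIR
# Hypothetical helper: for each line of FILE, write DIR/NNN.txt and run
# text2wave on it to produce DIR/NNN.wav.
# TEXT2WAVE is an assumed override hook; it defaults to "text2wave".
phrases_to_wavs() {
    mkdir -p "$2"
    i=0
    while IFS= read -r phrase; do
        i=$((i + 1))
        n=$(printf '%03d' "$i")                 # 001, 002, ...
        printf '%s\n' "$phrase" > "$2/$n.txt"   # one text file per phrase
        # stdin is redirected so the command cannot eat the phrase list
        ${TEXT2WAVE:-text2wave} "$2/$n.txt" -o "$2/$n.wav" < /dev/null
    done < "$1"
}
```

Calling phrases_to_wavs phrases.txt tts would then leave tts/001.wav, tts/002.wav and so on, ready to be joined with the other clips and the silence.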
posted by stuartmm at 2:27 PM on January 25


The tricky bit to this is getting all the sample rates to match. flite, as hades said, is a good (basic) TTS, but it outputs mono 8 kHz WAV files. It's unlikely that your other clips have that low a sample rate. So pick a file format, and stick with it.

If you wanted to use CD quality (44.1 kHz, 16-bit, stereo), you could do:
  1. generate TTS phrase in tts.wav:
    flite 'flite is a small simple speech synthesizer' flite.wav ; sox flite.wav -e signed-integer -b 16 -c2 -r44100 tts.wav && rm flite.wav
  2. generate two seconds of silence in silence.wav:
    sox -n -e signed-integer -b 16 -c2 -r44100 silence.wav trim 0 2
  3. join it all together, assuming that other.wav exists, and you want the output in allonetrack.wav:
    sox tts.wav other.wav tts.wav silence.wav allonetrack.wav
Festival will likely have better sound quality, and may not need the format fiddling.
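
To repeat the three steps above for 150-200 clips, something like the following loop might work. Treat it as a sketch under assumptions: phrases.txt (one phrase per line), the clips/001.wav, clips/002.wav, ... naming, the tmp/ and out/ directories, and the run, build_track and build_all helpers are all made up for illustration. It also converts each clip to the chosen format first, in case the clips do not already match.

```shell
#!/bin/sh
# Target format for every file: CD quality (16-bit, stereo, 44.1 kHz).
# Left unquoted below on purpose so it splits into separate sox options.
FMT="-e signed-integer -b 16 -c 2 -r 44100"

# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@" < /dev/null   # keep commands from reading the phrase list
    fi
}

# build_track N PHRASE: out/N.wav = phrase + clip N + phrase + silence.
build_track() {
    n=$1
    phrase=$2
    run flite -t "$phrase" -o "tmp/flite_$n.wav"          # step 1: TTS
    run sox "tmp/flite_$n.wav" $FMT "tmp/tts_$n.wav"      # match format
    run sox "clips/$n.wav" $FMT "tmp/clip_$n.wav"         # match format
    run sox "tmp/tts_$n.wav" "tmp/clip_$n.wav" "tmp/tts_$n.wav" \
        tmp/silence.wav "out/$n.wav"                      # step 3: join
}

# build_all FILE: one track per line of FILE (assumed one phrase per line).
build_all() {
    run mkdir -p tmp out
    run sox -n $FMT tmp/silence.wav trim 0 2              # step 2: silence
    i=0
    while IFS= read -r phrase; do
        i=$((i + 1))
        n=$(printf '%03d' "$i")                           # 001, 002, ...
        build_track "$n" "$phrase"
    done < "$1"
}
```

Running DRY_RUN=1 with build_all phrases.txt prints the commands without executing them, which is a cheap way to check the file names before committing to the full batch.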
posted by scruss at 2:44 PM on January 25

