Removing content from speech
November 20, 2008 12:45 AM

I'm interested in removing the verbal content from a series of audio clips while keeping the pitch, intonation, and rhythm. I have no idea how to do this.

I've read some papers (see this paper, for example) in which researchers "content filter" audio clips so that you cannot understand what somebody is saying, while the other characteristics of the person's speech are retained. It sounds like this is accomplished by cutting out the higher frequencies, but I have no idea how one goes about doing this or which tools (preferably open source) are best.

Thanks!
posted by eisenkr to Technology (3 answers total) 1 user marked this as a favorite
 
Did you ever bother following up the references?

"... passing only frequencies below the range of 410~450 cycles per second, with an attenuation of 60 decibels per octave. This procedure resulted in a content-filtered recording".

Audacity should do this for you easily.
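
For anyone who would rather script it than click through a GUI, here is a minimal sketch of the quoted procedure in plain Python. It assumes a few things that are not in the paper quote: the cutoff is set to 430 Hz (the middle of the quoted 410~450 range), the 60 dB per octave slope is approximated by a 10th-order Butterworth low-pass (each filter order contributes roughly 6 dB per octave), and the audio is already loaded as a list of float samples. The function names are made up for illustration.

```python
import math

def butterworth_lowpass_sections(cutoff_hz, sample_rate, order=10):
    """Return (b0, b1, b2, a1, a2) tuples for an even-order Butterworth
    low-pass, built as a cascade of second-order (biquad) sections."""
    if order % 2 != 0:
        raise ValueError("this sketch only handles even orders")
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    cos_w0, sin_w0 = math.cos(w0), math.sin(w0)
    sections = []
    for k in range(order // 2):
        # Q of the k-th pole pair of the analog Butterworth prototype.
        q = 1.0 / (2.0 * math.cos(math.pi * (2 * k + 1) / (2 * order)))
        alpha = sin_w0 / (2.0 * q)
        a0 = 1.0 + alpha
        b0 = (1.0 - cos_w0) / 2.0 / a0
        b1 = (1.0 - cos_w0) / a0
        b2 = b0
        a1 = -2.0 * cos_w0 / a0
        a2 = (1.0 - alpha) / a0
        sections.append((b0, b1, b2, a1, a2))
    return sections

def content_filter(samples, sample_rate, cutoff_hz=430.0, order=10):
    """Low-pass the samples; cutoff_hz=430 is an assumed midpoint value."""
    out = list(samples)
    for b0, b1, b2, a1, a2 in butterworth_lowpass_sections(
            cutoff_hz, sample_rate, order):
        x1 = x2 = y1 = y2 = 0.0
        for i, x0 in enumerate(out):
            # Direct-form I difference equation for one biquad section.
            y0 = b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
            x2, x1 = x1, x0
            y2, y1 = y1, y0
            out[i] = y0
    return out
```

Calling content_filter(samples, sample_rate) on mono samples should leave the pitch contour and rhythm audible while muffling the consonants and formants that carry the words. Getting samples in and out of a WAV file is left out to keep the sketch short.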
posted by Pinback at 1:38 AM on November 20, 2008


Best answer: Basically, apply a low-pass filter with a cutoff somewhere above the fundamental frequency (called f0; it basically represents the pitch of the sound). Audacity can do this, but the program Praat is more tailored to this kind of operation. You can use its pitch contour functions to extract the f0 and see what the f0 (pitch) range is for the particular vocal sample you are using, and then use the filtering functionality to cut frequencies starting somewhere mid-high in that f0 range.
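
To make the "find the f0 range first" step concrete, here is a rough sketch in plain Python. It is not what Praat does internally; it is a simple autocorrelation peak picker, and the 75 to 400 Hz search range, the 1024-sample frame length, and the function names are all assumptions for illustration. It has no voicing detection, so frames of noise or consonants will give junk estimates that you would want to discard by hand.

```python
def estimate_f0(frame, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate the f0 of one voiced frame from its autocorrelation peak.
    fmin/fmax bound the search and are assumed values."""
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag`.
        score = sum(frame[i] * frame[i + lag]
                    for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None

def f0_range(samples, sample_rate, frame_len=1024):
    """Return (lowest, highest) f0 estimate over non-overlapping frames."""
    estimates = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        f0 = estimate_f0(samples[start:start + frame_len], sample_rate)
        if f0 is not None:
            estimates.append(f0)
    return (min(estimates), max(estimates)) if estimates else None
```

Once you have the range, you can pick a low-pass cutoff relative to it and listen to the result, nudging the cutoff until the words stop being intelligible.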
posted by advil at 6:35 AM on November 20, 2008


Best answer: If this is stereo content, and the vocal content is recorded in the dead center of the stereo field, while other content is panned to the left and right, you can invert one channel and sum them. This will remove all sound that is the same in both channels, leaving you with... something.

I've done this in the past to make rough karaoke versions of popular songs.
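
In code, the invert-and-sum trick is just a per-sample subtraction. This is a minimal sketch that assumes the two channels are already available as equal-length lists of float samples; the function name and the 0.5 scaling (to keep the result from clipping) are my own additions.

```python
def remove_center(left, right):
    """Invert the right channel and sum it with the left.
    Anything identical in both channels cancels out; the 0.5 factor
    is an assumed scaling to keep the result within range."""
    return [0.5 * (l - r) for l, r in zip(left, right)]
```

Note that the result is mono, and anything that is only roughly centered (reverb tails, for example) will only partly cancel.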

If you look at this audio on a frequency plot (a spectrum) rather than a time plot, you can find the low-pass and high-pass frequencies you need to make a notch filter (band-reject filter), or you can use a separate low-pass and high-pass filter. Voice tends to be between about 400 Hz and 3 kHz.
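
Here is a rough sketch of the "low-pass plus high-pass" version of a band-reject filter in plain Python. It assumes mono float samples, uses the 400 Hz and 3 kHz figures above as defaults, and uses gentle second-order Butterworth sections, so the rejection in the voice band is modest; a real filter plugin would be much steeper. The function names are invented for the example.

```python
import math

def biquad(samples, b0, b1, b2, a1, a2):
    """Run one second-order IIR section over the samples."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, x0, y1, y0
        out.append(y0)
    return out

def second_order(kind, cutoff_hz, sample_rate):
    """Coefficients for a 2nd-order Butterworth "low" or "high" pass."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    cos_w0 = math.cos(w0)
    q = 1.0 / math.sqrt(2.0)  # Butterworth Q for a single section
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    if kind == "low":
        b0, b1 = (1.0 - cos_w0) / 2.0, 1.0 - cos_w0
    else:
        b0, b1 = (1.0 + cos_w0) / 2.0, -(1.0 + cos_w0)
    return (b0 / a0, b1 / a0, b0 / a0, -2.0 * cos_w0 / a0,
            (1.0 - alpha) / a0)

def band_reject(samples, sample_rate, low_hz=400.0, high_hz=3000.0):
    """Keep what is below low_hz and above high_hz, then add the two."""
    low = biquad(samples, *second_order("low", low_hz, sample_rate))
    high = biquad(samples, *second_order("high", high_hz, sample_rate))
    return [a + b for a, b in zip(low, high)]
```

Running the two filters in parallel and summing them is what makes this a band reject; chaining them one after the other would remove nearly everything.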

As Pinback suggests, Audacity can do this. I've never used Praat, but any decent DAE will have the necessary filter plugins to do this in a rough way.
posted by tomierna at 8:26 AM on November 20, 2008

