Music vs. speech detection software?
February 14, 2009 7:52 AM
Say you have a recording of a radio show that includes some speech and some music, but you only want to hear the speech. Is there any software that can automatically detect the music and remove it?
I've been hanging back hoping that someone actually knowledgeable would chime in, too, but here's what I've got. FWIW, I used to do audio lab research, though what I know isn't enough in this domain for me to feel totally comfortable.
It's not clear what you're asking, for starters. There are two scenarios you could be asking about:
A: You have something like a Top 40 radio broadcast, which goes [Casey Kasem speaks] [play My Sharona] [Casey Kasem speaks], and you want to be able to split an mp3 file to isolate the snippets of Casey talking.
B: As hungrysquirrels describes above, you have someone talking over a background music track, and you want to do "reverse karaoke" and eliminate the background music.
There's no processing I know of that's good enough to analyze A and tell speaking apart from music. I can imagine a bunch of tricks (sic Dragon Dictate on it and check for what it can't transcribe; sic PureData on it and see when it can and can't track pitch, etc.) but I can't imagine any of them working well.
The only thing you may have going for you is the tendency for producers to pan voices into the dead center, so they're the only signal that's identical on the L and R tracks. Usually you can make karaoke by subtracting L from R (assuming the vocals are center-panned), and I suppose you could try a variant of that trick here (e.g., subtract L from R, then subtract the resulting signal from L or R, and see what you get). The trouble is that if there are vocals in the music you want to eliminate, you're likely going to save those too.
posted by range at 3:10 PM on February 14, 2009
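For anyone who wants to try the channel-subtraction idea from the comment above, here is a minimal sketch in Python with NumPy. The signals are synthetic and every name in it (`remove_center`, `variant_from_thread`, the "voice", "guitar" and "piano" tones) is made up for illustration; it only shows what the arithmetic does to a center-panned signal, not how a real plugin works.

```python
import numpy as np

def remove_center(left, right):
    """Karaoke trick: subtracting channels cancels anything identical in both."""
    return left - right

def variant_from_thread(left, right):
    """Subtract the karaoke signal from the left channel, as the comment suggests."""
    return left - remove_center(left, right)

# Synthetic one-second stereo mix (illustrative assumption, not real audio).
rate = 8000
t = np.arange(rate) / rate
voice = 0.5 * np.sin(2 * np.pi * 220 * t)   # center-panned: identical in L and R
guitar = 0.3 * np.sin(2 * np.pi * 330 * t)  # left channel only
piano = 0.3 * np.sin(2 * np.pi * 440 * t)   # right channel only
left = voice + guitar
right = voice + piano

side = remove_center(left, right)           # voice cancels, instruments remain
attempt = variant_from_thread(left, right)  # sample-wise, L - (L - R) is just R
```

In this sketch `side` contains no voice at all, which is the karaoke effect. The follow-up step is less helpful than it sounds: done as plain sample-by-sample subtraction, `attempt` comes out equal to the untouched right channel, so isolating the center would presumably need something more involved than a second subtraction.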
Yeah, I've been doing a bunch of noise reduction for post-production on films, and with the tools I've got, steady noise is easy to remove, but anything uneven gets really tricky. As stated above, trying to reverse-karaoke it, or using plugins designed to do that, is the best bet.
Otherwise, I suspect it would be hours of sitting running subtle passes of noise reduction, and tweaking EQ to pick out as much of the voice as you can get, but that way I reckon you'd be lucky to get something sounding pure. It's more fun than you'd expect, but then maybe that's just me...
posted by opsin at 3:37 PM on February 14, 2009
Response by poster: Sorry for being unclear; I was actually after example A above.
You have a recording that consists of music and speech, alternating but never at the same time, and you want to keep only the speech.
posted by Mwongozi at 1:05 PM on February 18, 2009
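For scenario A, one rough heuristic (not something proposed in this thread, and not a substitute for a real classifier) is that speech tends to have many short near-silent gaps between words, while music tends to stay at a steadier level. The sketch below, in Python with NumPy, labels one-second windows by the fraction of quiet frames in each. The function names, the 20 ms frame size, and the 0.2 threshold are all hypothetical choices, and the test signal is synthetic.

```python
import numpy as np

def low_energy_ratio(samples, rate, frame_ms=20):
    """Fraction of frames whose RMS is below half the mean RMS of the clip."""
    frame_len = int(rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return float(np.mean(rms < 0.5 * rms.mean()))

def label_windows(samples, rate, window_s=1.0, threshold=0.2):
    """Label each full window 'speech' or 'music' (threshold is a guess)."""
    win = int(rate * window_s)
    labels = []
    for start in range(0, len(samples) - win + 1, win):
        ratio = low_energy_ratio(samples[start:start + win], rate)
        labels.append("speech" if ratio > threshold else "music")
    return labels

# Synthetic stand-ins: a steady tone for "music", a gated tone for "speech".
rate = 8000
n = 2 * rate
tone = np.sin(2 * np.pi * 440 * np.arange(n) / rate)
gate = (np.arange(n) // (rate // 10)) % 2 == 0   # 100 ms on, 100 ms off
recording = np.concatenate([tone, tone * gate])

labels = label_windows(recording, rate)
```

On this toy input the first two windows come out as music and the last two as speech. Real broadcasts are much messier (quiet music, fast talkers, jingles under the voice), so the windows to keep would still need checking by ear, and a completely silent window is labeled "music" by this sketch.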
posted by hungrysquirrels at 2:07 PM on February 14, 2009