Recovering background conversation from a live music recording?
March 21, 2006 9:02 AM

Help me recover background conversation from this stereo live recording.

I have a live recording—here (mp3, ~5M)—that has conversation in the background, mostly on the left side of the stereo field.

The recording is from a DV cam with built-in stereo mic heads. Sound quality is kind of sketchy to begin with.

So how can I best isolate the conversation? I don't need to erase the music so much as make the conversation clearer and more distinct.

I have access to decent consumer sound tools (using Audition as my base of operations these days). I can, in theory, do various sorts of filtering. My question is about techniques: how do I do this?
posted by cortex to Technology (18 answers total)
 
This (PDF) chapter on noise reduction from the Adobe Audition Classroom in a Book might help.
posted by jeremias at 10:00 AM on March 21, 2006


This might just be an aside, but I was playing a show with a band of mine in Halifax, and was talking between songs. People were, generally, listening, but I stopped for a second and heard this girl in the back of the bar go, "And then it bursted out and went BANG! SPLAT! All over the wall." I wasn't able to speak again for a good fifteen seconds, and I think I might have been the only one to hear it.

As luck might have it, we recorded that show from the soundboard, so her conversation was there very faintly. We just ran it through some noise-filtering algorithms in some sound-editing software and boosted the suspected EQ band until we could hear it.

I won the bet: I wasn't crazy.
posted by jon_kill at 10:04 AM on March 21, 2006 [1 favorite]
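
For anyone who wants to try the "boost the suspected EQ band" step outside an editor, here is a minimal sketch, not what jon_kill actually ran. It assumes a standard peaking biquad from the widely used RBJ Audio EQ Cookbook; the function name `peaking_eq` and the parameters `center_hz`, `gain_db` and `q` are made up for illustration. Speech intelligibility sits very roughly in the 300 Hz to 3.4 kHz range, so that is a reasonable place to start hunting.

```python
import math


def peaking_eq(samples, sample_rate, center_hz, gain_db, q=1.0):
    """Boost (or cut) a band around center_hz with an RBJ-style peaking biquad.

    samples is a plain list of floats for ONE channel. All names here are
    illustrative assumptions, not part of any editor's API.
    """
    a = 10 ** (gain_db / 40.0)              # amplitude factor from dB
    w0 = 2 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)          # bandwidth term

    # Cookbook coefficients for a peaking EQ
    b0, b1, b2 = 1 + alpha * a, -2 * math.cos(w0), 1 - alpha * a
    a0, a1, a2 = 1 + alpha / a, -2 * math.cos(w0), 1 - alpha / a

    out = []
    x1 = x2 = y1 = y2 = 0.0                 # filter memory
    for x in samples:
        y = (b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


if __name__ == "__main__":
    rate = 8000
    # One second of a 1 kHz test tone standing in for the band of interest
    tone = [math.sin(2 * math.pi * 1000 * i / rate) for i in range(rate)]
    boosted = peaking_eq(tone, rate, center_hz=1000, gain_db=12.0)
    print(max(abs(v) for v in boosted[2000:]))
```

In practice you would sweep `center_hz` by ear, the same way you would drag a parametric EQ band around in Audition, and stop where the voices pop out.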


Response by poster: Heh. That is, if an aside, one very much in tune with the spirit of this request. Most of the barely-audible conversation is about the question of the cops who have appeared at the house where we were playing.
posted by cortex at 10:09 AM on March 21, 2006


No specifics but just general ideas.
Could you not take the right channel, invert it, and then diff/subtract it from the left channel? The difference would be the talking.
Then run the filters to clean up.
Sounds reasonable.
posted by stuartmm at 10:17 AM on March 21, 2006
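
The invert-and-subtract idea is easy to state in code. This is a minimal sketch of the ideal case only: it assumes the music is sample-for-sample identical in both channels, which the later replies explain is not true of a real recording. The function name `subtract_channels` and the toy sample values are invented for illustration.

```python
def subtract_channels(left, right):
    """Return left minus right, sample by sample.

    This is the same as inverting the right channel and mixing it into
    the left. Both inputs are lists of samples for one channel each.
    """
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    return [l - r for l, r in zip(left, right)]


if __name__ == "__main__":
    # Toy 16-bit-style integer samples (made-up values)
    music = [1000, -2000, 1500, 300]      # assumed identical in both channels
    voice = [0, 50, -40, 20]              # assumed present only on the left
    left = [m + v for m, v in zip(music, voice)]
    right = list(music)
    print(subtract_channels(left, right))  # only the voice samples remain
```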


Totally unreasonable, sadly, stuartmm, as the left and right soundfields on a stereo file don't contain the same information to cancel each other out. The technique you suggest is often proffered as a way of getting an "a cappella" version of a song for remixing, but it won't work for that either, for the same reason...
posted by benzo8 at 11:00 AM on March 21, 2006


Response by poster: Exactly, benzo8. The idea is good, stuartmm, but the bleed between the two mic sources is very complicated.

Partly, I'm wondering if there's any good advanced surgery I could do: a combination of several processes (some sort of left vs. right cancellation might be possible; band filtering; etc.?), plus insight into the details of some of those processes.
posted by cortex at 11:04 AM on March 21, 2006


But... stereo mic heads, as in separated from each other by how far? It might work, what stuartmm has suggested. It might not work great, but it might get you a little something that'll be a good place to start from.

I can't imagine the directionality on DV cam microphones is that stunning. If they're mounted within an inch or two of each other (or stacked?) then you might get a little something there.
posted by jon_kill at 11:16 AM on March 21, 2006


Response by poster: Sony HC40 DV camcorder

Those are the mics, behind the grille mid-center just above the IR port. Not much separation, unfortunately. But, yes, the stereo separation might be a start.

(Imagine the setup thus: camera pointed at the band from, say, twenty feet away; amps behind the band, a bit to the right of the center of the camera view; spectators standing around and talking are between 3 and 10 feet from the camera, to the left and behind, with scattered other conversation and noise coming from the left of the camera at 20-40 feet.)
posted by cortex at 11:41 AM on March 21, 2006


What I would try is to first discard the channel with the least talking on it, then run noise reduction a few times on the remaining channel, using prominent sounds as your noise sample. For instance, if there's a good solid guitar tone, use that as your floor and try to scrub that sound out. The more distinct the sounds you remove are from vocal noises, the better your results will probably be. For some reason I get the best noise reduction results from my old-ass copy of Cool Edit Pro, but if your software has a good preview and an adjustable percent removed, you may get decent results.

The results won't be pretty no matter what; if pretty is what you need, you're really gonna be out of luck. But you may be able to get legible speech out of it, if that's worthwhile.
posted by 31d1 at 11:46 AM on March 21, 2006
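
To make the "use the guitar tone as your noise sample" trick concrete, here is a minimal sketch of textbook spectral subtraction on a single short frame. It is an assumption that this resembles what Cool Edit Pro or Audition do internally; real tools work on many overlapping windowed frames and use a fast FFT rather than the slow DFT below. The names `spectral_subtract` and `amount` are invented, with `amount` standing in for the "percent removed" slider.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (fine for short demo frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse of dft(); returns the real part of each sample."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def spectral_subtract(frame, noise_frame, amount=1.0):
    """Subtract the noise sample's magnitude spectrum from the frame's.

    The frame's phase is kept and magnitudes are floored at zero.
    amount=1.0 removes the full noise profile; smaller values remove less.
    """
    spec = dft(frame)
    noise_mag = [abs(c) for c in dft(noise_frame)]
    cleaned = []
    for c, nm in zip(spec, noise_mag):
        mag = abs(c)
        new_mag = max(mag - amount * nm, 0.0)
        cleaned.append(c * (new_mag / mag) if mag > 0 else 0j)
    return idft(cleaned)


if __name__ == "__main__":
    n = 64
    guitar = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]        # "noise" sample
    voice = [0.5 * math.sin(2 * math.pi * 10 * t / n) for t in range(n)]  # what we want
    mix = [g + v for g, v in zip(guitar, voice)]
    out = spectral_subtract(mix, guitar)
    print(max(abs(o - v) for o, v in zip(out, voice)))  # small if the guitar was removed
```

The toy case works cleanly because the "guitar" is a perfectly steady tone. Real instruments change from moment to moment, which is one reason the results come out legible rather than pretty.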


Response by poster: 31d1—legible-but-not-pretty is exactly what I'm hoping for.
posted by cortex at 1:50 PM on March 21, 2006


Diamond Cut sells a "Forensics" program that claims to do this, but it costs $1,399.

I've done some speech transcription off noisy tapes, and I think there's no better instrument than the human ear and brain, which have evolved to pull speech out of background noise. Get a secretary's dictation transcriber player, which allows you to back up and play short sections several times, and use headphones.

If you have a chance to set up the microphones, use a binaural system (a Google search turns up many examples). This lets you pick out individual voices effortlessly.
posted by KRS at 2:08 PM on March 21, 2006


I'd just add that if you aren't already working with the original source audio track, rather than an MP3, you should be, if at all possible.
posted by Good Brain at 3:44 PM on March 21, 2006


Response by poster: Good Brain—no worries, I have the uncompressed source audio.
posted by cortex at 3:50 PM on March 21, 2006


Where in the file is the conversation you are wanting to isolate? (As in, about how many minutes/seconds in?)
posted by tomierna at 3:53 PM on March 21, 2006


Response by poster: It's scattered throughout. Small snatches of discussion are clearly audible, but there's a lower layer of conversation that I'd like to bring out of the mix a bit.

Here's a possible application: isolating the non-musical parts of the track so that I can mix them up (or down) and throw a sort of greater atmospheric sparkle onto a cd or soundtrack—it's not that I want to rip the conversation out, so much as I'd like to control it enough that I can make it stand out better or more evenly against the original recording, for example.
posted by cortex at 3:57 PM on March 21, 2006


The problem with the stereo-field cancelling technique is that while there will be some areas that are close enough to cancel out, the ones that aren't will *add* noise, which will, um, cancel out any benefits... Unless the two channels are identical, differential addition of sound sources just makes a noisy mess...
posted by benzo8 at 2:16 AM on March 22, 2006
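
A small numeric sketch of why mismatched channels hurt rather than help. It assumes a single pure tone that reaches the right mic a few samples later than the left, at a 48 kHz sample rate; the 4-sample delay (very roughly an inch of extra path) and the function name `residual_ratio` are illustrative assumptions, not measurements of this camera.

```python
import math


def rms(x):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def residual_ratio(freq_hz, delay_samples, sample_rate=48000, n=4800):
    """Level of (left - right) relative to left for one delayed tone.

    0.0 means perfect cancellation; values above 1.0 mean the
    subtraction made that frequency LOUDER than it was originally.
    """
    left = [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]
    right = [math.sin(2 * math.pi * freq_hz * (i - delay_samples) / sample_rate)
             for i in range(n)]
    diff = [l - r for l, r in zip(left, right)]
    return rms(diff) / rms(left)


if __name__ == "__main__":
    for freq in (100, 1000, 6000):
        print(freq, "Hz:", round(residual_ratio(freq, delay_samples=4), 3))
```

Under these assumptions the low frequencies mostly cancel, while a tone whose half-period matches the delay comes out at about twice its original level, so the subtraction behaves like a comb filter on the music instead of removing it.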


Response by poster: That's one of the areas where I was wondering about Advanced Techniques™. Whether there's any hope in riding the stereo envelope, for example, or doing some fancy convoluted math to make some progress.

Anyway, I may give it a shot today; I'll post the results if I get anywhere.
posted by cortex at 5:48 AM on March 22, 2006


There is, in principle, a way to optimally separate the channels. I doubt it will be of much use in practice, though.

Assume you want to isolate the signal which is unique to the left channel (I'll call this Ls).

Your signals are:

L = Ls + Cs
  L  - the left channel data
  Ls - signal which is unique to the left channel
  Cs - signal which is common to both channels

R = Rs + Cs
  R  - the right channel data
  Rs - signal which is unique to the right channel

Under certain conditions you might assume Rs = 0, in which case Cs = R and Ls = L - R, so it is easy to determine Ls (and Cs).

You could also use the cross-spectral density to formulate the best filter for selecting/removing the Cs part, which could then yield Ls_est, Rs_est and Cs_est (where _est indicates an estimate).

L - Cs_est would combine the two approaches. I'm not sure what the mathematical basis for that would be, but whatever works.
posted by Chuckles at 1:43 PM on March 23, 2006
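
One possible mathematical basis for the L - Cs_est idea is the classic Wiener filter. The derivation below is a sketch under an assumption the thread does not state: that Ls, Rs and Cs are mutually uncorrelated. S_XY(f) denotes a cross-spectral density, H(f) is the filter applied to the right channel, and hats mark estimates.

```latex
% Model from the comment above: L = L_s + C_s, R = R_s + C_s.
% Assumption (not from the thread): L_s, R_s and C_s are mutually uncorrelated,
% so every cross term between them averages to zero.
\begin{align}
S_{LR}(f) &= \mathbb{E}\!\left[L(f)\,R^{*}(f)\right] = S_{C_s C_s}(f) \\
S_{RR}(f) &= \mathbb{E}\!\left[\lvert R(f)\rvert^{2}\right]
           = S_{R_s R_s}(f) + S_{C_s C_s}(f) \\
H(f) &= \frac{S_{LR}(f)}{S_{RR}(f)}
      = \frac{S_{C_s C_s}(f)}{S_{R_s R_s}(f) + S_{C_s C_s}(f)} \\
\hat{C}_s(f) &= H(f)\,R(f), \qquad
\hat{L}_s(f) = L(f) - \hat{C}_s(f)
\end{align}
```

When Rs = 0 the filter reduces to H(f) = 1 and the estimate collapses to plain L - R, which ties the two approaches together. In practice S_LR and S_RR have to be estimated by averaging over many short frames, and the estimate is only as good as the uncorrelatedness assumption.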

