Python or similar to read wavs for tone onsets?
November 23, 2015 10:55 AM   Subscribe

Is there a quick way, using Python or Ruby or whatever, to read a wav file for several tones and import the time of the onset of different tones?

There would only be five or so different tones in all, differing in frequency. The wav would JUST have tones, but there would be anywhere from 4 to 80 instances of each of the ~five different tones, and we would like to know the timing of each instance of each kind of tone. Output would be something like a txt file with rows for each tone and the onset timing for each in milliseconds. Pointing me to the language and module or a general method would be helpful. Thanks!
posted by ancient star to Computers & Internet (11 answers total) 1 user marked this as a favorite
I did something very similar a couple weeks ago. Don't have source code handy unfortunately.

I used pyaudio to record a wav file, then read the wav file with scipy's audio file reader, then did an fft in scipy. I was just doing straight 10 second windows of audio data, but it sounds like you need to be more precise, so maybe a 1 second rolling window with 0.01 second offsets would work. Obviously it would be a little slow to process but an fft doesn't take long on modern hardware.

if your frequencies aren't EXACT, you have to get deeper into signal processing to identify close matches, and you'll have some false positives. I don't know anything about that.
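A rough sketch of that approach, with a synthetic 440 Hz tone standing in for the real file (the window size, hop, and detection threshold are all made-up values you'd tune):

```python
import numpy as np
from scipy.io import wavfile

RATE = 44100
# Synthetic 2-second clip: silence, then a 440 Hz tone starting at 1.0 s.
t = np.arange(2 * RATE) / RATE
sig = np.where(t >= 1.0, np.sin(2 * np.pi * 440 * t), 0.0)
wavfile.write("tones.wav", RATE, (sig * 32767).astype(np.int16))

rate, data = wavfile.read("tones.wav")
WINDOW = rate        # 1-second window
HOP = rate // 100    # 0.01-second offsets

onset = None
for start in range(0, len(data) - WINDOW, HOP):
    chunk = data[start:start + WINDOW].astype(np.float64)
    mag = np.abs(np.fft.rfft(chunk))
    # With a 1 s window, rfft bins are 1 Hz apart, so index 440 is 440 Hz.
    if mag[440] > 1e6:   # crude threshold, tuned to this synthetic clip
        onset = (start + WINDOW) / rate   # the tone enters at the window's end
        break
print(f"approximate onset: {onset:.2f} s")
```

Note a rolling window only tells you the tone appeared somewhere inside it; since the tone first shows up at the window's trailing edge, the window's end time approximates the onset.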
posted by miyabo at 11:17 AM on November 23, 2015

You can use an FFT (Fast Fourier Transform) to measure the intensity of different frequencies in a sound clip. I would use Python and Scipy for this. Here's a decent introduction to this exact task, but to summarize:

- Read the WAV file with Scipy
- Slice it into chunks (25ms, 10ms, 5ms, 1ms - based on the resolution you need for your output)
- Perform an FFT on each chunk to see what frequencies are in it
- Iterate over the chunks and pick out the first one with each frequency you're after
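The steps above could be sketched like this (the tone frequencies, chunk size, and threshold are invented for the example, and the 10 ms chunks happen to line up exactly with the synthetic onsets — real data would need some slop):

```python
import numpy as np

RATE = 44100
CHUNK = RATE // 100              # 10 ms chunks
TARGETS = [600, 1000, 1500]      # the tones you're after (example values)

# Synthetic clip: a 1000 Hz tone from 0.25 s and a 600 Hz tone from 0.60 s.
t = np.arange(RATE) / RATE
sig = (np.where(t >= 0.25, np.sin(2 * np.pi * 1000 * t), 0.0)
       + np.where(t >= 0.60, np.sin(2 * np.pi * 600 * t), 0.0))

freqs = np.fft.rfftfreq(CHUNK, d=1.0 / RATE)
onsets = {}
for i in range(0, len(sig) - CHUNK + 1, CHUNK):
    mag = np.abs(np.fft.rfft(sig[i:i + CHUNK]))
    for f in TARGETS:
        bin_ = np.argmin(np.abs(freqs - f))
        # Record the first chunk where this frequency shows up strongly.
        if f not in onsets and mag[bin_] > 0.25 * CHUNK:
            onsets[f] = i / RATE
print(onsets)
```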
posted by pocams at 11:19 AM on November 23, 2015 [2 favorites]

I would do this in Pd using the fiddle~ object.
posted by univac at 11:32 AM on November 23, 2015

an fft tells you about the frequencies over the entire chunk of time. that's why pocams is saying to slice it into chunks. but then things get complicated, because smaller chunks mean lower signal to noise, and you start worrying about where in a larger chunk the tone starts.

so it might be simpler to filter for the particular tones. a simple filter is to multiply by the signal you are expecting. in other words, generate a sinewave signal (sampled at the same frequency as the data) of, say, 10 cycles, and then convolve that with the signal. the result will give you something that is typically near zero except when the tone is present.

or you could construct bandpass filters for each frequency and use those, which would likely give better results. the theory isn't hard, but it's some amount of work unless you can find a library.

these are just rough ideas (sorry). i don't have methods or exact instructions, and without playing around myself i have no idea how well they will work with your data. but here is an example of someone filtering out a 600Hz signal.
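a minimal sketch of the multiply-by-the-expected-signal idea, using numpy's correlate (the 600 Hz tone, the onset time, and the half-of-peak threshold are all made up for the demo):

```python
import numpy as np

RATE = 44100
FREQ = 600
# reference: 10 cycles of the expected tone, as described above
n_ref = int(10 * RATE / FREQ)
ref = np.sin(2 * np.pi * FREQ * np.arange(n_ref) / RATE)

# synthetic signal: silence, then the 600 Hz tone starting at 0.5 s
t = np.arange(RATE) / RATE
sig = np.where(t >= 0.5, np.sin(2 * np.pi * FREQ * t), 0.0)

# slide the reference along the signal; the magnitude of the result
# stays near zero except where the tone is present
corr = np.abs(np.correlate(sig, ref, mode="valid"))
onset = np.argmax(corr > 0.5 * corr.max()) / RATE
print(f"tone detected near {onset:.3f} s")
```

note the detection fires slightly early (by up to the length of the reference), since the reference starts overlapping the tone before it is fully inside it.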
posted by andrewcooke at 11:32 AM on November 23, 2015

If you have access to a Matlab license and the related modules (maybe through work or school), they have modules with this functionality built in. Here is an example (the plot midway down the page) where a Fourier transform is conducted and the output is given as a 3D plot of Time/Frequency/Amplitude in x,y,z respectively.

If you are real slick with python, you can do what pocams suggests and discretize the time domain into chunks and create frequency/amplitude data for each slice. Then compile them all together into a 3D array as a function of the time that the slice is centered on, thus recreating more or less what the Matlab functions are doing. Sounds like a fun project!

*the point about losing frequency resolution as you make smaller and smaller slices is correct. You must keep the slices as large as possible to preserve the frequency you are interested in, and then overlap as necessary. As a rule you need at least a few data points per cycle to resolve a frequency, so check the sampling rate of the data.
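For what it's worth, scipy can produce that time/frequency/amplitude array directly via `scipy.signal.spectrogram`; a sketch with an invented 1200 Hz tone and example slice sizes:

```python
import numpy as np
from scipy import signal

RATE = 44100
t = np.arange(RATE) / RATE
# Synthetic clip: a 1200 Hz tone starting at 0.4 s.
sig = np.where(t >= 0.4, np.sin(2 * np.pi * 1200 * t), 0.0)

# spectrogram does the slice-and-FFT work: a frequencies x times grid
# of amplitudes, i.e. the Matlab-style surface described above.
freqs, times, Sxx = signal.spectrogram(sig, fs=RATE, nperseg=441, noverlap=0)

# First time slice where the 1200 Hz row rises well above zero.
row = np.argmin(np.abs(freqs - 1200))
onset = times[np.argmax(Sxx[row] > 0.1 * Sxx[row].max())]
print(f"1200 Hz first appears near {onset:.2f} s")
```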
posted by incolorinred at 11:32 AM on November 23, 2015

Response by poster: A couple of additions: we know exactly what tones we're looking for. We're doing this as a way to get the precise timing of audio in a video-recorded experiment. We're adding the tones to the right channel of the audio track ourselves, and we're choosing a different frequency tone for each thing we want to know about (i.e. trial onset, target onset, etc.). Does knowing the frequency of the tones ahead of time make this any different/easier? And yes, we are looking to be precise about timing, down to 30 fps or ~30 ms (potentially even more precise than this in the future).

Thanks again!
posted by ancient star at 11:54 AM on November 23, 2015

What about a bandpass filter, then just search for the level to go over some value?
I think 1/30 s is no problem if the audio is sampled around 44 kHz.
I'd try sox's bandpass effect to make 5 different wav files, one for each tone, and then try sox stat.
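If you'd rather stay in Python, scipy can play the same role as a sox bandpass-then-level pipeline; a sketch with invented tone, bandwidth, and threshold values:

```python
import numpy as np
from scipy import signal

RATE = 44100
FREQ = 600
# Synthetic clip: the 600 Hz tone starting at 0.3 s.
t = np.arange(RATE) / RATE
sig = np.where(t >= 0.3, np.sin(2 * np.pi * FREQ * t), 0.0)

# 4th-order Butterworth bandpass, +/-50 Hz around the tone.
sos = signal.butter(4, [FREQ - 50, FREQ + 50], btype="bandpass",
                    fs=RATE, output="sos")
filtered = signal.sosfilt(sos, sig)

# Rectify, smooth into a 10 ms envelope, and find the level crossing.
envelope = np.convolve(np.abs(filtered), np.ones(441) / 441, mode="same")
onset = np.argmax(envelope > 0.5 * envelope.max()) / RATE
print(f"level crossing near {onset:.3f} s")
```

The filter's ring-up and the envelope smoothing each add a few milliseconds of lag, which is still comfortably inside a 1/30 s frame.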
posted by bdc34 at 12:40 PM on November 23, 2015 [1 favorite]

What about a bandpass filter, then just search for the level to go over some value?

If there's anything other than beeps in your audio, this will give you tons of false positives. Any sound with a component that matches your filter will look like a beep. This includes all kinds of atonal sounds, like wind, or clapping, or a ton of stuff.

Whether you use an FFT or a filter or any other approach, you can't just look at the presence of your signal frequency. You have to compare the strength of your signal to the strength of the rest of the waveform.
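A sketch of that relative-strength check: compare the energy near the target frequency against the chunk's total energy, so a loud broadband sound doesn't register as a beep (the 600 Hz target and the ±100 Hz band are assumptions):

```python
import numpy as np

RATE = 44100
CHUNK = 441  # 10 ms

def band_ratio(chunk, target_hz, rate):
    """Fraction of the chunk's spectral energy within 100 Hz of target_hz."""
    power = np.abs(np.fft.rfft(chunk)) ** 2
    freqs = np.fft.rfftfreq(len(chunk), d=1.0 / rate)
    band = power[np.abs(freqs - target_hz) <= 100].sum()
    return band / (power.sum() + 1e-12)

t = np.arange(CHUNK) / RATE
beep = np.sin(2 * np.pi * 600 * t)                      # an actual 600 Hz beep
noise = np.random.default_rng(0).normal(0, 1, CHUNK)    # broadband "clap"

print(band_ratio(beep, 600, RATE))    # nearly all of the energy is in-band
print(band_ratio(noise, 600, RATE))   # only a sliver of the energy is
```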
posted by aubilenon at 1:54 PM on November 23, 2015

You may want to look to DTMF detection libraries/algorithms for inspiration. They seem to typically use the Goertzel Algorithm as it is more efficient when you are attempting to detect one of a specific set of tones rather than deconstructing all of the components.

You'll probably find it easiest to modify one of the existing DTMF detection libraries to detect your single tones. False positives should be relatively easy to throw out since the real tones are only in one channel, so anything that is detected in both is guaranteed to be spurious.
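A bare-bones Goertzel in plain Python, checking one frequency per pass (the 600 Hz tone and 10 ms block are example values; the DTMF libraries wrap essentially this recurrence):

```python
import math

def goertzel_power(samples, target_hz, rate):
    """Power at (the DFT bin nearest) target_hz via the Goertzel recurrence."""
    n = len(samples)
    k = round(n * target_hz / rate)                # nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

RATE = 44100
N = 441  # 10 ms block
tone = [math.sin(2 * math.pi * 600 * i / RATE) for i in range(N)]
print(goertzel_power(tone, 600, RATE))    # large: the tone is there
print(goertzel_power(tone, 1000, RATE))   # tiny: no 1000 Hz present
```

Unlike a full FFT, this costs one multiply-add per sample per frequency, which is why it suits a fixed set of five tones.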
posted by wierdo at 4:12 PM on November 23, 2015

I have some python code that you might be able to adapt. It listens to an audio stream for a set of tones and triggers the twitter api when it detects them.
posted by hey you over in the corner at 5:06 PM on November 23, 2015

Librosa is a Python audio processing library that includes onset detection (librosa.onset.onset_detect), which does exactly that. You could probably do this in about 4 lines of code, using onset_detect to find the index of each onset and an fft to determine the specific frequency at each index.

Check out the librosa documentation, which has a worked example.
posted by TwoWordReview at 5:24 PM on November 23, 2015

This thread is closed to new comments.