Spectral Similarity: I've heard that one before!
July 16, 2007 6:31 AM   Subscribe

Best mechanism for determining spectral similarity of FFTs?

I'm looking at FFTs of audio, as for example might be emitted by my present favorite tool for this, ARSE. I'd like algorithms (or sample code) that can look at small chunks of spectra and determine whether the patterns in one exist in another. The metric needs to be graded, as in I need to be able to determine the degree to which one block of 0.5 seconds of audio has the same spectral structure as another -- 0%, 50%, etc.

I did attempt a fully image-domain comparison using ImgSeek; the results were OK but not wildly useful.

Any thoughts? I'm working on an audio dotplot engine, and while I'm getting decent results with a ludicrously simple similarity metric, I'd like something more rigorous and less noisy.
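
To give a flavor of "ludicrously simple", here's a Python/numpy sketch along the lines of what I mean (not my exact code; the function names and the cosine measure are just illustrative): chop the signal into fixed chunks, take a magnitude spectrum per chunk, and use the cosine similarity between chunks, which lands in 0..1 and so maps straight onto the 0%-100% scale above.

    import numpy as np

    def chunk_spectra(signal, chunk_len):
        """Non-overlapping chunks -> one magnitude spectrum per chunk."""
        n = len(signal) // chunk_len
        chunks = signal[:n * chunk_len].reshape(n, chunk_len)
        return np.abs(np.fft.rfft(chunks, axis=1))

    def dotplot(signal, chunk_len):
        """Cosine similarity between every pair of chunks; values in [0, 1]."""
        spectra = chunk_spectra(signal, chunk_len)
        norms = np.maximum(np.linalg.norm(spectra, axis=1, keepdims=True), 1e-12)
        unit = spectra / norms                 # unit vectors; silence stays ~0
        return unit @ unit.T                   # the similarity matrix is the dotplot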

Note, I'm looking for something that can match on speech similarity, but can also match on a violin note. I'm also very specifically looking for mechanisms that work on chunks of audio approximately 100ms to 500ms long. This seems to disqualify the algorithms in MusicMiner.

Hive mind, help me hack :)

(This is for this project, which renders rather interesting dotplots out of WinAMP spectra.)
posted by effugas to Computers & Internet (4 answers total) 1 user marked this as a favorite
 
Have you ever read anything about Empirical Mode Decomposition (EMD) and the Hilbert-Huang Transform (HHT)? (For example.) I'm not sure if you're looking for an out-of-the-box solution (or how much effort you'd want to put in), but it might be worth looking at.
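
For the EMD step itself you'd need a library or your own code, but if you want a feel for the Hilbert half, here's a scipy sketch (my own illustration, not from any paper -- names are mine): instantaneous amplitude and frequency of one mono-component signal, which is what HHT computes per intrinsic mode function.

    import numpy as np
    from scipy.signal import hilbert

    def hilbert_features(component, fs):
        """Instantaneous amplitude and frequency of one mono-component
        signal (e.g. one intrinsic mode function out of EMD)."""
        analytic = hilbert(component)                   # x(t) + i*H{x}(t)
        amplitude = np.abs(analytic)                    # envelope
        phase = np.unwrap(np.angle(analytic))
        inst_freq = np.diff(phase) * fs / (2 * np.pi)   # Hz; one sample shorter
        return amplitude, inst_freq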
posted by Comrade_robot at 7:26 AM on July 16, 2007


I've never done this, but I've done a lot of FFT crap.

To state the obvious, equivalence is a lot easier to determine than similarity, especially with integer values. With complex values (in which I include those with 0i imaginary parts), weighting of some sort will probably have to play a part.

Obviously, to make your algorithm work as well as possible, the inputs will have to be normalized -- something done by default in most FFT routines, but not all. A quiet tone and a loud one have the same spectral elements, just different magnitudes. Do you consider a quiet violin note the same as a loud one for your purposes, or do you just want to tell the difference between a 1 kHz whistle and a 1 kHz violin note? (How about with/without vibrato? The AM of vibrato would tell me that I was hearing one note, amplitude varying, but the signature in an FFT would show some low frequency components that CLEARLY made it different from a non-vibrato note. Interesting problem.)
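
For example, one way to take level out of the comparison entirely is to scale each magnitude spectrum to unit energy before comparing -- a Python/numpy sketch (my naming, and just one of several reasonable normalizations):

    import numpy as np

    def normalized_spectrum(frame):
        """Magnitude spectrum scaled to unit energy, so loudness drops out."""
        mag = np.abs(np.fft.rfft(frame))
        energy = np.linalg.norm(mag)
        return mag / energy if energy > 0 else mag   # leave silence as all zeros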

Have you spent any time with reference inputs to determine if YOU can see similarities in the spectral contents? You are obviously getting some coefficients from some routine or another, which I presume you did not write from scratch?

You might encounter problems depending on where in the FFT window your unknown sample occurs. Are you using a windowing function on your raw input data? These two are related, as they will impact magnitudes.
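
E.g., if you're not windowing already, something like a Hann window is a common default (a sketch, nothing more):

    import numpy as np

    def windowed_fft(frame):
        """Hann-windowed FFT; tapering the frame edges reduces leakage,
        so magnitudes depend less on where the sound sits in the frame."""
        return np.fft.rfft(frame * np.hanning(len(frame)))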

Direct comparisons on the entire complex FFT output would seem to be useless, because the phase info for a near-zero amplitude bin is meaningless but can still be a large value.

You may just want to do a bin-by-bin weighted comparison on the normalized power spectra, weighting higher amplitudes progressively more heavily.
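
A sketch of what I mean, in Python/numpy -- the particular weighting (per-bin maximum) is just one guess at "progressively more heavily", and using power spectra conveniently throws away the unreliable phase:

    import numpy as np

    def weighted_similarity(frame_a, frame_b):
        """Bin-by-bin weighted comparison of normalized power spectra.
        Returns roughly 1.0 for similar frames, roughly 0.0 for different."""
        pa = np.abs(np.fft.rfft(frame_a)) ** 2        # power: phase drops out
        pb = np.abs(np.fft.rfft(frame_b)) ** 2
        pa /= max(pa.sum(), 1e-12)                    # normalize out loudness
        pb /= max(pb.sum(), 1e-12)
        weights = np.maximum(pa, pb)                  # louder bins count more
        diff = np.abs(pa - pb)
        return 1.0 - (weights * diff).sum() / max(weights.sum(), 1e-12)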

I'm curious to see what else is recommended. Most of the time, I am only interested in a few spectral elements: harmonics, relative magnitudes, etc. There are some first-class techno weenies here who will undoubtedly not only have good recommendations for you, but will point me at things I should be looking at too.

Good luck.
posted by FauxScot at 9:35 AM on July 16, 2007


I've sort of had some success extracting relevant features (frequency and amplitude of the top 15 peaks, for instance), plugging them into a Kohonen self-organizing feature map, and letting it learn how to reduce sounds from a 30-dimensional vector to a 2-D one. Simple Pythagorean distance on the map then gives a pretty good similarity metric between two sounds.
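
Roughly this shape of pipeline, sketched in Python/numpy (naive peak picking and a from-scratch toy SOM for illustration only; my real code was messier, and you'd probably use a proper SOM library):

    import numpy as np

    def peak_features(frame, n_peaks=15):
        """(freq, amp) of the top peaks -> a 30-dimensional feature vector."""
        mag = np.abs(np.fft.rfft(frame))
        bins = np.sort(np.argsort(mag)[-n_peaks:])        # 15 tallest bins, in order
        feats = np.column_stack([bins / len(mag),         # normalized frequency
                                 mag[bins] / max(mag.max(), 1e-12)])  # normalized amp
        return feats.ravel()

    def train_som(data, grid=10, epochs=2000, lr=0.5, sigma=2.0, seed=0):
        """Toy self-organizing map: data is (n_samples, n_features)."""
        rng = np.random.default_rng(seed)
        weights = rng.random((grid, grid, data.shape[1]))
        coords = np.stack(np.meshgrid(np.arange(grid), np.arange(grid),
                                      indexing='ij'), axis=-1)   # (grid, grid, 2)
        for t in range(epochs):
            x = data[rng.integers(len(data))]
            best = np.unravel_index(
                np.linalg.norm(weights - x, axis=2).argmin(), (grid, grid))
            decay = 1.0 - t / epochs                     # anneal rate and radius
            h = np.exp(-((coords - np.array(best)) ** 2).sum(axis=2)
                       / (2 * (sigma * decay + 0.5) ** 2))
            weights += lr * decay * h[..., None] * (x - weights)
        return weights

    def map_position(weights, x):
        """Grid coordinates of the best-matching unit for feature vector x."""
        d = np.linalg.norm(weights - x, axis=2)
        return np.array(np.unravel_index(d.argmin(), d.shape), dtype=float)

    # similarity between two frames a and b, after training on many frames:
    #   w = train_som(np.array([peak_features(f) for f in frames]))
    #   dist = np.linalg.norm(map_position(w, peak_features(a))
    #                         - map_position(w, peak_features(b)))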

Depending on the sort of sounds you're trying to classify, that initial feature extraction may be more or less complicated. You decide which features are relevant, and then the SOFM figures out which interactions between them matter.

I used this approach to try and distinguish different vowel sounds (with so-so success), and I've seen it used to tell a piano from a violin pretty effectively.
posted by aubilenon at 12:13 PM on July 16, 2007


I would check out this paper on using sound similarity to detect the rhythm in a piece of music. It's not exactly what you're trying to do, but the underlying algorithm basically slices up the audio data, does an FFT on each slice, and then calculates the 'similarity' between slices.
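
If I remember the idea right, the skeleton looks something like this sketch (Python/numpy; the diagonal-sum step is my paraphrase of how the rhythm falls out of the similarity matrix, so check the paper for the exact measure):

    import numpy as np

    def similarity_matrix(signal, slice_len, hop):
        """Slice, FFT each slice, then cosine similarity between all slices."""
        starts = range(0, len(signal) - slice_len, hop)
        window = np.hanning(slice_len)
        spectra = np.array([np.abs(np.fft.rfft(signal[s:s + slice_len] * window))
                            for s in starts])
        unit = spectra / np.maximum(np.linalg.norm(spectra, axis=1,
                                                   keepdims=True), 1e-12)
        return unit @ unit.T                      # values in [0, 1]

    def beat_spectrum(sim):
        """Average similarity at each lag; peaks suggest repetition periods."""
        n = len(sim)
        return np.array([np.trace(sim, offset=l) / (n - l)
                         for l in range(n // 2)])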
posted by christy at 5:52 PM on July 16, 2007

