I'm looking at FFTs of audio, as for example might be emitted by my present favorite tool for this, ARSE. I'd like algorithms (or sample code) that can look at small chunks of spectra and determine whether the patterns present in one also exist in another. The metric needs to be a sliding scale rather than a yes/no match: I need to be able to determine the degree to which one 0.5-second block of audio has the same spectral structure as another 0.5-second block of audio -- 0%, 50%, etc.
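To make the ask concrete, here is a minimal sketch of one graded score of this kind: cosine similarity between two log-compressed blocks of FFT magnitudes, scaled to 0-100. The function name `block_similarity` and the input layout (a list of frames, each a list of non-negative bin magnitudes) are assumptions for illustration, not a format any particular tool is known to emit.

```python
import math


def block_similarity(block_a, block_b):
    """Return a 0-100 similarity score for two equally sized spectral blocks.

    Each block is a list of frames; each frame is a list of non-negative
    FFT bin magnitudes (hypothetical layout, assumed for this sketch).
    """
    # Flatten each time-frequency patch and log-compress the magnitudes so
    # that a few loud bins do not dominate the comparison.
    a = [math.log1p(m) for frame in block_a for m in frame]
    b = [math.log1p(m) for frame in block_b for m in frame]
    if len(a) != len(b):
        raise ValueError("blocks must have the same number of frames and bins")

    # Cosine similarity: dot product divided by the product of the lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0  # at least one block is completely silent
    return 100.0 * dot / norm
```

Identical blocks score 100 and blocks with energy in disjoint bins score 0. Because magnitudes are non-negative, unrelated real audio will tend to land well above 0, so the raw score may need rescaling before it reads as a meaningful percentage.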
I did attempt a fully image-domain comparison using ImgSeek; the results were OK but not wildly useful.
Any thoughts? I'm working on an audio dotplot engine, and while I'm getting decent results with a ludicrously simple similarity metric, I'd like something more rigorous, and less noisy.
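For readers unfamiliar with the term, a rough sketch of the dotplot idea, assuming it means a self-similarity matrix in which cell (i, j) holds the similarity between spectral frame i and frame j. The metric used here (Pearson correlation of the two spectra, clipped to the range 0 to 1) and the names `unit_vector` and `dotplot` are illustrative assumptions, not the metric the engine currently uses.

```python
import math


def unit_vector(frame):
    """Mean-centre a spectrum and scale it to unit length (zeros if flat)."""
    mean = sum(frame) / len(frame)
    centred = [m - mean for m in frame]
    norm = math.sqrt(sum(c * c for c in centred))
    if norm == 0.0:
        return [0.0] * len(frame)
    return [c / norm for c in centred]


def dotplot(frames):
    """Return an N x N matrix of similarities in [0, 1] between all frames."""
    units = [unit_vector(f) for f in frames]
    n = len(units)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Pearson correlation of the two spectra; negative values are
            # clipped to 0 so the result maps directly onto pixel brightness.
            r = sum(x * y for x, y in zip(units[i], units[j]))
            matrix[i][j] = matrix[j][i] = min(1.0, max(0.0, r))
    return matrix
```

Mean-centring removes the constant offset shared by all non-negative spectra, which may help with the noise floor, at the cost of making perfectly flat frames match nothing (including themselves).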
Note, I'm looking for something that can match on speech similarity, but can also match on a violin note. I'm also very specifically looking for mechanisms that work on chunks of audio approximately 100ms to 500ms long. This seems to disqualify the algorithms in MusicMiner.
Hive mind, help me hack :)
(This is for this project, which renders rather interesting dotplots out of WinAMP spectra.)
posted by Comrade_robot at 7:26 AM on July 16, 2007