# How do I turn my kids' voices into images?

May 8, 2011 12:52 PM

I have an audio recording of children playing. I want to take data from that recording (frequency? volume? something else?) and use those numbers to generate some sort of visual output.

As you can see, it's all very vague, because I have no clue about:

a) what variables are found in an audio file

b) how to obtain them

I plan on using Processing for the visual part, but I need an idea on how and what to get from my audio.

Any advice on programs that could fit my case?

or info on the science of sounds. or something.

(this is close to what I would like to achieve using the data taken from the audio files: link )

Response by poster: gareth, thanks for that link. That helps to reduce my ignorance regarding sounds and signals.

you're a kitty: that's the sound wave as is. I want to take the numbers in the wave and put them into my graphic generating equation.

Again, I apologise because I basically don't know what I'm talking about, yet.

posted by uauage at 1:22 PM on May 8, 2011

Do you know Python? Can you learn to use the Echo Nest Remix SDK?

Here is a nifty example of what you could do.

posted by hariya at 1:23 PM on May 8, 2011

In a bigger-picture sense, a mono sound is a one-dimensional variable (sound pressure over time), and every interesting property of that sound is going to be a more abstract property of that variable's change over time.
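To make the "numbers in the wave" concrete, here is a minimal Python sketch (not from the thread) that reads a recording into a plain list of sample values. It assumes a 16-bit PCM WAV file, and the function name `read_mono_samples` is made up for illustration.

```python
import struct
import wave


def read_mono_samples(path):
    """Return (sample_rate, samples) with samples scaled to roughly -1.0..1.0.

    Assumption: the file is 16-bit PCM. Multi-channel files are averaged
    down to a single (mono) stream.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    # 16-bit little-endian signed integers, interleaved by channel.
    values = struct.unpack("<%dh" % (len(raw) // 2), raw)

    samples = []
    for i in range(0, len(values) - channels + 1, channels):
        frame = values[i:i + channels]
        samples.append(sum(frame) / (channels * 32768.0))
    return rate, samples
```

Each entry in `samples` is one measurement of sound pressure; `rate` tells you how many of them make up one second.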

Track the absolute value of the sound pressure over time, with some sort of averaging factor (root mean square is popular) and you get the amplitude envelope, describing how loud the sound is overall. Look at patterns in the change of amplitude over time and for musical signals (the kind that have higher level patterns) you will get the next level of abstraction: the beat. Go up one more level and you get dynamic structure (patterns of change of beat and intensity).
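As a rough illustration of the amplitude envelope, the sketch below computes one root-mean-square value per window of samples. The window and hop sizes are arbitrary assumptions, and `rms_envelope` is a hypothetical helper name, not something from a library.

```python
import math


def rms_envelope(samples, window=1024, hop=512):
    """Return one RMS loudness value per (overlapping) window of samples."""
    envelope = []
    for start in range(0, max(len(samples) - window + 1, 1), hop):
        chunk = samples[start:start + window]
        if not chunk:
            break
        mean_square = sum(x * x for x in chunk) / len(chunk)
        envelope.append(math.sqrt(mean_square))
    return envelope
```

The result is a much shorter list than the raw samples, and each value could drive something like the size or brightness of a shape.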

You can also look at the frequency behavior of the change of sound pressure over time, which can be used to get the pitches and timbres. Then you can go one step further and track the change in frequency behavior over time (melody) and then one step beyond that and get patterns in change in frequency behavior over time (tonality).

Basically you are starting with sound pressure level and getting more and more abstract as you derive higher and higher levels of musically pertinent information. Various signals respond interestingly to differing types of analysis and differing numbers of tiers of analysis. A rough rule of thumb is that the more your sound is like classical European art music, the larger the number of tiers of existing analysis techniques which will be useful.

In terms of environmental recording (like your case, sounds of kids playing) the higher level analysis designed for music probably won't get you much that is interesting. But if you know how to play around with numbers you should still be able to find patterns, and patterns of those patterns. You may have to invent some of the methods of finding the patterns. Don't be afraid to be playful - just because they are numbers doesn't mean you can't try something silly. Audio analysis of music is still very open with huge numbers of discoveries waiting to be made. Audio analysis of patterns of non-musical material (or musical material that is not an approximation of classical European art music) is full of discoveries waiting to be made.

posted by idiopath at 1:46 PM on May 8, 2011 [2 favorites]

This is easy.

- Download Geiss
- Run Geiss from the standalone, or as a WinAmp plugin
- Done.

posted by Civil_Disobedient at 2:17 PM on May 8, 2011

SuperCollider is made for exactly this sort of thing.

posted by elektrotechnicus at 4:19 PM on May 8, 2011

I guess my other answer gave the theory side of the audio analysis part, but for the pragmatic side of things, I think for your particular task Processing has the best balance of ease of use, attractiveness of results, and control over the process (as compared to doing it in C or C++, Pure Data, SuperCollider, Csound, Blender, and a bunch of other environments that can do some kind of rudimentary audio/video, which I have tried but were either not memorable enough or not useful enough to mention here).

posted by idiopath at 4:44 PM on May 8, 2011

Best answer: The sound wave IS a one dimensional function -- BUT every sound wave is also a sum of an infinite number of sine and cosine waves, each representing a distinct frequency. You can get at those by running a Fast Fourier Transform (FFT) on the sound.

This is how all those fancy screensavers and visualizations like this one work. When you see a graphic equalizer bouncing around, that's also the result of an FFT.

If you think of a soundwave as a graph with an X and Y axis, where X is time and Y is volume, then an FFT produces a new graph for each point in time where X is frequency and Y is amplitude. You can use that data to produce a dynamic visualization.
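Here is one way that "new graph for each point in time" could look in code: a minimal pure-Python sketch, not what any particular visualizer does. It assumes the window length is a power of two, and `fft` and `spectrum_frames` are illustrative names. In practice you would more likely use an existing FFT library than write your own.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT. len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def spectrum_frames(samples, sample_rate, window=1024):
    """Yield (start_time_seconds, magnitudes) for consecutive windows.

    magnitudes[k] is the strength of the bin centred on
    k * sample_rate / window Hz.
    """
    for start in range(0, len(samples) - window + 1, window):
        bins = fft(samples[start:start + window])
        mags = [abs(b) / window for b in bins[:window // 2]]
        yield start / sample_rate, mags
```

Each yielded `mags` list is one frequency-vs-amplitude graph; drawing one per animation frame gives the bouncing-bars effect.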

posted by empath at 4:47 PM on May 8, 2011

Best answer: Yes, frequency is one property of a sound wave over time.

Pedantically it is more correct to say that every sound *can be broken down* into some number of sine functions. Yes, in an analog sound it is infinite in count in the worst case, but all digital signals of finite length are fully representable by a finite set of sine functions. But they are also representable by a finite number of wavelets of arbitrary shape. The sine function is not the "real structure" hiding under the sound, but a mathematically useful decomposition of the sound, equally valid beside other equally valid and incompatible decompositions of the sound.

An FFT is a somewhat arbitrary windowing, and the FFT used by any mainstream software is actually quite lossy, for efficiency reasons. You can get more accurate frequency results with a larger window, more accurate timing results with a smaller window. A different FFT window size will give you different frequencies. Really. And they are all equally valid. A common mistake (especially for electronic musicians and composers) is to think you can get "the frequencies" in a sound based on an FFT. Actual pitch tracking and analysis is a much more difficult problem. Any FFT will give its results in terms of a small number of frequencies over a fixed and fairly arbitrary interval (i.e. one analysis will give you an amplitude for 12 Hz, another for 24 Hz, 36, 48, 60, 72 etc. - if you feed it a sine tone at 65 Hz the output will show both 60 and 72 Hz, and depending on the phase of the signal it will also contain other frequencies that are nowhere near those). FFT does not give you "the frequencies"; it gives you a set of frequencies and amplitudes and phases that when combined are guaranteed to reproduce your sound. Among other equally valid sets of frequencies. FFT is a great start for analysis, but it is really only scratching the surface for feature extraction purposes.
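The 65 Hz example can be tried out with a small sketch. The numbers here are assumptions picked so the bins land on multiples of 12 Hz (a 1200 Hz sample rate and a 100-sample window). It uses a slow, naive DFT instead of an FFT so the window does not have to be a power of two, and `dft_magnitudes` and `analyse_tone` are made-up names.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Naive DFT: amplitude per bin, for bins below half the window length."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        total = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        # Factor 2 folds in the mirrored negative-frequency half
        # (this overstates bin 0, which is not used here).
        mags.append(2 * abs(total) / n)
    return mags


def analyse_tone(freq_hz, sample_rate=1200, window=100):
    """Return {bin_frequency_hz: amplitude} for a pure sine at freq_hz."""
    tone = [math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(window)]
    spacing = sample_rate / window  # 12 Hz with the assumed defaults
    return {k * spacing: a for k, a in enumerate(dft_magnitudes(tone))}


on_bin = analyse_tone(60)   # lands exactly on the 60 Hz bin
off_bin = analyse_tone(65)  # falls between the 60 Hz and 72 Hz bins

# The two strongest bins for the 65 Hz tone.
strongest = sorted(off_bin, key=off_bin.get, reverse=True)[:2]
```

With these settings the 60 Hz tone shows up in a single bin, while the 65 Hz tone is smeared mostly across the 60 and 72 Hz bins with smaller amounts everywhere else. Changing `window` moves the bin frequencies, which is the "different window size gives different frequencies" point.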

So, as I said, the one dimensional function that is a sound wave has a number of more abstract properties that are analyzed.

posted by idiopath at 5:56 PM on May 8, 2011

Response by poster: empath, idiopath: thanks. FFT. That was the answer to my question. It works beautifully.

I looked for FFT libraries on the Processing website, studied their examples, and ta-daaaaa!

Thanks everyone!!!!

posted by uauage at 2:09 PM on May 17, 2011

Response by poster: ps. In case anyone is wondering what came out of all this: have a look.

posted by uauage at 4:16 PM on May 17, 2011 [1 favorite]

http://en.wikipedia.org/wiki/Spectrogram

Sound files are chock-full of data; a spectrogram is a good place to start, as it's similar to how our brains process sounds: responding to patterns of frequencies over time. I've played with sound files quite a bit, mostly using MATLAB and writing my own code. I'm not sure if that'd be right for what you're trying to do; there are probably simpler ways...

hope that helps!

posted by garethspor at 12:59 PM on May 8, 2011