mp3 - WAV conversion: how does this work?
September 24, 2011 1:01 PM

mp3 is deemed by some audiophiles to be a poorer-quality format than something like PCM WAV. Yet mp3 files can easily be converted to WAV. So the data that apparently is missing from mp3 files (and ergo detracting from the overall audio quality) must be there somewhere in those files, just hidden or compressed? Is that correct?? What exactly does an mp3 file leave out when it compresses what, presumably, is the original PCM WAV file of a track? And if using an A/D converter of some description you convert something from vinyl direct to mp3 you presumably could then convert that track to WAV - so where would all that additional data comprising the uncompressed WAV file (that may be ten times larger than the mp3) come from? I've absolutely no IT knowledge at all, so forgive me if these are embarrassingly dumb/elementary questions. I'm just intrigued.
posted by MajorDundee to Technology (16 answers total) 4 users marked this as a favorite
 
If you started with a WAV and converted it to MP3 and then back to WAV, you'd get a file the same size as you started, but with lower quality.

Think of this as a simple analogy: you start with a number with several decimal places, like 1.864. Converting it to mp3 rounds it to a whole number, like 2, and converting it back to wav gives you a number with decimal places again, but now it's 2.000. This is a rather blunt analogy, but it's the right basic idea. Converting to wav doesn't add data back.
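RustyBrooks's rounding analogy, written out as a tiny Python sketch. The names `to_mp3` and `to_wav` are hypothetical stand-ins; real encoders are far more elaborate, but the one-way step behaves the same way: once several inputs round to the same value, the decoder has no way to know which one you started with.

```python
def to_mp3(x: float) -> int:
    """Hypothetical toy 'lossy encode': keep only the nearest whole number."""
    return round(x)


def to_wav(n: int) -> float:
    """Hypothetical toy 'decode': back to a decimal number, but nothing is recovered."""
    return float(n)


original = 1.864
restored = to_wav(to_mp3(original))
print(f"{original} -> {to_mp3(original)} -> {restored:.3f}")  # 1.864 -> 2 -> 2.000

# Different originals collapse onto the same restored value.
collapsed = {x: to_wav(to_mp3(x)) for x in (1.6, 1.864, 2.1, 2.4)}
print(collapsed)
```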
posted by RustyBrooks at 1:05 PM on September 24, 2011 [18 favorites]


Mp3 is compressed and lossy, meaning data is lost when you convert. The lost data is sort of "figured out" (really, approximated) by the machines that play it back, but it's not perfect. The tradeoff is ideally made so that the loss is imperceptible (or at least acceptable) to the human ear. You can convert back to wav, which is uncompressed and lossless, but you're just guessing at the lost data. The data is gone.
posted by wooh at 1:05 PM on September 24, 2011 [1 favorite]


Yet mp3 files can easily be converted to WAV. So the data that apparently is missing from mp3 files (and ergo detracting from the overall audio quality) must be there somewhere in those files, just hidden or compressed?
No: file formats don't have inherent quality. For instance, if I made a really cruddy, pixelated GIF, I could easily convert it into a TIFF, but it would still be low-quality, with a limited color palette, artifacts and whatever; it would just be a low-quality TIFF.

MP3 is a lossy format, which means that it 'throws away' some data: the data that most people are unlikely to notice. Once that data is thrown away, there's no way to get it back. You can interpolate more data, but that's a process of educated guesses, not actually creating or recovering the thrown-away data.
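The "educated guess" mentioned above can be seen in a toy example. This Python sketch (the 16-sample sine wave is made up for illustration) throws away every other sample, then guesses each missing one as the average of its two neighbours, a simple form of linear interpolation. The kept samples come back exactly; the guessed ones are close, but wrong.

```python
import math

# Hypothetical "original" signal: one cycle of a sine wave, 16 samples.
original = [math.sin(2 * math.pi * i / 16) for i in range(16)]

# Lossy step: throw away every odd-numbered sample.
kept = original[::2]

# "Recovery": guess each missing sample as the average of its neighbours.
guessed = []
for i, value in enumerate(kept):
    guessed.append(value)
    nxt = kept[i + 1] if i + 1 < len(kept) else kept[0]  # wrap around: the sine is periodic
    guessed.append((value + nxt) / 2)

worst = max(abs(a - b) for a, b in zip(original, guessed))
print(f"largest error after interpolation: {worst:.3f}")
```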

To illustrate lossiness: if you take an MP3 file and re-encode it 10 times, it'll sound like crud. If you take a PCM WAV (or another lossless format) and resave it, it'll be precisely the same.
posted by tmcw at 1:07 PM on September 24, 2011 [2 favorites]


No, it is not correct. Compressed MP3 = the data is thrown away. Re-converting an MP3 to WAV only uses the compressed data; there is no "additional data." You then have a WAV file with as much sonic information as the MP3, not as much as the original WAV.

Lingering Over Lingonberries -> LOL -> LOL

Uncompressed -> compressed -> re-converted

-not-

Lingering Over Lingonberries -> LOL -> Lingering Over Lingonberries

-nor-

Lingering Over Lingonberries -> LOL -> Laughing Out Loud
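The acronym example is a many-to-one mapping, which is easy to check in code. This is a hypothetical toy (keep the first letter of each word), not anything an audio codec actually does, and "Lots Of Love" is an extra example added for illustration.

```python
def acronym(phrase: str) -> str:
    """Hypothetical toy 'lossy compression': keep only the first letter of each word."""
    return "".join(word[0].upper() for word in phrase.split())


phrases = ["Lingering Over Lingonberries", "Laughing Out Loud", "Lots Of Love"]
codes = {p: acronym(p) for p in phrases}
print(codes)
# Every phrase maps to "LOL", so a decoder handed "LOL" cannot tell which one you meant.
```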
posted by rhizome at 1:08 PM on September 24, 2011 [1 favorite]


Response by poster: Many thanks all. I stand converted - i.e. corrected but with just a bit of loss.
posted by MajorDundee at 1:20 PM on September 24, 2011 [7 favorites]


It's an imperfect analogy, but there's something similar going on with image-file formats. JPEGs, for example, are lossy. They use some clever techniques that allow them to throw away information in the hopes that we won't notice the difference, and the user can decide what kind of a tradeoff to make between quality and image size (as is true with MP3s).

I've uploaded some sample images for you to look at.

The first was shot with my iPhone, meaning it was saved as a JPEG. So this isn't a perfect test case, but for our purposes, it's good enough. I've reduced it in size and saved it at maximum quality, so what you're seeing is probably as good as if the original image capture had been lossless.

For the second, I cranked the compression way, way up and saved it as another JPEG. The difference should be obvious.

For the third, I resaved the second JPEG as a PNG. Now, PNG is a lossless format, so what you're seeing is that shitty JPEG reproduced with pixel-perfect fidelity.

If you converted an MP3 to a WAV, you'd get the same effect.
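The JPEG-to-PNG experiment can be mimicked in a few lines. In this sketch (Python standard library only; the pixel values are made up), a crude quantization step stands in for heavy JPEG or MP3 compression, and `zlib` stands in for a lossless format such as PNG or WAV (PNG itself uses DEFLATE, the method `zlib` implements). The lossless step hands back exactly the degraded data, not the original.

```python
import zlib

# Hypothetical 8-bit pixel (or sample) values, 0..255.
original = bytes([12, 47, 51, 88, 93, 200, 201, 255])

# Lossy step: snap every value down to a multiple of 32 (only 8 levels survive).
degraded = bytes((v // 32) * 32 for v in original)

# Lossless step: compress and decompress; zlib returns exactly what it was given.
png_like = zlib.compress(degraded)
restored = zlib.decompress(png_like)

print("original:", list(original))
print("restored:", list(restored))
```

The round trip is pixel-perfect, but perfect relative to the degraded copy.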
posted by adamrice at 1:25 PM on September 24, 2011 [4 favorites]


I've experimented a LOT with these file formats. In my opinion it is a question of audible dynamics.

Let's say you have a musical track in which the melody has lots of delay and panning effects applied to it, or a glitchy electronic track with crunchy sounds and deep bass (I have several favorite tracks like this that I use as evaluators for audio systems, applications and file formats). Original .wav or CD tracks will remain faithful to the nuances and details, but as you descend the scale of compression options within .mp3 files, these details either disappear permanently or muddle together. They certainly don't come back when you convert back to a .wav file. adamrice's imperfect analogy is actually excellent in my opinion.
posted by No Shmoobles at 1:39 PM on September 24, 2011


All of you may be interested in Jonathan Sterne's superb essay "The MP3 as Cultural Artifact," published in New Media & Society, 2006 Vol 8(5): 825–842

and because Prof Sterne is cool like this, available for free as a pdf on his website at:



He also did extensive field research on the process of designing and testing compression algorithms. Fascinating stuff.

Here's the abstract:
"The mp3 lies at the center of important debates around intellectual property and file-sharing, but it is also a cultural artifact in its own right. This article examines the design of the mp3 from both industrial and psychoacoustic perspectives to explain better why mp3s are so easy to exchange and the auditory dimensions of that process of exchange. As a container technology for recorded sound, the mp3 shows that the quality of ‘portability’ is central to the history of auditory representation. As a psychoacoustic technology that literally plays its listeners, the mp3 shows that digital audio culture works according to logics somewhat distinct from digital visual culture."
posted by spitbull at 2:02 PM on September 24, 2011


Oh that's weird: the link again is

http://sterneworks.org/mp3.pdf
posted by spitbull at 2:03 PM on September 24, 2011


WAV is a digital form of an oscilloscope trace. That's the way to think of it. Each successive 16-bit value is the wave-form height one time increment later.
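To make the "oscilloscope trace" idea concrete, here is a minimal sketch using Python's standard `wave` module. It samples a sine tone (the 440 Hz pitch and the 441-sample length, 10 ms at 44,100 samples per second, are arbitrary choices) and stores each height as a signed 16-bit integer in a mono PCM WAV held in memory. The file is just a short header followed by those numbers, two bytes each.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44_100  # samples per second (the CD rate)
FREQ_HZ = 440.0       # arbitrary test tone
NUM_SAMPLES = 441     # 10 ms of audio


def pcm_samples(freq, rate, count, amplitude=0.5):
    """Each value is the waveform's height at one time step, as a signed 16-bit int."""
    return [round(amplitude * 32767 * math.sin(2 * math.pi * freq * i / rate)) for i in range(count)]


samples = pcm_samples(FREQ_HZ, SAMPLE_RATE, NUM_SAMPLES)

buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)           # mono
    w.setsampwidth(2)           # 2 bytes = 16 bits per sample
    w.setframerate(SAMPLE_RATE)
    w.writeframes(struct.pack(f"<{len(samples)}h", *samples))  # little-endian int16

print(len(samples), "samples,", len(buf.getvalue()), "bytes in total")
```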

MP3 is a lot more complicated. To make an MP3, you run the WAV-equivalent data through a Fourier-style transform (MP3 actually uses a filter bank followed by a modified discrete cosine transform, or MDCT), and for each somewhat-larger increment of time, you store the coefficients that permit reconstruction of the wave.

Only you don't store all the coefficients exactly. The encoder uses a psychoacoustic model of what our ears can and can't pick out (for example, a quiet sound at a frequency close to a much louder one gets masked) and spends its bits only on what it expects we'll hear. Our hearing really does ignore some differences that are obvious on paper: a rising sawtooth and a falling sawtooth sound exactly the same to us. They have exactly the same overtones; only the phase relationships are different.

So when you reconstruct the waveform, it sounds the same (to us) but would not match the original sample for sample on an oscilloscope.

How much gets thrown away depends on the quality setting. With the setting cranked up to the ceiling, very little that you could hear is lost. If you want to make the MP3 smaller, then during the encode it stores the coefficients more coarsely and discards every component below a threshold, and that threshold rises as the quality you chose goes down.

When people are too aggressive about making their MP3 files small, the threshold is quite high, and the quality degradation is noticeable.
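The threshold idea can be tried on a toy transform coder. This is not MP3's actual algorithm: it is a sketch assuming a plain orthonormal DCT-II in place of MP3's filter bank and MDCT, and a fixed magnitude cut-off in place of a psychoacoustic model; the function names and the test block are hypothetical. Raising the threshold keeps fewer coefficients and the rebuilt block drifts further from the original.

```python
import math


def dct(block):
    """Orthonormal DCT-II: time samples -> frequency coefficients."""
    n = len(block)
    return [
        math.sqrt((1 if k == 0 else 2) / n)
        * sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(block))
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III): coefficients -> time samples."""
    n = len(coeffs)
    return [
        sum(
            math.sqrt((1 if k == 0 else 2) / n) * c * math.cos(math.pi / n * (i + 0.5) * k)
            for k, c in enumerate(coeffs)
        )
        for i in range(n)
    ]


def encode_decode(block, threshold):
    """Toy codec: zero every coefficient smaller than `threshold`, then rebuild."""
    coeffs = dct(block)
    kept = [c if abs(c) >= threshold else 0.0 for c in coeffs]
    return idct(kept), sum(1 for c in coeffs if abs(c) >= threshold)


def rms_error(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Hypothetical 32-sample block: a loud low tone plus a quiet high one.
N = 32
block = [math.sin(2 * math.pi * 2 * i / N) + 0.05 * math.sin(2 * math.pi * 11 * i / N) for i in range(N)]

for threshold in (0.0, 0.05, 0.2, 1.0):
    rebuilt, kept = encode_decode(block, threshold)
    print(f"threshold {threshold}: kept {kept}/{N}, RMS error {rms_error(block, rebuilt):.4f}")
```

Because the transform is orthonormal, the error comes entirely from the coefficients that were zeroed, which is why it can only grow as the threshold rises.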
posted by Chocolate Pickle at 2:16 PM on September 24, 2011 [2 favorites]


Kind of previously.
posted by OmieWise at 2:29 PM on September 24, 2011


Think of the original .wav as having the full alphabet; when it's converted to .mp3, some of the letters are lost. You can still make out the original book, but some of the letters and words will be missing.
posted by Solomon at 2:33 PM on September 24, 2011


Every time you listen to an mp3 file, it is in a sense converted back to an uncompressed .wav file (a temporary .wav file that only exists for a brief moment in the temporary storage of the player software/hardware), because uncompressed raw audio samples are the only thing that any sound device can play back. So if what you posit were correct, then nobody listening to mp3 files would ever hear the loss. Or equivalently, converting an mp3 to a .wav file is just like playing it, except the output goes to a file and is kept instead of going to a speaker and then being discarded, so that file should have the same degradation/loss that you can hear in the speaker.
posted by Rhomboid at 4:04 PM on September 24, 2011


So the data that apparently is missing from mp3 files (and ergo detracting from the overall audio quality) must be there somewhere in those files, just hidden or compressed? Is that correct??

That's a good question, and one that others earlier in the thread have answered, to various degrees, for this specific case. Here is a way to think about it that proves this is not a bunch of audiophile hokum:

Imagine you have some audio data in a WAV file: 1, 0, 0, 1 and so on. You come up with a crazy file format that compresses it to half the size you started with. Now look at the whole range of possible audio files you could encode before applying your new magical compression system. Let's just say, for the sake of argument, that you are only dealing with files that are exactly one byte (eight bits) long. You've got 00000000, 00000001, 00000010 and so on (if you are not familiar with the method of writing numbers using only zeros and ones, which nerds call "binary", this is just a way of saying 0, 1, 2, etc.).

Now if you are saying you can cut that down to half the size, you only have four bits. That means every string of eight bits has to map to a string of four. So for every unique 01010011 you can say, "OK, it's compressed to 1100" or whatever.

The problem is, there isn't enough room. There are 256 possible eight-bit strings but only 16 possible four-bit strings, so four bits can't represent all the possibilities in eight bits. This means you are going to have to let more than one long string of bits compress to the same short string of bits. This is called the pigeonhole principle: if you have 10 pigeons and 9 holes, there's going to be at least one hole with two pigeons in it.

Which means there are going to be at least two different audio tracks with the same compressed representation, which is another way of saying "information is lost". This is what lossy compression is. This is what people mean when they say data is thrown away: whatever is thrown away is whatever let you tell the difference between those tracks in the first place.
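The byte-to-four-bits argument can be checked by brute force. The hypothetical `compress_byte` below keeps only the top four bits of each possible one-byte file. Any other 8-bit-to-4-bit rule would collide too, since there are only 16 possible outputs; this one just makes the collisions easy to see, because each of the 16 compressed codes ends up standing for 16 different originals.

```python
from collections import defaultdict


def compress_byte(value: int) -> int:
    """Hypothetical lossy 8-bit -> 4-bit 'codec': keep only the top four bits."""
    return value >> 4


buckets = defaultdict(list)
for value in range(256):  # every possible one-byte file
    buckets[compress_byte(value)].append(value)

print(len(buckets), "possible compressed outputs for 256 inputs")
print("some inputs sharing code 1100:", [f"{v:08b}" for v in buckets[0b1100][:3]])
```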

I realize I said, "pretend you are only working with eight-bit-long files", but I'm sure you can see this holds no matter how long they are. If you have million-bit-long files and you compress them by only ONE bit, you are still going to be in a situation where you have too many pigeons and not enough holes.

All the work that Fraunhofer (the MP3 folks), the Ogg Vorbis developers and others like them do goes into figuring out the best way to decide which information to throw away. What is the least important detail to retain? There are a number of approaches: "What do people not hear?", "What detail would be easiest for the software that reassembles this to guess?", etc. But the information is really gone.

Now, at this point, you might be thinking, "But WAIT! I zip files all the time! They are recovered bit-for-bit perfect, I check that, always! YOU ARE FULL OF CRAP JEB." Wow, that's a good point, and I'm glad you didn't let me get away with that. The difference in that case is that a program like zip or rar or what-have-you makes no guarantee that it will definitely _shrink_ the file. It guarantees that if you zip a file and unzip it, you will get the original file back, but it doesn't guarantee that the zipped file will be smaller. The pigeonhole principle holds when you are putting 10 pigeons in 9 holes, or, in more computer terms: if you have a sequence of three bits, you have eight possible arrangements of the bits (000, 001, 010, 011, 100, 101, 110, 111, which we call "zero", "one", "two", and so on up to "seven"). If you cut that down to two bits, you have only four (00, 01, 10, 11). So something has to go. Zip and its ilk do not do that. They say, "Hey, we will try to cut it down, but we MIGHT end up making it longer." There exist some input files for which zip or rar or their friends will produce enlarged output. At this point, you might be able to see that this is mathematically unavoidable, BECAUSE OF THE GODDAMN PIGEONS.
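The three-bit example can be counted directly. This sketch (Python; `bit_strings` is a hypothetical helper) lists every three-bit input and every strictly shorter output, including the empty string. There are 8 inputs but only 1 + 2 + 4 = 7 shorter outputs, so a lossless compressor that shrank every three-bit file would have to send two inputs to the same output, and then it could not unzip both correctly.

```python
from itertools import product


def bit_strings(length: int) -> list[str]:
    """Every string of 0s and 1s with exactly `length` characters."""
    return ["".join(bits) for bits in product("01", repeat=length)]


n = 3
inputs = bit_strings(n)                                  # 8 three-bit files
shorter = [s for k in range(n) for s in bit_strings(k)]  # "", "0", "1", "00", ...: 7 in total
print(len(inputs), "inputs, but only", len(shorter), "strictly shorter outputs")
```

The same count works for any length: there are 2^n files of n bits but only 2^n - 1 strictly shorter ones.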

This is what is called lossless compression. PNG image files use this; FLAC audio files use this. In this world, the game is "How do we choose rules of compression that make the files a lot smaller, and are very unlikely to make them bigger, for the files we will see in the real world?" Even though an image file COULD represent any screenful of pixels, very rarely do people photograph, say, flat walls that alternate between black and white at exactly every pixel. For audio, the case is the same: a pure burst of static can be represented in the file, but it's a lot more likely that music is in the file, so the compression logic is optimized to deal with those cases.

To see an example of this, try to zip a file that's already been zipped. It will usually get bigger. Zip, PNG, FLAC, and their buddies work very differently from JPEG, MP3, and so on. They basically try to find patterns and say, "first we had this pattern, then we had this pattern, then we had the first pattern, then we had the second pattern." It works great when there's a lot of redundant data in the file, but if there isn't, all that chatter about what pattern is going to be called what makes the file bigger.
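The zip-it-twice experiment can be approximated with Python's standard `zlib` module, which implements DEFLATE, the method used by zip and PNG. Already-compressed files look statistically like random bytes, so this sketch uses seeded random data as a stand-in for them, next to some highly repetitive text. Both round-trip bit-for-bit; only the repetitive text shrinks.

```python
import random
import zlib

text = b"the quick brown fox jumps over the lazy dog. " * 200  # lots of repetition
noise = random.Random(0).randbytes(4096)                      # no patterns to exploit

for label, data in (("repetitive text", text), ("random bytes", noise)):
    packed = zlib.compress(data)
    assert zlib.decompress(packed) == data  # lossless: bit-for-bit identical
    print(f"{label}: {len(data)} -> {len(packed)} bytes")
```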

And if using an A/D converter of some description you convert something from vinyl direct to mp3 you presumably could then convert that track to WAV - so where would all that additional data comprising the uncompressed WAV file (that may be ten times larger than the mp3) come from?

Frankly, it doesn't come from the MP3 at all. The A/D converter outputs a number of pigeons which are put into a large number of holes. This is a WAV file. These are then put into a smaller number of holes, and some pigeons are forced to double up. Now you have an MP3 file. Then, when you convert that back to a WAV file, the most important pigeons are restored to their original holes, and the computer makes some guesses about how to distribute the rest. These guessed holes make up the additional data you are looking for.
posted by jeb at 4:08 PM on September 24, 2011 [4 favorites]


To make yet another analogy:
You might say that a photograph is a more accurate reproduction of a person's face than a sketch.

Simply because you can easily convert the sketch to a photograph (by taking a picture of it) does not mean the resulting photograph will have the same quality as a photograph of the actual person.

Yes, you end up with two items in the same format when you're done (both photographs), but this does not make them identical. Some information was lost in the conversion to a sketch. You don't get it back by photographing the sketch.

Similarly, making a wav file from original high-quality audio recordings results in better sound quality than making a wav file from an MP3 that has already lost some information.
posted by tylerkaraszewski at 6:49 PM on September 24, 2011


It's like the game un-PCly known as Chinese Whispers (I think it's called Telephone elsewhere). I have something to say. I could say it out loud, nice and clear, just like it is in my head - a .wav. Alternatively, I could convert it to MP3 by whispering it to the person next to me, and asking them to do the same, for, say, 10 people, losing information along the way. Asking the tenth person to stand up and say it out loud, nice and clear - converting it back to .wav - doesn't restore the lost information.
posted by obiwanwasabi at 1:03 AM on September 25, 2011

