Why is the shark speaking with an Aussie accent?
March 17, 2010 12:35 AM

Video cognoscenti: Why is it that video and audio are so easily desynchronized?

I downloaded the video of the Great White Shark Rider(tm) today, and watched it offline. The narrator finished her piece minutes ahead of the video itself. (This one... turns out the fault is in the web source, and not in my downloader or VLC player.)

Second example: I tried to isolate a minute or so of another video in Avidemux. The result had something resembling a cat being tortured for the audio portion.

What I guess I'm really asking is, why don't video protocols include timing flags to keep audio relatively in check? Or maybe I'm not asking that... To be honest, video "containers" confuse me.
posted by IAmBroom to Computers & Internet (12 answers total) 7 users marked this as a favorite
There are two reasons. First, sometimes the video is 24 fps and sometimes it's 30. If you (and your player) think it's one and it's really the other, then the sound will desynchronize from the video rapidly, and a long ways.

Second, it isn't really "24" and "30" exactly. The real frame rates are 1000/1001 times those two values. The reason for that is historical and too strange to explain here. (It has to do with the introduction of color TV back in the 1950s.)

So "30 fps" is really 29.97 fps. Over the course of one hour, that's a difference of about 3.6 seconds, so if your player is assuming the video is 30 fps instead of 29.97 that's how far the sound will drift after an hour.
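
As a rough check of that arithmetic, here is a minimal sketch. It assumes the file really holds frames at the 1000/1001-scaled rate and the player counts them out at the round number while the audio plays at normal speed. The function name `drift_seconds` is made up for illustration.

```python
from fractions import Fraction

def drift_seconds(true_fps, assumed_fps, audio_seconds):
    """Return how far the picture runs ahead of the sound, in seconds.

    Hypothetical helper: the file holds true_fps * audio_seconds frames,
    but the player shows them at assumed_fps while the audio plays normally.
    """
    frames = true_fps * audio_seconds
    video_seconds = frames / assumed_fps
    return audio_seconds - video_seconds

ntsc = Fraction(30000, 1001)  # "30 fps" scaled by 1000/1001, roughly 29.97

# One hour of 29.97 fps video played as if it were exactly 30 fps.
print(float(drift_seconds(ntsc, Fraction(30), 3600)))

# One minute of 24 fps video played as if it were 30 fps.
print(float(drift_seconds(Fraction(24), Fraction(30), 60)))
```

The first figure comes out just under 3.6 seconds, in line with the estimate above. The second case, the 24-versus-30 mix-up, loses 12 seconds in a single minute.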
posted by Chocolate Pickle at 1:00 AM on March 17, 2010

They're not easily desynchronised in decent container formats. Decent container formats do include something like timing / synchronisation flags to keep audio & video in sync. AVI is not a decent container format…

Specifically, AVI doesn't have anything like a timecode stamp that says "OK, video time is this; play the chunk of audio tagged with the same time". It assumes that the video and audio are in sync to start with, that each video chunk is a certain time long (depends on various factors), and that the audio is divided into chunks that play for the same length of time.

Mess with any of that - for example, start with them out of sync, change the video framerate in the header so 24fps video plays back at 25 or 30 fps (without repacking), try to put VBR MP3 audio in an .AVI container (it can be hacked, but it's just that - a hack), or corrupt the stream slightly - and you've just blown .AVI's assumptions on timing.
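
To make the header-edit case concrete, here is a small sketch of position-based timing. It is loosely modeled on the rate/scale pair that AVI keeps in its stream header, but the function `implied_time` and the clip numbers are illustrative assumptions, not a real parser.

```python
from fractions import Fraction

def implied_time(frame_index, rate, scale=1):
    """Playback time of a frame when only a frame rate (rate/scale) is stored."""
    return Fraction(frame_index * scale, rate)

AUDIO_SECONDS = 10   # the audio track is untouched throughout
LAST_FRAME = 240     # 10 seconds of picture shot at 24 fps

# Header says 24 fps: picture and sound end together.
print(implied_time(LAST_FRAME, 24))   # 10
# Header edited to 25 fps without repacking: picture ends early.
print(implied_time(LAST_FRAME, 25))   # 48/5, i.e. 9.6 seconds
```

Because nothing in each chunk says when it should play, changing one number in the header silently moves every frame, while the audio still runs its full 10 seconds.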

FWIW: somehow, people manage to totally cock up the timing in containers that do include frame-by-frame synchronisation, like MPEG transport and program streams. I've never really understood how - I mean, you can demux them into totally separate files, play with them separately, re-encode them, put them back together, and unless you've done something silly like cut different chunks out of each or changed the framerate or pitch by simply speeding it up or slowing it down, they'll almost always stay in sync. It's almost like that famous Charles Babbage quote
posted by Pinback at 1:02 AM on March 17, 2010 [1 favorite]

I've favourited this question too, because I'm looking for a more complete answer and solution, but here's what I know:

1) Frame rate can be an issue - common formats use 23.976 fps, 23.98 fps, 24fps, 25fps, 29.97 fps and 30fps - things that have been converted between these formats can lose audio sync if the audio isn't handled correctly. The container file may have an incorrectly specified frame rate. Those weird fractional framerates have something to do with making colour NTSC back-compatible with black and white NTSC.
2) Missing frames - an error in the encoding may result in a missing video frame, which isn't compensated for in the audio.
3) Variable bitrate audio can cause problems, apparently.
4) A lot of video container formats suck and, as you've indicated, they often don't have any kind of frame-by-frame timing flags to keep things in sync.
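
Point 2 can be sketched the same way. This is a toy model, assuming a container with no per-frame timestamps, so a frame's time is implied purely by its position. The name `sync_errors` and the numbers are hypothetical.

```python
from fractions import Fraction

def sync_errors(total_frames, dropped, fps):
    """Sync error (seconds) of each surviving frame when time is implied by position.

    Hypothetical model: frame i was meant to show at i / fps, but a player
    with no per-frame timestamps shows the frame at position p at p / fps.
    """
    frame_len = Fraction(1, fps)
    survivors = [i for i in range(total_frames) if i not in dropped]
    return [(orig - pos) * frame_len for pos, orig in enumerate(survivors)]

errors = sync_errors(total_frames=250, dropped={40, 120}, fps=25)
print(float(errors[0]))    # before any drop
print(float(errors[-1]))   # after both drops
```

Every frame after a drop plays one frame-length early, and the errors accumulate, so a handful of lost frames is enough to make lips and voices visibly disagree.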

I don't know why, by now, things haven't just been made to work.
posted by Jimbob at 1:03 AM on March 17, 2010

There are some formats where it's done - it's often improper encoding that causes the problem...or a drift.

Different formats don't need it; it's predominantly the web that has been the issue. Footage on tape? Physically, the track for audio is right next to the track for video on a video tape, so it's a non-issue. When I've encountered a capture problem, it's always been about latency: the wiring of the video to the capture mechanism takes a different path than the audio.

When it drifts further (upon capture?), that's usually a problem with the software manufacturer.

DV can have locked audio, where the audio has a reference, or unlocked audio, where the stream is separate, and the early editorial systems would struggle with it; some of the Canon cameras used to be a major pain.

When you author a DVD, the audio and video are broken up into elementary streams and only put back together (muxing) during the authoring process. Why? Because you might have different language audio or director commentaries. During muxing, the video and each audio part are written close together, so when you access some of the video, the audio for it is right there; but we're assuming nobody made a mistake (and gave you audio that is 1 second short, etc.).

Each format has some level of frailty, and we're often trying to compress the crap out of our video - less and less data = more fragility. Oh yeah, we don't want to babysit the encoding, so it goes unmonitored and if 'something goes wrong' (a latency in part of the encoder) we have a problem.

It'd be really nice if we could wipe out all the video formats and just do it over; we tried that with HD; meanwhile we have more formats (both proprietary and standards) than ever. It's a mess.
posted by filmgeek at 3:20 AM on March 17, 2010

I've used an older, but at one time expensive, plasma TV display. Apparently it had to do some processing to format the video for the screen, and this would cause it to lag behind the sound. There was a menu item that allowed one to set a delay in the audio to match whatever delay the video was currently experiencing, which could be dependent on the format or content. It sucked, because you could only add extra speakers by going through the TV, otherwise there would be phasing and echo effects.
posted by StickyCarpet at 3:54 AM on March 17, 2010

This isn't a direct answer to your question, but a sort of fix for this situation, in case you're looking for one: since you're using VLC player (which I prefer myself as well) you can adjust the syncing of sound and video by pressing j and k while the video is running. That way, you can put just enough delay on it for the two to sync up; this often won't work perfectly for longer vids, but it helps for me, especially on videos that are really bad.
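
One possible reason a single adjustment falls short on longer videos: a constant delay can only cancel a constant offset. The sketch below assumes the error is a steady drift like the 29.97-versus-30 case discussed earlier in the thread. `residual_error` is a made-up helper, not anything VLC exposes.

```python
from fractions import Fraction

def residual_error(t, drift_per_second, fixed_delay):
    """Sync error left at time t (seconds) after applying one constant delay."""
    return drift_per_second * t - fixed_delay

drift = Fraction(3600, 1001) / 3600   # the 29.97-vs-30 drift, per second
delay = drift * 1800                  # tuned by ear at the 30-minute mark

for t in (0, 1800, 3600):
    print(t, float(residual_error(t, drift, delay)))
```

Under that assumption the sync is exact only at the moment you tuned it, and it is off by the same amount in opposite directions at the start and at the one-hour mark.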
posted by koeselitz at 4:07 AM on March 17, 2010 [2 favorites]

AVI timing and audio sync, from the guy who wrote VirtualDub. Which is interesting by itself, but I am also linking just to show this stuff is hard. The average DVD to AVI file you run into is probably using non-standard VBR MP3 audio, was encoded by a random and possibly hacked version of XviD, and was run through a chain of a dozen tools. Players are expected to deal with anything, and they do a pretty good job, but AV sync is usually the first thing to go.
posted by smackfu at 5:48 AM on March 17, 2010 [1 favorite]

Pinback, thanks for that awesome Charles Babbage quote.
posted by IAmBroom at 9:08 AM on March 17, 2010

And thanks to all for coherent, meaningful, speedy replies.
posted by IAmBroom at 9:09 AM on March 17, 2010

Thanks for the VLC tip, koeselitz. I was watching something just last night with out of sync audio, and that made it work well enough for my purposes.
posted by Jawn at 2:46 PM on March 17, 2010

Wow, so much misinformation here. First of all, VBR MP3 in .avi files is not a hack; it is completely specified. Secondly, the notion of a "hacked XviD" is ludicrous. XviD is an open source encoder that was reimplemented from scratch. There would be no reason to 'hack it up'. You're thinking of the ancient DivX version 3, which was in fact a hacked-up version of a Microsoft MPEG-4 encoder. The two have nothing in common.
posted by Rhomboid at 3:09 PM on March 17, 2010

VBR audio may be accounted for in the AVI spec, but to quote from both Rhomboid and smackfu's links…

"There is a value in the stream headers, called dwSampleSize, which is 0 in order to trigger VBR stream seeking."

"You might think that setting dwSampleSize=0 for an audio stream would allow it to be encoded as variable bitrate (VBR) like a video stream, where each sample has a different size. Unfortunately, this is not the case -- Microsoft AVI parsers simply ignore dwSampleSize for audio streams and use nBlockAlign from the WAVEFORMATEX audio structure instead, which cannot be zero."

That is, Microsoft's own AVI parsers (at least as of 2004 and DirectShow 7) couldn't handle an allegedly spec-compliant AVI with VBR.

So how does almost everyone do it? Well, read the rest of smackfu's link for the hack (which I nearly linked myself in my original post, but I decided to keep the tech content to a minimum appropriate to IAmBroom's understanding).
posted by Pinback at 3:44 PM on March 17, 2010

This thread is closed to new comments.