How to display a transcript in sync with a media file?
March 19, 2009 4:54 PM

I have a transcript with timestamps and a media file. How do I display both of them on a webpage at the same time, synchronized? Oh, and it has to work in any browser or OS.

I have a transcript (foo.txt) composed of 1000 sentences. Each sentence is given its own line, and some sentences have accompanying comment lines as well. The sentence lines are formatted as "Speaker|Sentence sentence sentence.|timestamp", and the comment lines are formatted as "comment|Comment comment comment". An example could be:

*MOT|which way we're gonna go ?|01234_02234
%com|Mother is gesturing toward train set.
*CHI|backwards .|02234_02934

With this transcript is a media file (audio or video, the solution should be able to handle both). The timestamps correspond to points in the media file (in milliseconds).
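
For what it's worth, here is a minimal sketch of how lines in that format could be parsed in Javascript. The function name parseTranscript and the shape of the returned objects are made up for illustration, and it assumes a comment line belongs to the sentence just before it (as in the example above) and that the two halves of the timestamp are start and end in milliseconds.

```javascript
// Sketch only: parse "*SPK|sentence|start_end" and "%com|comment" lines.
// Assumption: a comment line belongs to the sentence directly above it.
// Written in plain ES3/ES5 style so it does not rely on newer browser features.
function parseTranscript(text) {
  var rows = text.split(/\r?\n/);
  var sentences = [];
  for (var i = 0; i < rows.length; i++) {
    var row = rows[i].replace(/^\s+|\s+$/g, ''); // trim without String.trim
    if (row === '') continue;
    var fields = row.split('|');
    if (row.charAt(0) === '*' && fields.length >= 3) {
      // The last field is the timestamp; everything between the speaker and
      // the timestamp is the sentence, rejoined in case it contains a "|".
      var times = fields[fields.length - 1].split('_');
      sentences.push({
        speaker: fields[0].slice(1),
        text: fields.slice(1, fields.length - 1).join('|'),
        start: parseInt(times[0], 10), // milliseconds
        end: parseInt(times[1], 10),   // milliseconds
        comments: []
      });
    } else if (row.charAt(0) === '%' && sentences.length > 0) {
      // Comments that appear before any sentence are silently dropped.
      sentences[sentences.length - 1].comments.push(fields.slice(1).join('|'));
    }
  }
  return sentences;
}
```

The result is an array in transcript order, which is convenient both for building the page and for looking up a sentence by time later on.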

I want to display the entire transcript alongside the media file, and when playing the media file, have the corresponding lines in the transcript be highlighted (1). The transcript should also automatically scroll so that the media is always in view. Finally, users should be able to begin playback either by going to a particular point in the transcript and hitting play (or some keyboard shortcut), or a particular point in the media and hitting play (or some keyboard shortcut).

An important caveat is that this must work in older browsers and both Windows/OS X (Linux support is unnecessary/probably follows from Windows/OS X support). All my research has come up with solutions using HTML5, which obviously won't work in older browsers. Staying away from Flash is also preferable, but not mandatory.

My guess is that there's some Ajax that can grab the time in the media and compare that to the timestamp for each line, and then highlight that particular line (either by encapsulating each line with a tag, or some fancy XML or JSON parsing). I don't know! Help me, hive!

(1) Particularly, I'm not looking for closed-captioning (a la QuickTime and SMIL). The transcript should be separate from the media file, and should itself be navigable.
posted by isnotchicago to Computers & Internet (6 answers total)
 
SMIL is good for that. I have used it for the exact situation you described, with text synchronized to a media file. You'd have to format your transcript in accordance with the standard, but that will work.
posted by adipocere at 5:17 PM on March 19, 2009


Response by poster: My understanding of SMIL is that it plays a block of text for a certain duration, and that's it. I want all the text to be visible at all times, and for the user to be able to start playback at any arbitrary point in either the media or transcript. Imagine a two-column design, with the transcript in one column and the media in the other; the text is persistent, but what section is highlighted changes in time with the media. Is this possible with SMIL? I don't have any experience with SMIL, just did some quick research.
posted by isnotchicago at 7:01 PM on March 19, 2009


i'm not a front-end web developer, and i don't know how precise javascript timers are, but the following scheme might work.

use a javascript function that starts the media clip and a timer. every time the timer fires, it highlights a particular portion of the page displaying the transcript.

the transcript would need to be broken out into container elements that contain the timestamp as the container id.

when the timer fires, it gets the appropriate transcript container and sets the background color, or some other css style to produce the desired effect.
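
a rough sketch of that scheme, with the player and the page kept behind two callbacks so the logic can be tried on its own. makeTicker, getTimeMs and setHighlight are hypothetical names, not part of any player API. the ids carry a "t" prefix because HTML 4 ids have to begin with a letter.

```javascript
// Sketch only: build a tick function meant to be driven by setInterval.
// lines:        array of {id, start, end}, times in milliseconds
// getTimeMs:    hypothetical callback returning the player's position in ms
// setHighlight: hypothetical callback (id, on) that styles one container
function makeTicker(lines, getTimeMs, setHighlight) {
  var currentId = null; // id of the container highlighted right now
  return function tick() {
    var now = getTimeMs();
    var nextId = null;
    for (var i = 0; i < lines.length; i++) {
      if (now >= lines[i].start && now < lines[i].end) {
        nextId = lines[i].id;
        break;
      }
    }
    if (nextId === currentId) return; // nothing changed since the last tick
    if (currentId !== null) setHighlight(currentId, false);
    if (nextId !== null) setHighlight(nextId, true);
    currentId = nextId;
  };
}
```

in a page, setHighlight could set document.getElementById(id).style.backgroundColor, and the tick function could be handed to setInterval with a period of 100 ms or so. because each tick reads the player's position instead of counting from zero, it does not matter where playback started.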
posted by askmehow at 12:59 AM on March 20, 2009


Response by poster: That would work for the single case of playing the media from the beginning, but it wouldn't allow users to start from an arbitrary point in the transcript (e.g., line 483), and have the media play from that point (e.g., 10 minutes in). Is there a way to have Javascript look at the sentence's time stamp and then tell the media file to play from that point? Or at worst, extract what point in media file you are currently at, and throw that to a function which then highlights the appropriate sentence?
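
To make both directions concrete, here is a small sketch. It assumes the sentences are already in an array of {start, end} objects (milliseconds), sorted by start time with no overlapping ranges; findLineIndex and startOfLine are names invented for this example. How the current time is read from the player, or a new time sent to it, depends entirely on the player and is not shown.

```javascript
// Sketch only: which sentence covers a given media time?
// Binary search over lines sorted by start; returns an index or -1.
function findLineIndex(lines, ms) {
  var lo = 0;
  var hi = lines.length - 1;
  while (lo <= hi) {
    var mid = (lo + hi) >> 1;
    if (ms < lines[mid].start) {
      hi = mid - 1;
    } else if (ms >= lines[mid].end) {
      lo = mid + 1;
    } else {
      return mid; // start <= ms < end
    }
  }
  return -1; // the time falls in a gap, or outside the transcript
}

// The other direction: where should the media start for transcript line N?
// lineNumber is 1-based, as in "line 483". Returns ms, or null if out of range.
function startOfLine(lines, lineNumber) {
  var line = lines[lineNumber - 1];
  return line ? line.start : null;
}
```

With only 1000 sentences a plain loop would be fast enough too; the binary search just keeps the lookup cheap if it runs many times a second.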
posted by isnotchicago at 8:56 AM on March 20, 2009


Ah, I understand better. I was confused by the "scroll" part — I had that mentally eclipsing the "entire transcript" part, since I thought of a window of text scrolling the transcript by. Unless this is a very short media file, the transcript window will be much larger than the media window.

SMIL can put text in one spot and video (or audio) in another spot, and keep them synchronized. You can start in the middle, go back, etc. — still synchronized. But you're right, that wouldn't necessarily make the text navigable.

So, are you thinking of this page, which shows a very large text transcript in its entirety, and the media window off to the right (for an example), slides down the page along with a changing, green-highlit sentence, with its baseline roughly colinear with the vertical middle of the media file? I would rate that as being significantly more difficult than having an immobile media window, with the text scrolling alongside and not the entire transcript viewable. You might want to have the transcript confined to a window, with scrollbars. I realize that's not exactly what you asked, but you'd have your work cut out for you otherwise.

I'm starting to like askmehow's answer better than mine. When I did text captioning of a movie, I did line repeats in parts of it.

Frame 34

Line A
Line B
Line C

Frame 35

Line B
Line C
Line D

If you're going with Javascript, you'll have to choose your format carefully to make sure that the player can support the functions you need.

It looks like you'd need an onclick or something similar for each line in the transcript. That would then have to trigger some Javascript function which takes the id of the transcript block that called it and either looks it up against a table of times or somehow decodes the id into a time (id="03020145" meaning 3 hours, 2 minutes, and 1.45 seconds in), then tries to manipulate the player and send it to that time.
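
The "decode the id" option might look something like this sketch. It assumes the id is exactly eight digits laid out as hours, minutes, seconds, hundredths (HHMMSScc), matching the example above; decodeTimeId, makeLineClickHandler and the seek callback are hypothetical names, and seek stands in for whatever the chosen player offers.

```javascript
// Sketch only: turn an id such as "03020145" into milliseconds.
// Assumed layout: 2 digits each of hours, minutes, seconds, hundredths.
function decodeTimeId(id) {
  if (!/^\d{8}$/.test(id)) return null; // not in the expected format
  var hours = parseInt(id.slice(0, 2), 10);
  var minutes = parseInt(id.slice(2, 4), 10);
  var seconds = parseInt(id.slice(4, 6), 10);
  var hundredths = parseInt(id.slice(6, 8), 10);
  return ((hours * 60 + minutes) * 60 + seconds) * 1000 + hundredths * 10;
}

// Build the function an onclick would call with the clicked block's id.
// seek is a hypothetical callback that moves the player to a time in ms.
function makeLineClickHandler(seek) {
  return function (id) {
    var ms = decodeTimeId(id);
    if (ms !== null) seek(ms);
  };
}
```

Each transcript block could then call the handler with its own id from its onclick attribute.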

That part is probably fairly easy in Javascript compared to, say, grabbing the controls on the media player, fast-forwarding a bit, and then expecting that to change how the transcript is displayed. There, you would want a repeating Javascript event which would look at the player, try to extract where it is, and then synchronize the display of the transcript. Furthermore, that repeating event would have to know not to try to interfere when someone has just jumped ahead or back in the transcript, probably through setting a flag, good for a second or two.
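
The flag idea could be sketched as a small guard object: note the moment of a transcript jump, and have the repeating event skip its sync until a hold period has passed. makeSeekGuard and its method names are invented for this example, and the clock is passed in so the logic can be exercised without waiting in real time.

```javascript
// Sketch only: suppress the repeating sync for a short while after a jump.
// holdMs: how long to hold off, e.g. 1500 for "a second or two"
// nowFn:  clock callback, e.g. function () { return new Date().getTime(); }
function makeSeekGuard(holdMs, nowFn) {
  var suppressedUntil = 0;
  return {
    // Call this when the user jumps via the transcript.
    noteUserSeek: function () {
      suppressedUntil = nowFn() + holdMs;
    },
    // The repeating event checks this before touching the display.
    maySync: function () {
      return nowFn() >= suppressedUntil;
    }
  };
}
```

The repeating event would start with something like "if (!guard.maySync()) return;" and otherwise carry on as usual.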

What a sticky situation. If you do go the Javascript route, your most critical part will be making sure the player supports the functions you must have. That looks like the decision point to me.
posted by adipocere at 9:19 AM on March 20, 2009


Response by poster: When I say "the entire text should be visible", all I mean is that it is persistent and able to be seen at any time. That rules out SMIL with its durations (right?), but I really did have a basic "text in the left frame, media in the right frame" idea. So, yes, sorry for the confusion. The entire transcript would not literally be completely visible; it just wouldn't appear at some point and disappear at another like in closed captioning or slide presentations.

I had a doh! moment when I read "make sure the player supports the functions", and then went to check the functions QuickTime supports. Lo and behold, there are GetTime() and SetTime() functions. My thinking now is:

Each sentence would have a button next to it. When clicked, a function would read the start time for that sentence (pulled from the time stamp) and pass it to SetTime(). Conversely, going to a point in the media would call a function that gets the time from GetTime() and then highlights the sentence with a time stamp range that covers that time. The text could scroll using anchor tags, and Mootools or similar could be used to make that scrolling smooth.
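
One wrinkle worth sketching: as far as I can tell from the QuickTime scripting docs, GetTime() and SetTime() work in the movie's own time scale units rather than milliseconds, with GetTimeScale() giving the number of units per second. That is an assumption to double-check against the docs. The helpers below (getMovieMs, setMovieMs, playFromSentence are made-up names) convert between the two; movie stands for the embedded plug-in object.

```javascript
// Sketch only. Assumes the QuickTime plug-in object exposes GetTime(),
// SetTime(units), GetTimeScale() (units per second) and Play(); verify
// these against the QuickTime JavaScript documentation before relying on them.

// Current position of the movie, converted to milliseconds.
function getMovieMs(movie) {
  return Math.round(movie.GetTime() * 1000 / movie.GetTimeScale());
}

// Move the movie to a position given in milliseconds.
function setMovieMs(movie, ms) {
  movie.SetTime(Math.round(ms * movie.GetTimeScale() / 1000));
}

// What a sentence's button would do: jump to its start time and play.
// sentence is an object with a start time in ms, as in the transcript.
function playFromSentence(movie, sentence) {
  setMovieMs(movie, sentence.start);
  movie.Play();
}
```

Because of rounding to whole time scale units, a round trip can come back a millisecond or so off, so the highlight lookup should compare against ranges rather than exact start times.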

Now my question is, "is this the best implementation?" Flash looks to have similar time functions that can be made externally available to Javascript, so that's another possible route. Maybe this is more easily done server-side somehow (PHP, or Perl, etc.)? HTML 5 is going to have new media elements, but, while cool and simple, wouldn't work for any user not running cutting-edge beta software. A W3C blog has an interesting HTML 5 post that I'll file away for 2012 when IE 16 finally supports HTML 5.

Thanks a bunch askmehow and adipocere. I'm not quite at a solution yet, but ideas are flowing.
posted by isnotchicago at 11:01 AM on March 20, 2009

