YouTube Captions Have Overlapping Start-Times
March 25, 2020 2:40 PM

When you watch a YouTube video with the auto-created captions, the captions usually "scroll" up, line by line, with two lines on screen at any one time. However, when I download those captions as a text file, the start and stop times for successive lines overlap. How can I fix this?

Part of a downloaded text file may look like the following:

0:00:10.370,0:00:16.289
He clasps the crag with crooked hands

0:00:13.349,0:00:18.869
Close to the sun in lonely lands

0:00:16.289,0:00:21.240
Ringed with the azure world he stands

The first line appears at 10.370, but at 13.349, the second line appears and the first line scrolls up (still visible until 16.289). Fine. But if I alter this downloaded file to fix something and then upload it, YouTube ceases scrolling and just puts the first line there and then piles the second line on top of it.

Thoughts:
  • Is there a way to force "scrolling" in a caption file?
  • I know I could change all of those start/end times, but that sounds awful.
Do you know how to fix this problem?
posted by klausman to Computers & Internet (4 answers total) 1 user marked this as a favorite
 
Best answer: You can replace each line's end time with the following line's start time using a regular expression:

perl -e 'local $/; $_=<>; s/,([0-9:.]+)(\n.*?\n\n)([0-9:.]+)(?=,)/,$3$2$3/mg; print' < SUBTITLES_FILENAME
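
For anyone without Perl handy, the same idea can be sketched in Python. This is only a rough equivalent, assuming the file layout shown in the question (a `start,end` line, the caption text, then a blank line) and Unix line endings; the name `clip_end_times` is made up for illustration.

```python
import re

# Matches a timing line such as "0:00:10.370,0:00:16.289".
TIMING = re.compile(r"^(\d+:\d{2}:\d{2}\.\d{3}),(\d+:\d{2}:\d{2}\.\d{3})$")


def clip_end_times(text):
    """Set each caption's end time to the start time of the next caption."""
    lines = text.split("\n")
    # Indexes of every line that looks like "start,end".
    timing_idx = [i for i, line in enumerate(lines) if TIMING.match(line)]
    for cur, nxt in zip(timing_idx, timing_idx[1:]):
        start = TIMING.match(lines[cur]).group(1)
        next_start = TIMING.match(lines[nxt]).group(1)
        lines[cur] = f"{start},{next_start}"
    # The last caption has no successor, so its end time is left alone.
    return "\n".join(lines)


sample = (
    "0:00:10.370,0:00:16.289\n"
    "He clasps the crag with crooked hands\n"
    "\n"
    "0:00:13.349,0:00:18.869\n"
    "Close to the sun in lonely lands\n"
    "\n"
    "0:00:16.289,0:00:21.240\n"
    "Ringed with the azure world he stands\n"
)
print(clip_end_times(sample))
```

With the sample from the question, the first caption would end at 0:00:13.349 and the second at 0:00:16.289, so no two captions overlap.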

posted by Phssthpok at 4:27 PM on March 25, 2020 [1 favorite]


Response by poster: I thought about using a regular expression, but I’m not super proficient yet. I will try out what you gave me and report back. Thanks!
posted by klausman at 4:28 PM on March 25, 2020


This is one of the things that most free subtitling software (e.g. Subtitle Workshop) tends to do automatically, and fairly well: checking and adjusting for minimum display times, combining short lines into one and displaying it for the combined time, etc.
posted by Pinback at 6:00 PM on March 25, 2020


Response by poster: I used regex and it worked, though I did have to adjust what you gave me, Phssthpok. [I'm doing my typing in Atom, and I know almost nothing about regex, except that it is awesome]. Here's the ugly thing I used for find-and-replace:

Find in buffer:
([0-9]:[0-9]{2,}:[0-9]{2,}.[0-9]{3,},)([0-9]:[0-9]{2,}:[0-9]{2,}.[0-9]{3,})(\n.*)(\n\n)([0-9]:[0-9]{2,}:[0-9]{2,}.[0-9]{3,},)([0-9]:[0-9]{2,}:[0-9]{2,}.[0-9]{3,})(\n.*)(\n)(\n)

Replace:
$1$6$3$7$9$9

It turned things like this:

0:00:00.000,0:00:03.510
Whose woods these are I think I know.

0:00:03.510,0:00:05.879
His house is in the village though;

0:00:05.879,0:00:07.980
He will not see me stopping here

0:00:07.980,0:00:09.780
To watch his woods fill up with snow.

Into this:

0:00:00.000,0:00:05.879
Whose woods these are I think I know.
His house is in the village though;

0:00:05.879,0:00:09.780
He will not see me stopping here
To watch his woods fill up with snow.


It's hackish and ugly, but it got the job done. I decided that, instead of ending a line where the next one began, I would display two lines at once, ending them when the second line would have ended.

As for free subtitling software, I would have thought YouTube itself would make this easier. Maybe it's a bug, but having one feature (scrolling) stop working after editing captions is pretty bad. But, problem solved.
[I would gladly accept a simplification of my silly-looking regex, if folks are into that.]
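
One way to avoid the long regex altogether is a short script. The following is a minimal Python sketch of the same pairing, assuming the `start,end` / text / blank-line layout shown above; `parse_captions` and `merge_pairs` are hypothetical names, not part of any subtitle tool.

```python
import re

# Matches a timing line such as "0:00:03.510,0:00:05.879".
TIMING = re.compile(r"^(\d+:\d{2}:\d{2}\.\d{3}),(\d+:\d{2}:\d{2}\.\d{3})$")


def parse_captions(text):
    """Return a list of (start, end, text_lines) tuples."""
    captions = []
    if not text.strip():
        return captions
    # Captions are separated by one or more blank lines.
    for block in re.split(r"\n{2,}", text.strip()):
        header, *body = block.split("\n")
        match = TIMING.match(header)
        if match is None:
            raise ValueError(f"not a timing line: {header!r}")
        captions.append((match.group(1), match.group(2), body))
    return captions


def merge_pairs(text):
    """Join captions two at a time: first start, second end, both texts."""
    captions = parse_captions(text)
    merged = []
    for i in range(0, len(captions), 2):
        pair = captions[i:i + 2]  # a lone final caption stays on its own
        start = pair[0][0]
        end = pair[-1][1]
        body = [line for _, _, lines in pair for line in lines]
        merged.append("\n".join([f"{start},{end}", *body]))
    return "\n\n".join(merged) + "\n"


sample = (
    "0:00:00.000,0:00:03.510\n"
    "Whose woods these are I think I know.\n"
    "\n"
    "0:00:03.510,0:00:05.879\n"
    "His house is in the village though;\n"
    "\n"
    "0:00:05.879,0:00:07.980\n"
    "He will not see me stopping here\n"
    "\n"
    "0:00:07.980,0:00:09.780\n"
    "To watch his woods fill up with snow.\n"
)
print(merge_pairs(sample))
```

Parsing the file into captions first means the pairing logic never has to count capture groups, and an odd caption left over at the end of the file is kept as-is rather than skipped.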
posted by klausman at 6:34 PM on March 25, 2020 [1 favorite]

