Audio transcription/word processor help for video captions
October 21, 2022 9:59 AM

I’m part of a Ukraine-related volunteer project that translates or edits auto-generated and auto-translated captions (usually 70% accurate, 30% gibberish) from YouTube videos that we did not create ourselves. We start from the transcript that's available to ordinary YouTube viewers, with no access to the creator's original files. What we do with the result depends on whether we know the creator, and that affects the format we need the end result to be in. I need advice: questions about word processor stuff and best format practices below the fold.

Anonymous because Ukraine war related and I don’t want to potentially attach my goofy metafilter profile to any high profile youtubers' work in Ukraine.

I’m part of a volunteer translation project helping make English subtitles for longform videos from Ukraine that already have YouTube auto-generated captions enabled. The captions are auto-generated (i.e. 70% accurate, 30% unreadable gibberish) either in the original languages or auto-translated into English. We have the translation part handled, but I want to automate some of the text formatting and need advice on the final format as well.

The original creators don’t always have time for us American volunteers so we can’t just ask them for their files up front. We only have access to whatever a regular viewer has access to.

In some cases we might send the corrections back to them for their own use once we’ve finished the work and have proven that we aren’t wasting their time, and in other cases, we just re-upload the videos with our own captions either on our own youtube channels or elsewhere on social media.

At other times we want to turn a raw caption transcript into more human-readable paragraphs of text for journalists/bloggers to work with or to post on social media ourselves. (I don’t want advice on legality/intellectual property stuff btw)

This Ask is mostly a Word/Google Docs question about handling text (find/replace scripts in Word, etc.), plus a question about which caption format professional video editors and captioners use.

YouTube has a ‘show transcript’ feature for viewers that lets you open a small panel and view all the captions on videos that have captions, along with clickable timestamps.

We can copy and paste that YouTube transcript into a document, which is great. Less great is that it takes a lot of work to make it human-readable as a document, because there’s a timestamp every 8 words or so.

-If I want to turn the transcript into paragraphs without the timestamps, is there a way to automate find-replace so that it removes all the timestamps? They cover a consistent range of times and are all in the same format. Usually there aren’t colons or numbers in the transcript text itself. I use Google Docs but I’m willing to find another word processor if it’s better.

-Conversely, if I wanted to send the corrections back to a creator to re-upload on youtube, what’s the best format to use, so they can correct their auto-captions? I’ve read that some pro filmmakers want captions in a spreadsheet or table. If this was a TV interview rather than a youtube creator, what’s the best format for a future editor to work with our corrected English captions should they want to?

-I assume I'd want to keep timestamps intact if I'm going to pass this text back to the creator so they can keep track of where the corrections go.

When I copy/paste the transcript, I lose the live link that’s part of the timestamp on the Youtube 'show transcript' panel.

If I wanted to keep the timestamp and keep its original link live, how can I keep from losing the live link when pasting to my word processor or spreadsheet? Do I need to ‘view page source’ on the YouTube transcript panel from my browser before copying, or is there another thing I’m missing?


Lastly, I’d love to learn more about online communities for transcription/translation/volunteer captioning folks too- some of this applies to captioning for accessibility etc too. I’ve learned a lot from tutorials on how transcription is done and there are a lot of cool tools out there.

Throwaway email if anyone’s doing something similar with Ukraine war video content and wants to compare notes: doe733456@gmail.com
posted by anonymous to Writing & Language (3 answers total) 2 users marked this as a favorite
 
I've found that Youtube does things a little differently than others, but they also have a lot of AI tools that will match up the timing for you.

I'd say just download what YouTube generated, run it through a grammar and spelling checker, edit out the nonsense, add in text where it's not super-obvious or is obviously mistaken, reupload it, and play with the timing for each block of text if necessary. There should be an option that lets the AI automatically match up each block of text again.

Unfortunately I don't speak Ukrainian so I am of little help here.
posted by kschang at 10:18 AM on October 21, 2022


Not sure how else you are getting transcripts besides copy/paste, but in case you don't know:
youtube-dl --write-auto-sub --skip-download [url]
should grab the auto-generated subtitles as a raw VTT file (or drop the "auto-" part for regular creator-added subs) and there are some switches to convert to other formats and grab only specific languages as well.

If you plan to give these back to creators to add to their videos, stripping the timestamps out is going to be counter-productive even though it makes the editing easier, since they'll have to be recreated before the captions can be used. But if you wanted to do that for other human-readable purposes, it would probably be easier to whip up a script that processes the text exactly the way you want, rather than trying to bend the find/replace in your word processor around it. I can help with that if you want (but probably won't see MeMail immediately).
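For the human-readable case, something like the following might be a starting point. It is only a minimal Python sketch: it assumes the pasted transcript has timestamps shaped like 0:05, 12:34 or 1:02:03, either on their own lines or at the start of a line, and the function name transcript_to_text is made up for illustration. If the caption text itself ever starts with something that looks like a time, that would get stripped too.

```python
import re

# Assumed timestamp shape: M:SS, MM:SS, or H:MM:SS at the start of a line.
TIMESTAMP = re.compile(r"^\s*(?:\d{1,2}:)?\d{1,2}:\d{2}\s*")

def transcript_to_text(pasted: str) -> str:
    """Drop leading timestamps and join the caption text into one paragraph."""
    pieces = []
    for line in pasted.splitlines():
        text = TIMESTAMP.sub("", line).strip()
        if text:  # skip lines that held only a timestamp or were blank
            pieces.append(text)
    return " ".join(pieces)
```

You would paste the transcript into a text file, read it in, and pass the contents to the function. Paragraph breaks would still need to be added by hand, since the transcript itself has no notion of paragraphs.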

Also, the auto-subs use the one-word-at-a-time captions like you'd get on a live broadcast, and that could be stripped down to regular screen-by-screen (removing the in-line timestamps on each word and just keeping the full text of each line).
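As a rough illustration of that stripping step, here is a hedged Python sketch. It assumes the VTT file has cue timing lines like "00:00:00.160 --> 00:00:02.389", per-word tags in angle brackets, and lines that repeat from one cue to the next as the captions roll; the name collapse_vtt is hypothetical, and real files may need more handling than this.

```python
import re

# Cue timing line, e.g. "00:00:00.160 --> 00:00:02.389 align:start position:0%"
CUE_TIMING = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) --> ")
# Inline per-word tags, e.g. <00:00:00.480> and <c> ... </c>
INLINE_TAG = re.compile(r"<[^>]+>")

def collapse_vtt(vtt: str) -> list[tuple[str, str]]:
    """Return (cue start time, plain text) pairs with rolling repeats dropped."""
    cues = []
    start = None
    for raw in vtt.splitlines():
        timing = CUE_TIMING.match(raw)
        if timing:
            start = timing.group(1)
            continue
        text = INLINE_TAG.sub("", raw).strip()
        if start is None or not text:
            continue  # header lines before the first cue, or blank lines
        if cues and cues[-1][1] == text:
            continue  # same line carried over from the previous cue
        cues.append((start, text))
    return cues
```

This keeps the start time of the first cue in which each line appears, so the timing survives for anyone who needs to line the corrected text back up with the video.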
posted by CyberSlug Labs at 1:03 PM on October 21, 2022 [1 favorite]


Lastly, I’d love to learn more about online communities for transcription/translation/volunteer captioning folks too- some of this applies to captioning for accessibility etc too. I’ve learned a lot from tutorials on how transcription is done and there are a lot of cool tools out there.

Amara came to mind.

MeFite simonw worked on a tool for capturing captions and transcripts from online videos and may be able to aid you further.
posted by brainwane at 6:32 PM on October 23, 2022 [1 favorite]

