Best way to convert digital audio files to digital text?
November 6, 2010 9:32 PM
There are some mp3s of a talk radio show that I would love to convert to searchable text (with permission). But I don't really know much about that sort of software or what the best methods are, or if it can be done accurately at all. I've seen software for dictation mentioned on AskMe before, but I don't need that - just the ability to convert audio files to text. I'm on a Mac. Thanks!
And without a whole lot of training the software, excellent source audio with no noise or cross-talk, and a lot of follow up editing, what you want to do is basically still impossible.
posted by fourcheesemac at 10:25 PM on November 6, 2010 [1 favorite]
You might be able to order transcripts of the show, or else it will end up being cheaper to pay someone to transcribe them.
posted by fourcheesemac at 10:26 PM on November 6, 2010
Response by poster: "And without a whole lot of training the software, excellent source audio with no noise or cross-talk, and a lot of follow up editing, what you want to do is basically still impossible."
I don't mind doing some editing. I don't think there's a lot of cross-talk.
"You might be able to order transcripts of the show, or else it will end up being cheaper to pay someone to transcribe them."
It's a political show from a small station in California. I don't think transcripts are already available, but I wanted to offer to set up a blog with transcripts.
posted by critzer at 10:31 PM on November 6, 2010
Oh, if only it were that easy. Put it this way: even major broadcasters, which could potentially make their announcers train speech-to-text programs, have yet to go down the speech recognition route. They still use real human beings to type up their transcripts, because for this purpose, speech recognition software is still more trouble than it's worth. You don't necessarily need dictation software, but a decent foot pedal will help you pump out the transcripts faster. Happy typing.
posted by embrangled at 10:54 PM on November 6, 2010
Best answer: If you're willing to spend a couple of dollars (literally, probably not more than $10-20), this seems like a great use of Mechanical Turk. Split the audio file into short chunks, pay a few cents a minute, and I think you'll get it done quickly.
Here's Andy Baio's article on getting a 36-minute interview transcribed for about $15. He used 5-minute chunks. If you broke your file into smaller chunks, you could probably pay less.
posted by griseus at 1:54 AM on November 7, 2010 [1 favorite]
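The chunk-splitting step suggested above could be sketched roughly as follows. This is only an illustration in Python, not something anyone in the thread posted: it assumes the ffmpeg command-line tool is installed, and the file name show.mp3 and the chunk_NNN.mp3 naming scheme are made up. The code only works out the chunk boundaries and builds the ffmpeg commands; it does not run them.

```python
import math


def chunk_boundaries(total_seconds, chunk_seconds=300):
    """Return (start, length) pairs, in seconds, that cover the whole recording."""
    if total_seconds <= 0 or chunk_seconds <= 0:
        raise ValueError("durations must be positive")
    count = math.ceil(total_seconds / chunk_seconds)
    return [
        # The last chunk may be shorter than chunk_seconds.
        (i * chunk_seconds, min(chunk_seconds, total_seconds - i * chunk_seconds))
        for i in range(count)
    ]


def ffmpeg_split_commands(source, total_seconds, chunk_seconds=300):
    """Build one ffmpeg argument list per chunk. Nothing is executed here."""
    commands = []
    boundaries = chunk_boundaries(total_seconds, chunk_seconds)
    for index, (start, length) in enumerate(boundaries, start=1):
        output = f"chunk_{index:03d}.mp3"  # hypothetical naming scheme
        commands.append([
            "ffmpeg",
            "-ss", str(start),   # seek to the chunk start
            "-t", str(length),   # keep this many seconds
            "-i", source,
            "-c", "copy",        # copy the audio stream without re-encoding
            output,
        ])
    return commands


if __name__ == "__main__":
    # A 36-minute recording in 5-minute chunks, as in the article mentioned above.
    for command in ffmpeg_split_commands("show.mp3", 36 * 60):
        print(" ".join(command))
```

With 5-minute chunks, a 36-minute file comes out as eight pieces, the last one only a minute long. Each printed command can be pasted into a terminal, or passed to subprocess.run if you would rather have the script do the splitting.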
Yeah, Mechanical Turk (a name I find bigoted so I hate writing it) is going to be a lot better here.
People seem to think this should be a basic, easy function for software. It really isn't. We're at a point where -- with a limited vocabulary, a fair bit of training, and nice slow speaking by one person into an excellent quality microphone -- one can get 90+ percent accuracy with dictation.
We are years from this capacity for natural discourse.
posted by fourcheesemac at 5:10 AM on November 7, 2010
Taking down and transcribing what people say is what makes my living.
Turning the spoken word into text takes time, experience, skill, and patience. This is one product where you really do get what you pay for. You can't expect someone to sit for hours producing a transcript (and it can and does take hours) and pay them next to nothing.
Go to a real transcription agency or a court reporter and ask them to do it. They might even do student rates.
posted by stenoboy at 9:03 AM on November 7, 2010
This thread is closed to new comments.
Actually, you pretty much do. The processing required is the same whether the source is a microphone or a recording.
Nuance's Dragon NaturallySpeaking is the best speech recognition engine I know of, and Nuance has assorted Mac products incorporating that engine.
posted by flabdablet at 9:57 PM on November 6, 2010