July 2012: What are some good audio transcription tools?
July 14, 2012 8:26 PM

Wondering about the current state of more-or-less free VOICE RECOGNITION software and websites. Anywhere you can upload an MP3 to get a transcription?

Since this AskMeFi post was written, there have been two years' worth of progress in iPhone development, Siri, Android development, Google Voice (GrandCentral) accounts, Windows 7, the Mac OS Mountain Lion API, et cetera.

What I would really like to do is upload an MP3 of spoken-word audio and get a good-enough transcription. I could call my own Google Voice phone number and leave myself voicemail messages in three-minute increments, but that seems quite labor-intensive.

Probably I should just buy the cheap Dragon Dictate NaturallySpeaking Basic retail box for $40, but I thought I would ask here about my options for getting some audio transcribed.
posted by shipbreaker to Technology (12 answers total) 9 users marked this as a favorite
Response by poster: https://www.speechpad.com/ one dollar per minute !??
posted by shipbreaker at 8:43 PM on July 14, 2012

Related: Google speech recognition built into Chrome 11, an example of it in action. Microsoft's speech API. Speech API web site.

None of these does exactly what you're asking for--though some of them possibly could if someone did a bit of programming to create a front end.
posted by flug at 8:45 PM on July 14, 2012

What's your ideal accuracy/speed/cheapness trade-off? Really, you have to pick two of the three to get either in decent quantity. Amazon's Mechanical Turk would be high on accuracy, but a little more expensive and not as fast as automatic transcription.
posted by supercres at 8:48 PM on July 14, 2012

Also, is it existing spoken-word audio, or is it your own voice that is yet to be recorded? If it's the latter, you can speak really clearly, which will improve accuracy, but if it's something like a podcast, you're not going to get great results from an untrained or generally-trained classifier.
posted by supercres at 8:51 PM on July 14, 2012

Response by poster: supercres, in true Internet spirit, I will accept any quality, as long as it's free.

flug, I wonder why there aren't handfuls of little Web 3.0 startups that do just that: a web-based front end that uses existing free APIs on the back end to do cheap transcriptions.
posted by shipbreaker at 8:53 PM on July 14, 2012

Here is a front end for the Windows API (Vista/Win7) that adds some capabilities, like transcribing WAV files. It costs $25, though.

Another possibility is Dragon NaturallySpeaking: Dragon NaturallySpeaking for your PC can recognize .wav, .wma, .dss, .ds2 and .mp3 files. MacSpeech Scribe for your Mac can transcribe .aif, .aiff, .m4a, .m4v, .mp4 and .wav files. $99 and up.

Neither of those is free, but at least they do what you want.
posted by flug at 9:09 PM on July 14, 2012 [1 favorite]

supercres, in true Internet spirit, I will accept any quality, as long as it's free.

You say that now, but wait till you see the results.

At work we use MS Lync/Exchange for phone and voicemail. You get an automatic transcription of every voicemail in your inbox, but the results are often unintelligible. We had a good laugh about it the first week, but now it's just pathetic.

This is still a hard problem to solve. That's why there aren't a bunch of free/cheap options out there.
posted by sbutler at 9:32 PM on July 14, 2012

You don't say whether the speech is your own or other people's.

Much (all?) voice recognition processing benefits from training. The results may be rubbish to start with, but if you take the time to show the software what you were really saying, the results will improve (assuming that other variables, such as recording quality and extraneous noise, remain the same).

I've never used it, but I believe the Dragon software allows for this kind of training, which attempts to transcribe voicemail cannot do (because they're dealing with an arbitrary caller for whom they have no training). As sbutler says, this is hard stuff.

Hmm, just realised that most of what I've said here was covered by a reply to your cited post. They also make some comments about the versions of Dragon software and the quality of the microphone to be used which are worth considering.
posted by southof40 at 10:23 PM on July 14, 2012

Response by poster: I'm going with WCityMike's advice: Swype Beta lets me access Dragon Dictate transcription in one-minute increments, which is free, accurate, and better than nothing.

It would still be nice to just upload an MP3 and have a computer do all the work; next time.
posted by shipbreaker at 11:03 PM on July 14, 2012

1. Convert it to a video
2. Upload to YouTube
3. Let YouTube auto generate the closed captions
4. Download captions with one of the many apps out there (e.g. http://google2srt.sourceforge.net/en/).
5. Profit!
posted by blue_beetle at 6:28 AM on July 15, 2012 [3 favorites]
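
Step 4 in the list above leaves you with a caption file rather than a readable transcript. Here is a minimal Python sketch of the clean-up, assuming the captions were saved in the common SRT layout (a cue number, a timestamp line, then the caption text). The helper name `srt_to_text` is made up for illustration and is not part of google2srt or any other tool mentioned in this thread.

```python
import re

# Matches an SRT timing line such as "00:00:02,500 --> 00:00:05,000".
TIMESTAMP = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt: str) -> str:
    """Drop cue numbers and timestamps, keeping only the caption text."""
    words = []
    # Cues are separated by blank lines; normalise Windows line endings first.
    for block in re.split(r"\n\s*\n", srt.replace("\r\n", "\n").strip()):
        rows = [row.strip() for row in block.split("\n") if row.strip()]
        # The first row of a cue is normally its sequence number.
        if rows and rows[0].isdigit():
            rows = rows[1:]
        # The next row is normally the timing line.
        if rows and TIMESTAMP.match(rows[0]):
            rows = rows[1:]
        words.extend(rows)
    return " ".join(words)


if __name__ == "__main__":
    sample = (
        "1\n00:00:00,000 --> 00:00:02,500\nhello world\n\n"
        "2\n00:00:02,500 --> 00:00:05,000\nthis is a\ntest\n"
    )
    print(srt_to_text(sample))
```

The result is one long run of text with no punctuation or paragraph breaks beyond what the captions themselves contain, so it still needs a read-through by hand.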

Damn that's good, blue_beetle! You rock!
posted by dancestoblue at 5:56 PM on July 15, 2012

I last used Dragon Dictate to transcribe an MP3 in December 2011. I hope it has improved since then because the result was woeful.
posted by unliteral at 11:00 PM on July 15, 2012
