best transcribing software
April 13, 2010 4:49 PM Subscribe
I have to transcribe some chemistry notes. The text isn't too bad to read for a human, but I don't think that OCR would be of much use here. There are a few arrows and lines. I guess I will have to transcribe it manually, but is there some software that can help me, even partially?
Would reading it to speech recognition software be helpful? Microsoft offers decent speech recognition. Dragon NaturallySpeaking from Nuance is generally recognized as one of the best speech-to-text programs.
posted by X4ster at 5:00 PM on April 13, 2010
Response by poster: Actually this is a job I took so I'm the grad student here:)
posted by leigh1 at 5:01 PM on April 13, 2010
Response by poster: Speech recognition could be partially helpful. But I would have trouble pronouncing "glucocorticoids", etc.
posted by leigh1 at 5:06 PM on April 13, 2010
Are you using a Mac or a PC? What kind of help do you hope to get from the software? You say it's beyond OCR, so do you just want some software that can, say, split the window between the scanned image and a text document for transcribing? Scrivener for Mac is perfect for that. If you're using a PC, well, then I have no idea what to recommend.
posted by ferdinandcc at 5:25 PM on April 13, 2010
Response by poster: I'm on PC, but thanks for the recommendation anyway.
posted by leigh1 at 5:29 PM on April 13, 2010
Best answer: Agreed with ferdinandcc that a split-screen approach would help.
How un-OCRable is the text? If it's just patchy (one paragraph that will OCR fine, followed by seven lines of equations with awkward symbols, followed by two lines of OCRable text, followed by a diagram, etc.), then one of the proprietary OCR programs would still be a good idea. OmniPage will split the screen for you, displaying the original image on one side and the editable OCRed text on the other, and will map the cursor position onto the image so that you can see which part of the page you're working on. It's also trainable, so you can program it to recognise some symbols with moderate success. I've also used Kurzweil 3000 for accessibility work in the past, specifically with science textbooks; it's fiddly but did the job okay.
If the text itself seems like it'd be too poor quality to OCR, there are still some tricks you can do to make it more 'readable'. Unpaper is really handy (and free!); we use it in combination with ImageMagick (also open-source). Anything that straightens the alignment of the text, removes gutter shadow (which OCR will try to read), and makes the page background as white as possible while keeping the text dark will improve the OCR accuracy rate.
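To make the clean-up idea concrete, here is a toy sketch in plain Python. It is not how Unpaper or ImageMagick work internally and it calls neither tool; it only illustrates two of the goals mentioned above (a white background with dark text, and no gutter shadow) on a page represented as a list of rows of grayscale values, where 0 is black and 255 is white. The function names, the threshold of 128, and the 0.9 shadow fraction are all assumptions made up for this example.

```python
def whiten_background(pixels, threshold=128):
    """Force every grayscale value to pure white (255) or pure black (0).

    Values at or above the (assumed) threshold count as background.
    """
    return [[255 if value >= threshold else 0 for value in row] for row in pixels]


def strip_gutter_shadow(pixels, shadow_fraction=0.9):
    """Blank out columns that are almost entirely black.

    A column that is dark over at least shadow_fraction of its height is
    treated as gutter shadow rather than text (a crude, hypothetical rule).
    """
    if not pixels:
        return []
    height = len(pixels)
    width = len(pixels[0])
    shadow_columns = {
        x
        for x in range(width)
        if sum(1 for row in pixels if row[x] == 0) / height >= shadow_fraction
    }
    return [
        [255 if x in shadow_columns else value for x, value in enumerate(row)]
        for row in pixels
    ]


# Tiny made-up "scan": column 0 is a grey gutter shadow, the rest is text on paper.
page = [
    [90, 200, 30, 220],
    [80, 210, 240, 230],
    [85, 40, 235, 225],
]
cleaned = strip_gutter_shadow(whiten_background(page))
```

On a real scan you would let the dedicated tools do this, since they also handle deskewing and noise; the sketch just shows why a clean black-on-white page gives the OCR engine less to trip over.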
If you end up having to do it all manually, good luck, my friend, and embrace the split-screen wherever possible. My approach to this (when transcribing handwritten material that laughs in the face of OCR) is to have the image on the top half of the screen and a text editor on the bottom half - I use TextWrangler for this because it's simple, unfussy, and I need to add XML tags. If you have two monitors at your disposal and can touch-type, that works even better.
Finally - transcription work can be boring as hell, so don't underestimate the power of having something good to listen to! I've gone through the entire Savage Love podcast archive, everything interesting I could find on BBC Radio's iPlayer, and am currently working through the This American Life library.
Good luck! Feel free to MeMail me if I can be of any help.
posted by Catseye at 2:41 AM on April 14, 2010 [1 favorite]
Er, garbled sentence towards the end of that first paragraph should read "it's also trainable, so you can program it to recognise some symbols with moderate success. I've also used..."
posted by Catseye at 2:42 AM on April 14, 2010
This thread is closed to new comments.
posted by JJ86 at 4:59 PM on April 13, 2010