best transcribing software
April 13, 2010 4:49 PM

I have to transcribe some chemistry notes. The text isn't too hard for a human to read, but I don't think OCR would be of much use here. There are a few arrows and lines. I guess I'll have to transcribe it manually, but is there some software that can help me, even partially?
posted by leigh1 to Computers & Internet (8 answers total) 1 user marked this as a favorite
 
A friend wrote OCR software to digitize electrical plans with specialized symbology for a utility company. It was a unique project, with no off-the-shelf product that would do the task, and he made a good chunk of change working on it as a consultant. Anything specialized would require custom software, which would be costly unless you are a gifted programmer/developer and want to make it a labor of love. In your case grad students would be cheaper. Much cheaper.
posted by JJ86 at 4:59 PM on April 13, 2010


Would reading it to speech recognition software be helpful? Microsoft offers decent speech recognition. Dragon NaturallySpeaking from Nuance is generally recognized as one of the best speech-to-text programs.
posted by X4ster at 5:00 PM on April 13, 2010


Response by poster: Actually, this is a job I took, so I'm the grad student here :)
posted by leigh1 at 5:01 PM on April 13, 2010


Response by poster: Speech recognition could be partially helpful, but I would have trouble pronouncing "glucocorticoids" etc.
posted by leigh1 at 5:06 PM on April 13, 2010


Are you using a Mac or a PC? What kind of help do you hope to get from the software? You say it's beyond OCR, so do you just want some software that can, say, split the window between the scanned image and a text document for transcribing? Scrivener for Mac is perfect for that. If you're using a PC, well, then I have no idea what to recommend.
posted by ferdinandcc at 5:25 PM on April 13, 2010


Response by poster: I'm on PC, but thanks for the recommendation anyway.
posted by leigh1 at 5:29 PM on April 13, 2010


Best answer: Agreed with ferdinandcc that a split-screen approach would help.

How un-OCRable is the text? If it's just patchy (one paragraph that will OCR fine, followed by seven lines of equations with awkward symbols, followed by two lines of OCRable text, followed by a diagram, etc.), then one of the proprietary OCR programs would still be a good idea. OmniPage will split the screen for you, displaying the original image on one side and the editable OCRed text on the other, and will map the cursor position onto the image so that you can see which part of the page you're working on. It's also trainable, so you can program it to recognise some symbols with moderate success. I've also used Kurzweil 3000 for accessibility work in the past, specifically with science textbooks; it's fiddly but did the job okay.

If the text itself seems like it'd be too poor quality to OCR, there are still some tricks you can do to make it more 'readable'. Unpaper is really handy (and free!); we use it in combination with ImageMagick (also open-source). Anything that straightens the alignment of the text, removes gutter shadow (which OCR will try to read), and makes the page background as white as possible while keeping the text dark will improve the OCR accuracy rate.
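To give a rough idea of what "background as white as possible, text kept dark" means in practice, here is a minimal sketch in Python. It assumes the page is already a grayscale grid of 0-255 values, and it uses a plain fixed threshold, which is only an illustration and not how Unpaper or ImageMagick work internally. The function name and the cutoff of 150 are made up for the example; a real scan with heavy gutter shadow would likely need something adaptive.

```python
def whiten_background(pixels, threshold=150):
    """Push light pixels to pure white and dark pixels to pure black.

    pixels: list of rows, each a list of grayscale values,
            0 (black) to 255 (white).
    threshold: hypothetical cutoff chosen for this example; values at or
               above it are treated as paper, values below it as ink.
    """
    return [
        [255 if value >= threshold else 0 for value in row]
        for row in pixels
    ]


# Toy 3x4 "page": off-white paper (210-230), dark ink (35-60),
# and a faint gutter shadow in the last column (165-175).
page = [
    [220, 40, 225, 170],
    [215, 35, 230, 165],
    [225, 60, 210, 175],
]

cleaned = whiten_background(page)
for row in cleaned:
    print(row)
```

With these toy values the faint shadow column is light enough to land on the paper side of the cutoff, so it gets wiped to white along with the rest of the background, while the ink column goes to solid black. A darker shadow would fall below the cutoff and turn into a black bar instead, which is why the dedicated tools are worth using.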

If you end up having to do it all manually, good luck, my friend, and embrace the split-screen wherever possible. My approach to this (when transcribing handwritten material that laughs in the face of OCR) is to have the image on the top half of the screen and a text editor on the bottom half - I use TextWrangler for this because it's simple, unfussy, and I need to add XML tags. If you have two monitors at your disposal and can touch-type, that works even better.

Finally - transcription work can be boring as hell, so don't underestimate the power of having something good to listen to! I've gone through the entire Savage Love podcast archive, everything interesting I could find on BBC Radio's iPlayer, and am currently working through the This American Life library.

Good luck! Feel free to MeMail me if I can be of any help.
posted by Catseye at 2:41 AM on April 14, 2010 [1 favorite]


Er, garbled sentence towards the end of that first paragraph should read "it's also trainable, so you can program it to recognise some symbols with moderate success. I've also used..."
posted by Catseye at 2:42 AM on April 14, 2010

