Ease my Prezi pains
August 29, 2011 8:46 PM

I'm trying to turn PDFs exported from presentations on the Prezi website (prezi.com) into text documents, and I'm looking for a program to OCR them with. Prezi's PDFs come out rather odd, with text trailing off the edge of the page that isn't pertinent to the current slide, so I need something that lets me draw a selection box around the text I want to OCR rather than automatically OCR'ing the entire page. What Windows program should I be looking at?
posted by MeatyBean to Computers & Internet (7 answers total)
Nasty hack of the week: take a screenshot of the page and crop it?
posted by katrielalex at 1:32 AM on August 30, 2011
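
If cropping by hand gets tedious, the crop box can be worked out from coordinates instead. This is a minimal sketch, assuming you know the selection box in PDF points (1/72 inch, measured from the bottom-left corner of the page) and want pixel coordinates measured from the top-left corner of an image rendered at some DPI. The function name and the example numbers are made up for illustration.

```python
from typing import Tuple


def points_box_to_pixels(
    box: Tuple[float, float, float, float],
    page_height_pt: float,
    dpi: int = 300,
) -> Tuple[int, int, int, int]:
    """Convert a (left, bottom, right, top) box in PDF points into an
    (x, y, width, height) crop in pixels with a top-left origin."""
    left, bottom, right, top = box
    # One PDF point is 1/72 inch, so pixels = points * dpi / 72.
    x = round(left * dpi / 72)
    # Flip the vertical axis: PDF measures up from the bottom edge,
    # raster images measure down from the top edge.
    y = round((page_height_pt - top) * dpi / 72)
    width = round((right - left) * dpi / 72)
    height = round((top - bottom) * dpi / 72)
    return x, y, width, height


# Hypothetical example: a box on a US Letter page (792 points tall).
print(points_box_to_pixels((72, 72, 288, 360), page_height_pt=792, dpi=300))
```

The resulting numbers can be fed to whatever image cropping tool you end up using.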

1: Select the text you want within Adobe Reader, one page at a time.
2: Copy and paste the selection (now a TIFF picture file) into a Word document or some other file format you can use for OCR.
3: When finished, OCR this document.

Google and others (one review) offer online OCR.

Another option is to go back to the original Prezi document, edit it to remove the unnecessary text, and then make a new PDF.
posted by iviken at 2:27 AM on August 30, 2011

Find someone with Acrobat Pro and ask them to run the built-in OCR function.
posted by Thorzdad at 4:54 AM on August 30, 2011

I use ScanTailor for this sort of cropping and general cleanup. It's not perfect, but it's pretty decent, and automates much of the process.
posted by nebulawindphone at 5:00 AM on August 30, 2011

@nebulawindphone: ScanTailor seems to be meant for raw image files. All I have available is the PDF file, which I don't think I can use it on.
posted by MeatyBean at 4:05 PM on August 30, 2011

Right, no, but there are programs that will convert a PDF to a series of TIFFs, and then you can run ScanTailor on the TIFFs. Might be overkill for what you're doing, though.
posted by nebulawindphone at 1:50 PM on August 31, 2011
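
For the PDF-to-TIFF step, one possible converter is pdftoppm from the Poppler utilities. That choice is an assumption; the thread doesn't say which program was meant, and pdftoppm has to be installed separately. The sketch below only builds the command line, with an optional crop given as (x, y, width, height) in pixels at the chosen resolution. The file names are hypothetical.

```python
import subprocess
from typing import List, Optional, Tuple


def pdftoppm_command(
    pdf_path: str,
    out_prefix: str,
    dpi: int = 300,
    crop: Optional[Tuple[int, int, int, int]] = None,
) -> List[str]:
    """Build a pdftoppm command that writes one TIFF per page."""
    cmd = ["pdftoppm", "-tiff", "-r", str(dpi)]
    if crop is not None:
        x, y, width, height = crop
        # -x/-y set the top-left corner of the crop, -W/-H its size in pixels.
        cmd += ["-x", str(x), "-y", str(y), "-W", str(width), "-H", str(height)]
    cmd += [pdf_path, out_prefix]
    return cmd


def convert(pdf_path: str, out_prefix: str, dpi: int = 300,
            crop: Optional[Tuple[int, int, int, int]] = None) -> None:
    """Run the conversion; raises if pdftoppm is missing or fails."""
    subprocess.run(pdftoppm_command(pdf_path, out_prefix, dpi, crop), check=True)


# Hypothetical file names, shown without actually running the tool.
print(pdftoppm_command("prezi.pdf", "slide", crop=(300, 1800, 900, 1200)))
```

The TIFFs this produces, named from the prefix, are what you would then load into ScanTailor. Cropping at this stage may make the ScanTailor pass unnecessary for simple slides.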

Well, I have managed to OCR some of the PDFs with Adobe Acrobat Pro and then save them as Word documents, where I am able to edit them into something useful. The trouble I have now, however, is getting Acrobat to recognize white text on black backgrounds. In the OCR'd PDFs with non-black backgrounds I can highlight text with the selection tool, but when I OCR a PDF with a black background it produces no noticeable change.

Is this an issue with Acrobat itself or am I doing something wrong?
posted by MeatyBean at 12:45 AM on September 1, 2011
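
Whether Acrobat is at fault isn't settled in this thread. A workaround sometimes tried with light-on-dark pages is to invert the page image before OCR, so the text ends up dark on a light background. The sketch below shows only the inversion idea on a grayscale image held as plain rows of 0-255 values. The representation, function names, and the threshold of 128 are all assumptions for illustration, not anything Acrobat exposes.

```python
from typing import List

# Rows of 8-bit grayscale values: 0 is black, 255 is white.
Image = List[List[int]]


def mean_brightness(image: Image) -> float:
    """Average pixel value over the whole image (0.0 for an empty image)."""
    total = sum(sum(row) for row in image)
    count = sum(len(row) for row in image)
    return total / count if count else 0.0


def invert(image: Image) -> Image:
    """Swap light and dark: white text on black becomes black on white."""
    return [[255 - value for value in row] for row in image]


def dark_on_light(image: Image, threshold: float = 128.0) -> Image:
    """Invert the image only if it is mostly dark (assumed dark background)."""
    if mean_brightness(image) < threshold:
        return invert(image)
    return image


# Tiny made-up "page": black background with one white pixel of text.
page = [[0, 0, 0], [0, 255, 0]]
print(dark_on_light(page))
```

In practice you would do the inversion in an image editor or imaging library on the exported page images, then run OCR on the result.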
