Windows PDF software that does OCR
April 9, 2021 10:41 AM

I am transitioning to a PC from over a decade in Mac world, and I need a PDF tool to replace Apple's Preview app. Specific needs are the ability to print a document to PDF (especially from web browsers and Microsoft Office), to print as a text PDF (my understanding is that Windows Print to PDF does not capture the text layer, it just prints as an image), and the ability to OCR PDFs that do not have a text layer. I'm willing to pay if needed, but I'm not willing to do a subscription.
posted by philosophygeek to Computers & Internet (10 answers total) 2 users marked this as a favorite
 
Take a look at Foxit Phantom PDF.
posted by briank at 11:15 AM on April 9, 2021


I'm not, like, an expert at Microsoft Office but my work's O365 just has "save to PDF". The PDFs are clean and searchable.

(For what it's worth, the print to PDF also turns out PDFs that are clean and searchable. In Excel/Word (i.e. an actual document) I see no difference between the two features. Print to PDF doesn't do well with web docs.)
posted by phunniemee at 11:27 AM on April 9, 2021 [3 favorites]


I encountered this issue in Microsoft Office - I'm not at my desktop now, but I remember there being a difference in the appearance of transparencies between "save to PDF" and "print to PDF". I believe the print option was more successful, but I was able to do it without employing Adobe or similar.
posted by Juniper Toast at 12:04 PM on April 9, 2021


I used the built-in "print to PDF" in Word. The results look about the same as "save as PDF" but tend to be smaller and more consistent.

Yes, fully text searchable if the input is text itself.

I'm a little surprised that Word will also open and convert (most) PDFs into Word-style documents.

Whether it will open the PDF and convert to text depends on the internal formatting of the PDF; there isn't an OCR layer.

I've had some success saving a web page, opening in Word (converting to a docx format) and then printing again as a PDF. Depends highly on how complex the web layout is.
posted by porpoise at 12:12 PM on April 9, 2021


Above posts are correct.

1) For browsers, Chrome and Edge can save a web page as PDF by default. You don't need any additional software.

2) Microsoft Office can automatically "Save as" PDF as well, for documents, worksheets, and presentations. Again no additional software necessary (and works really well - I have Adobe Acrobat Pro on my computer but almost never use it from within Office).

3) So, the only reason you might really need additional software is for the OCR capabilities. For that I only know three options: (1) Adobe Acrobat, (2) PDF Architect Pro+OCR, (3) Foxit Phantom PDF. Of those, Foxit Phantom is the most affordable single-purchase option. PDF Architect is subscription-based. Adobe Acrobat has both versions, but the one-time price is outrageously expensive.
posted by tuxster at 12:13 PM on April 9, 2021


I've been surprised at the quality of Master PDF Editor's OCR. It's not free, but it's $70 for a lifetime license.
posted by eotvos at 12:37 PM on April 9, 2021


I've been using PDFill for years and it's always done good by me. Basically it creates a Windows Printer called PDFill and so you can print to paper or you can print there. That applies from your browser, Word, Excel, etc. The result is searchable.

You might want to give it a shot just as part of your evaluation process since as always YMMV.
posted by forthright at 2:07 PM on April 9, 2021


ABBYY FineReader is a very good OCR program. I know someone who has turned photocopies of old (typed) documents into machine-readable text using this program.
posted by jb at 9:00 PM on April 9, 2021


I have similar needs and use PDFelement because I wanted a flat-fee program, not an Adobe subscription. PDFelement is available for PC or Mac. It has good OCR capabilities and a lot of other useful PDF manipulation features, more than Preview. As others said above, I use print to PDF in Word for searchable PDFs. Save to PDF is less reliable.
posted by Red Desk at 10:14 PM on April 9, 2021


Tesseract is the industrial OCR solution you can put into data processing pipelines in AWS and other cloud providers -- but it's a well-solved problem such that you can run Tesseract on your phone. There's a list of free GUIs for it in Tesseract's GitHub documentation.
posted by k3ninho at 12:14 AM on April 10, 2021
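
For anyone who wants to try the Tesseract route from a script rather than a GUI, here is a minimal sketch, not a definitive recipe. It assumes the tesseract command-line program is installed and on your PATH, and that your scan is an image file (Tesseract reads images such as PNG or TIFF, not PDFs, so a scanned PDF would first need its pages exported as images). The function names and the file name scan.png are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for a Tesseract run that writes a searchable PDF."""
    # CLI form: tesseract <image> <output base> [-l LANG] [configfile...]
    # The trailing "pdf" config tells Tesseract to write <output base>.pdf
    # containing the page image plus an invisible text layer.
    return ["tesseract", str(image_path), str(output_base), "-l", lang, "pdf"]


def ocr_to_searchable_pdf(image_path, lang="eng"):
    """Run Tesseract on one image and return the path of the PDF it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    image_path = Path(image_path)
    # Drop only the final extension: my.scan.png -> my.scan
    output_base = image_path.with_suffix("")
    subprocess.run(
        build_tesseract_command(image_path, output_base, lang),
        check=True,  # raise CalledProcessError if Tesseract reports a failure
    )
    # Tesseract appends ".pdf" to the output base it was given.
    return Path(str(output_base) + ".pdf")


if __name__ == "__main__":
    # Hypothetical input file; replace with a real scan.
    print(ocr_to_searchable_pdf("scan.png"))
```

The command is built in its own function so you can print and inspect it before running anything. The lang argument maps to Tesseract's -l option and needs the matching language data installed.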

