What is the best software to repair a corrupt PDF?
May 29, 2017 11:43 AM
A friend is editing a book and has commented on the PDF. The file is now corrupt and Acrobat won't open it. There are a ton of different software solutions out there but I don't know which one would work best. Paid solutions are acceptable if they are reasonably priced and they work.
If you're using Windows, there's a reasonable chance that Previous Versions saved a version of the file at some point. Sorry, no specific advice on repairing the corruption. Just make sure to test on copies of the file!
posted by cnc at 11:57 AM on May 29, 2017
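The "test on copies" advice can be made mechanical. Below is a minimal Python sketch, not a repair tool: it copies the damaged file into a scratch folder, confirms the copy is byte-identical, and reports whether the usual `%PDF-` header and `%%EOF` end marker are present. The function names and the scratch folder are hypothetical, chosen only for illustration. A missing end marker often points to a truncated save or transfer, but that is a hint, not a diagnosis.

```python
import hashlib
import shutil
from pathlib import Path


def _sha256(path: Path) -> str:
    """Hash the file in 1 MiB chunks so a 300 MB PDF is not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_working_copy(original: Path, workdir: Path) -> Path:
    """Copy the damaged file into workdir (hypothetical scratch folder) and verify the copy."""
    workdir.mkdir(parents=True, exist_ok=True)
    copy = workdir / original.name
    shutil.copy2(original, copy)
    if _sha256(original) != _sha256(copy):
        raise OSError("working copy does not match the original")
    return copy


def quick_check(path: Path) -> dict:
    """Report file size and whether the PDF start and end markers are present."""
    size = path.stat().st_size
    with path.open("rb") as handle:
        head = handle.read(1024)
        handle.seek(max(0, size - 1024))
        tail = handle.read()
    return {
        "size_bytes": size,
        "has_header": b"%PDF-" in head,
        "has_eof_marker": b"%%EOF" in tail,
    }
```

Run every repair attempt against the path returned by `make_working_copy`, never against the original.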
Response by poster: Windows 10, but the file was created on Mac OS X.
Acrobat, Google Docs, and Foxit won't open it. It's a 300 meg file with lots of images so that may complicate things.
posted by clockworkjoe at 12:01 PM on May 29, 2017
I've had files I thought were corrupted (unopenable locally after retrieving from cloud storage) magically work again after saving a copy of the file under a new name and typing the .pdf extension manually.
posted by cotton dress sock at 12:20 PM on May 29, 2017 [1 favorite]
You can try ghostscript (open source program).
I use ghostscript to clean up PDFs in other ways (such as to make text selectable when the author, for some reason, made the document where the text is not selectable, which makes no sense with engineering datasheets where you need to copy/paste the part pinout so you can use the part in your designs).
posted by eye of newt at 1:22 PM on May 29, 2017 [3 favorites]
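For anyone wanting to try the Ghostscript route, one common approach is to have it read the damaged file and write a fresh PDF through its `pdfwrite` device. This is a hedged Python sketch that only builds and optionally runs that command. The function names are hypothetical, and it assumes the executable is called `gs` (on Windows it is usually `gswin64c`). Whether the comments survive the rewrite is not something to count on; treat this as a way to rescue the pages.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def ghostscript_rewrite_command(source: Path, target: Path, gs: str = "gs") -> List[str]:
    """Build a command asking Ghostscript to re-emit source as a new PDF at target."""
    return [
        gs,
        "-o", str(target),      # output file; also implies batch mode with no pausing
        "-sDEVICE=pdfwrite",    # write PDF rather than rendering to an image
        str(source),
    ]


def run_if_available(command: List[str]) -> Optional[subprocess.CompletedProcess]:
    """Run the command, or return None when the executable is not on PATH."""
    if shutil.which(command[0]) is None:
        return None
    return subprocess.run(command, capture_output=True, text=True)
```

Usage would look like `run_if_available(ghostscript_rewrite_command(Path("copy.pdf"), Path("rewritten.pdf")))`, with both file names standing in for your own. Ghostscript prints what it had to repair on stderr, which is worth reading even when a file is produced.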
eye of newt's suggestion about using ghostscript is really good.
What exactly do you want out of the recovery? A perfect PDF file? All the text/comments? All the images?
I don't suppose this is the kind of thing you could post online for us to look at?
posted by gregr at 1:26 PM on May 29, 2017
You might be able to use pdftk to salvage parts of the document.
Looks like you're in for some pain though, PDF is notoriously complicated.
Since your friend is editing the book, I'm assuming you can get your hands on a clean copy. I would try extracting the annotations from your corrupt PDF, which is a simpler problem than repairing the entire file. Try searching for "extract annotations PDF"; a bunch of tools come up.
posted by Dr Dracator at 1:31 PM on May 29, 2017
Two important questions here:
what incident (if any) preceded your inability to access this PDF file?
and
secondly, do you have access to the original media on which the PDF was stored?
If the answer to the latter is yes, I strongly suggest you run "PhotoRec" on it. Don't be deceived by the name. It can recover PDF file types, often with success rates exceeding that of commercial repair or recovery software.
posted by jacobean at 1:40 PM on May 29, 2017 [1 favorite]
Response by poster: It's a book in layout for commercial publication, so no dice on uploading it. I have no idea what caused the corruption. I am troubleshooting at a different computer at a remote site. However, I was just contacted by my friend, who said he just rewrote his comments and saved to a new PDF, which worked.
If it gets corrupted again, I know what to do.
Thanks for the advice!
posted by clockworkjoe at 2:01 PM on May 29, 2017
ghostview / ghostscript, and some of the command line tools that go with it are really nice.
http://stefaanlippens.net/pdf2ps_vs_pdftops/
You should be able to cut the document apart page by page and recover what you can. There might be just one bad page, and if you know someone with PostScript knowledge, they can extract the underlying elements.
posted by nickggully at 6:23 PM on May 29, 2017
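The page-by-page idea can be sketched with Ghostscript's `-dFirstPage` and `-dLastPage` switches, writing each page to its own file so that one bad page does not take the rest down with it. This Python sketch is an illustration under assumptions: the function names are hypothetical, the executable is assumed to be `gs`, and `page_count` has to be supplied by you because a damaged file may not report it reliably.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple


def page_commands(source: Path, outdir: Path, page_count: int, gs: str = "gs") -> List[List[str]]:
    """Build one Ghostscript command per page, each writing a single-page PDF."""
    commands = []
    for page in range(1, page_count + 1):
        target = outdir / f"page-{page:04d}.pdf"
        commands.append([
            gs,
            "-o", str(target),
            "-sDEVICE=pdfwrite",
            f"-dFirstPage={page}",
            f"-dLastPage={page}",
            str(source),
        ])
    return commands


def salvage_pages(commands: List[List[str]]) -> Tuple[List[int], List[int]]:
    """Run each command in order; return (pages that exited cleanly, pages that did not)."""
    good, bad = [], []
    for page, command in enumerate(commands, start=1):
        result = subprocess.run(command, capture_output=True)
        (good if result.returncode == 0 else bad).append(page)
    return good, bad
```

The exit code is only a rough signal, since Ghostscript can recover from some errors and still exit cleanly, so open the pages listed as good before trusting them. The output folder must exist before the commands are run.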
→ I have no idea what caused the corruption
Acrobat is pretty good at causing corruption on its own, so I wouldn't worry about it.
Seconding pdftk, and for very low level work, qpdf. Qpdf is particularly good at stitching up gaps in a PDF's object stream, which in my experience cause the majority of non-hardware corruption in Acrobat documents.
posted by scruss at 5:18 AM on May 30, 2017
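A hedged sketch of the qpdf route in Python: `qpdf --check` reports structural problems, and a plain `qpdf input output` run rewrites the file, attempting recovery of damaged cross-reference data along the way. The function names here are hypothetical, and `qpdf` is assumed to be on PATH. The exit codes follow qpdf's documented convention of 0 for no problems, 3 for warnings, and 2 for errors.

```python
from pathlib import Path
from typing import List

# qpdf's documented exit statuses; anything else is treated as unexpected.
QPDF_EXIT_MEANING = {
    0: "no errors or warnings",
    2: "errors; the file could not be fully processed",
    3: "warnings; qpdf recovered from the problems it found",
}


def qpdf_check_command(source: Path, qpdf: str = "qpdf") -> List[str]:
    """Build a command that inspects the file without writing anything."""
    return [qpdf, "--check", str(source)]


def qpdf_rewrite_command(source: Path, target: Path, qpdf: str = "qpdf") -> List[str]:
    """Build a command that rewrites source to target, letting qpdf repair what it can."""
    return [qpdf, str(source), str(target)]


def describe_exit(code: int) -> str:
    """Translate a qpdf exit status into a short explanation."""
    return QPDF_EXIT_MEANING.get(code, "unexpected exit code")
```

Pass either command to `subprocess.run` and feed the resulting `returncode` to `describe_exit`. An exit status of 3 on a rewrite still leaves an output file, which is often the repaired copy you were after.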
This thread is closed to new comments.
posted by a humble nudibranch at 11:53 AM on May 29, 2017 [1 favorite]