Before building my own... does this software exist? I need to search a collection of PDF & Word files for key phrases, and dump the surrounding lines (x-m to x+n characters/lines, where x is the found phrase) into text files. I occasionally need to search a few dozen files for a few dozen data items, which usually have some identifying text nearby. This needs to be automated. Big bonus if it implements OCR, but that's not essential. Freeware, or cheapware, obviously is best. Windows-based is preferable, but I can do Linux.
When I copy/paste text from this PDF, the copied text is repeated and sometimes includes additional text. Why is this and how can I fix it?
perl or sed or awk help for newbie
I have to read a lot of academic papers. They are usually formatted into 2 columns per page. This is fine if I print them out, but when I read them on a computer screen, it is difficult/annoying for me to scroll up and down - I lose my place easily (I recognize this may be unusual). Is there software out there that will strip the text from a pdf into one column (and maybe put all the figures at the end or somewhere)? Like the Readability extension for text on the web, but for pdfs. Maybe a plugin for Adobe Acrobat? Linux or Windows, please. Thanks.
Help me extract text from a PDF of a PowerPoint presentation!
Should I have any longevity concerns about .epub files?
What are good apps for converting PDFs and other text files to non-PDF text files?
I'd like to extract the text and images from a multi-page pdf to use on the web.
What are the alternatives to pandoc? I'm looking for tools that will allow me to maintain a large document in a simple plain text format such as markdown and compile it to PDF and HTML.
How can I import PDF files as editable text in Microsoft OneNote?
I am in need of a server-side Linux or Unix-based software solution that will sort uploaded PDF files into three types: PDF-native (that is, created in such a way that the text in the PDF is freely copyable), PDFs with embedded text over images (usually the result of a previous OCR job), and PDF-scanned (PDFs containing no text, only scanned images). It should extract text directly from the PDF-native files and the PDFs with embedded text, and it should OCR the PDF-scanned files and export that text.
Are there any software packages or toolkits (preferably open source) available that allow me to automatically extract graphical content (such as pictures, diagrams, graphs, etc.) from batches of PDFs?
How can I convert a Dynatext book (with SGML exporting capabilities) to something that can be viewed on a web server?
How can I find real Java software for a Motorola SLVR L7? I've only ever seen stupid games. I need a PDF reader, a text editor, and a French-English dictionary, or else dictd and a dict client.
I'm attempting to transcribe some scanned documents. The quality of the .pdf files is low. Is there a way to play around with the images in order to help me make out the mangled words?
I want to limit the number of lines that users can input in a printable pdf form text field so that there is no overflow/scroll bars or auto-resizing of the text. How can I do that?
I need to convert a scanned pdf to searchable text, without printing it out and scanning it back in using OCR. Also, I'd like a cheap or free solution since I'm not likely to use it often ever again.