

Readability for PDFs?
October 19, 2011 11:27 AM

I have to read a lot of academic papers. They are usually formatted into 2 columns per page. This is fine if I print them out, but when I read them on a computer screen, it is difficult/annoying for me to scroll up and down - I lose my place easily (I recognize this may be unusual). Is there software out there that will strip the text from a PDF into one column (and maybe put all the figures at the end or somewhere)? Like the Readability extension for text on the web, but for PDFs. Maybe a plugin for Adobe Acrobat? Linux or Windows, please. Thanks.
posted by bluefly to Computers & Internet (6 answers total) 22 users marked this as a favorite
 
Various OCR programs will do this, like Omnipage - you "scan" the PDF in and save it as simplified plain text or formatted text. My company's product ClaroRead Plus will do it, for example - 15-day demo version. Omnipage from Nuance is probably available for less money - or nothing if you've purchased a scanner. There are lots of others.

Access PDF is a free program, erm, also by me. It's designed for screenreader users (blind people usually) so it dumps a PDF out as a linear text file that you could read in whatever program you like. It only works with "accessible" PDF files containing text (rather than pictures of text). You can get it in the WebbIE suite of applications (during installation, choose not to install the bits you don't want). It's just a front end to the XPDF set of command-line utilities, though, if you want to go straight to them: XPDF

Finally, if the PDF is decently formatted, get Adobe Reader X. Open your PDF file. View menu, Zoom, select Reflow (or press Ctrl+4). This will (for some subset of all PDF files) do exactly what you want: turn the whole thing into a linear column of text. There are LOADS of options like this in Reader but they're very much dependent on the quality of the PDF file. I wrote a detailed guide here: PDF and Accessibility and Chapter 10 of the Adobe Acrobat Manual.
posted by alasdair at 1:01 PM on October 19, 2011 [3 favorites]


This is a problem people have with 2-column PDFs when read on small screen devices, so look into solutions for those: This looks like it could work for you. Depending on how techy you are, you could probably use tools like pdftk or a roll-your-own parser to extract the text or otherwise manipulate the document.
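
For the roll-your-own route, here is a minimal sketch of the cleanup step in Python. It assumes you already have the text out of the PDF as a plain string (from whatever extraction tool you prefer); the function name reflow and the 80-character default width are made up for illustration. It rejoins hard-wrapped lines into paragraphs and rewraps them as a single column.

```python
import re
import textwrap


def reflow(text, width=80):
    """Rejoin hard-wrapped lines into paragraphs and rewrap to one column.

    Hypothetical helper: paragraphs are assumed to be separated by blank lines.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    out = []
    for para in paragraphs:
        # Join words split across a line break: "col-\numn" -> "column".
        # Caveat: a genuinely hyphenated word broken at its hyphen
        # also loses the hyphen.
        para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Collapse remaining line breaks and runs of whitespace into single spaces.
        para = " ".join(para.split())
        if para:
            out.append(textwrap.fill(para, width=width))
    return "\n\n".join(out)
```

Something like print(reflow(extracted_text, width=100)) would then give one column at whatever width suits your screen. Figures, tables, and equations are not handled at all; this only tidies running text.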
posted by axiom at 2:03 PM on October 19, 2011


I use Mutt, a text-based e-mail reader for Unix systems, and it shows me PDFs all the time. I think it does it using pdftotext which apparently comes with xpdf. Apparently the default is to strip out the layout and show the PDF in "reading order," which is exactly what you want.
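
If you want to call pdftotext yourself rather than through a mail reader, a rough sketch in Python follows. It assumes pdftotext is installed and on your PATH; the helper names build_pdftotext_command and pdf_to_text are invented for this example. It leaves out the -layout flag so the tool stays in its default mode instead of preserving the two-column page layout, and it passes "-" as the output file so the text goes to stdout.

```python
import subprocess


def build_pdftotext_command(pdf_path, first_page=None, last_page=None):
    """Build the argument list for pdftotext (hypothetical helper)."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to convert
    if last_page is not None:
        cmd += ["-l", str(last_page)]   # last page to convert
    # No -layout flag: keep the default mode rather than the physical layout.
    # "-" as the output file sends the text to stdout.
    cmd += [str(pdf_path), "-"]
    return cmd


def pdf_to_text(pdf_path, first_page=None, last_page=None):
    """Run pdftotext and return the extracted text as a string."""
    result = subprocess.run(
        build_pdftotext_command(pdf_path, first_page, last_page),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout.decode("utf-8", errors="replace")
```

How well the columns come out in order still depends on the PDF itself, so it is worth trying on a page or two (for example pdf_to_text("paper.pdf", first_page=1, last_page=2)) before converting a whole paper.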
posted by massysett at 2:40 PM on October 19, 2011 [1 favorite]


A low tech solution if you have the right monitor is to turn it 90 degrees.
posted by hanoixan at 9:28 PM on October 19, 2011


Similar previously.
posted by anaelith at 1:51 AM on October 20, 2011


I use this to format 2-column scientific papers for my Kindle, but there are options to change the output size to whatever you want it to be. I've found it to work pretty well.
posted by pwicks at 8:17 AM on October 20, 2011

