Join 3,375 readers in helping fund MetaFilter (Hide)


Excel spreadsheet to pretty clips report?
January 31, 2013 3:18 PM   Subscribe

I have an Excel spreadsheet with couple hundred rows: URLs, a few fields of identifying information (outlet, etc.), for media hits on the organization I work for. I need to make this into a pretty (or at least semi-legible to the computer illiterate) document that I can give to board members, senior staff, etc. What I would like is to use some form of scripting to automate this. Bonus points if the script can somehow open the URL and take a screenshot of the story so that I don't have to. I have a Mac, and both the Adobe and Microsoft suites. How do I not spend tedious hours manually doing this? Especially since I have to do it each month and quarter.
posted by klangklangston to Computers & Internet (8 answers total) 4 users marked this as a favorite
 
For the screenshots, you could use CutyCapt or PhantomJS. The problem lies in getting the screenshot to show the relevant information legibly. What you might do is load the whole page in PhantomJS, use jQuery to pluck quotes out of the document (some window of x words or characters from around any keywords of interest) and output both the text and a rasterized thumbnail.

If the text is more interesting than the screenshots, you have a lot more options.
posted by Monsieur Caution at 3:47 PM on January 31, 2013


I haven't used phantom.js enough to know its limitations, but perhaps it can do the URL->image part.
posted by fleacircus at 3:51 PM on January 31, 2013


The photo doesn't have to be all that legible; the URL will link to the actual article, so if people want to see the full text, they can. The spreadsheet I've already got includes five words on either side of our org's name as one of the columns.
posted by klangklangston at 3:52 PM on January 31, 2013


You can do a pretty nice AppleScript/Python script for this.

AppleScript opens up the Excel file, converts it to a CSV, sends a shell command to run a python script with an argument telling Python the name of your CSV.

Python ingests the CSV, iterates over each row, opening each URL in a browser, take a screen shot, save that image to a file, and then constructs an HTML file that includes both your screen capture and the relevant text from your spreadsheet.
posted by ifandonlyif at 4:40 PM on January 31, 2013


Sounds pretty doable then. I'm on a phone and so unable to verify, but it looks like you can just install Homebrew, run 'brew install qt cuty_capt' to get Qt and CutyCapt already built for OS X (probably), and use the programming language of your choice to read in the CSV, exec CutyCapt with appropriate command line options once per row, and assemble the report.

Using an actual browser seems pretty unnecessary. Qt (which CutyCapt uses) and PhantomJS both have rendering engines that browsers also use, but they're more programmable.
posted by Monsieur Caution at 4:44 PM on January 31, 2013


Incidentally, for the pretty document part of the question, any programming language you're likely to choose for this will have a PDF generation library, allowing you to program some reusable layout incorporating graphic assets you hand craft once. For Python, look at ReportLab.
posted by Monsieur Caution at 5:04 PM on January 31, 2013


As Monsieur Caution points out, avoiding an actual browser is definitely best. But if your work computer is locked down and you can't easily install packages, it's still doable with nothing more than what comes standard with OSX.
posted by ifandonlyif at 5:20 PM on January 31, 2013


Heh. Sounds like it's time to learn Python.

(The extent of my scripting knowledge so far is some Office macros, and some Applescript, as well as a couple chron jobs I set up to grab website updates back when I was college.)
posted by klangklangston at 5:21 PM on January 31, 2013


« Older Help me find the xkcd comic ab...   |  A few questions about doing a ... Newer »
This thread is closed to new comments.