Scraping & saving?
November 19, 2009 3:13 PM
I have perhaps a thousand delicious links (to documents in the SEC database).
All of these could break at any time if the SEC changes the way it displays these documents.
How do I automate the process of copying the contents of those documents so I can save them in a database?
I have checked previous questions and looked into web scraping software, but the scraping/crawling/spidering software out there requires what looks, to me, a little too much like programming.
Is there an easy way to collect the documents? I am hoping to feed something the list of links and be done. Fair warning: if I can figure this out, then I will ask how best to save the documents in a database. I have considered using Mechanical Turk, or something, but I think this really ought to be a job for a machine. Free software solutions preferred, but willing to pay to make it easy for me to do...
Sample document:
posted by extropy to computers & internet (12 comments total)
4 users marked this as a favorite
posted by extropy at 3:15 PM on November 19, 2009
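For reference, the "feed it a list of links and be done" workflow described in the question can be sketched in a few lines of Python using only the standard library. This is a minimal sketch under stated assumptions, not a definitive tool: it assumes the bookmarks have already been exported to a plain list of URLs, and the names `archive_links`, `fetch_url`, `archive.db`, the `documents` table, and the User-Agent string are all hypothetical choices made for illustration. It stores each page's raw bytes in a SQLite database file, keyed by URL, and returns the links that could not be fetched.

```python
import sqlite3
import urllib.request


def fetch_url(url, timeout=30):
    """Download one URL and return its raw bytes."""
    # Some servers reject requests that carry no User-Agent header.
    # The value below is a placeholder, not a required format.
    request = urllib.request.Request(
        url, headers={"User-Agent": "link-archiver (you@example.com)"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def archive_links(urls, db_path="archive.db", fetch=fetch_url):
    """Fetch each URL and store its contents in a SQLite table keyed by URL.

    Returns a list of (url, error message) pairs for links that failed.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "url TEXT PRIMARY KEY, "
        "body BLOB, "
        "fetched_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    failed = []
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank lines in the input
        try:
            body = fetch(url)
        except Exception as exc:
            # Record the failure and keep going with the remaining links.
            failed.append((url, str(exc)))
            continue
        # Re-running the script overwrites the earlier copy of the same URL.
        conn.execute(
            "INSERT OR REPLACE INTO documents (url, body) VALUES (?, ?)",
            (url, body),
        )
        conn.commit()
    conn.close()
    return failed
```

Assuming the links are saved one per line in a file called `links.txt` (again, a hypothetical name), calling `archive_links(open("links.txt"))` would fetch them in order and return whatever failed. Committing after each document means an interrupted run keeps everything fetched so far, and adding a short pause between requests would be a reasonable courtesy to the server for a list of a thousand links.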