Newspaper Clippings 2.0?
May 7, 2009 3:36 PM Subscribe
Can I automate archiving/saving news articles on a certain topic that I pull from Google News's RSS feed?
For a while, I was manually copying and saving all the news articles on a certain topic that I got from Google in my RSS feed. But it became cumbersome, so I stopped. Now I've looked back at those articles from a few years ago and wish I had kept it up. Is there a way to automate something like that? A modern-day newspaper clipping collection, only automated? I don't want to save just the URL, but the actual text of the article, where it's from, the date, and possibly pictures.
This is for my own personal use, so I doubt it would fall under any copyright issues (I would assume).
I did a search, but my Google-fu is failing me. I keep coming up on the Google News Archive, but that's not really what I'm looking for. I want my own personal copies. I don't know how the Google News Archive works, but I know that some articles I originally got from Google News are not in their archive (I just checked).
Response by poster: No. Google Alerts is just the start of it. That's more or less how I was doing it before, but then I'd manually save the stories to a MySQL database and output them in a list. That doesn't have to be the output, but some form of archiving in flat files or a database, as my own copy, is what I'm looking for.
posted by [insert clever name here] at 4:45 PM on May 7, 2009
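The flat-file-or-database archiving the poster describes can be sketched with Python 3's built-in sqlite3 module, with no MySQL server needed. The table layout and function names below are illustrative, not anything from the thread:

```python
import sqlite3

def init_db(path):
    """Open (or create) the clippings database."""
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS clippings (
                        url       TEXT PRIMARY KEY,
                        title     TEXT,
                        source    TEXT,
                        published TEXT,
                        body      TEXT)""")
    return conn

def save_clipping(conn, url, title, source, published, body):
    """Store one article; the PRIMARY KEY on url makes re-runs skip duplicates."""
    conn.execute("INSERT OR IGNORE INTO clippings VALUES (?, ?, ?, ?, ?)",
                 (url, title, source, published, body))
    conn.commit()
```

Running the saver from cron once an hour and pointing it at the alert feed would keep the archive current without the manual copying.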
Maybe you could set up a cron job to run a Python script that would parse the feed's XML file for links to download?
IBM has a Python script for parsing RSS feeds. That script doesn't quite download the files, but you could probably modify it into something like this:
from RSS import ns, CollectionChannel, TrackingChannel
import urllib

# Fetch and parse the Google News RSS feed
tc = TrackingChannel()
tc.parse("http://news.google.com/?output=rss")

RSS10_TITLE = (ns.rss10, 'title')
RSS10_DESC = (ns.rss10, 'description')

items = tc.listItems()
for item in items:
    url = item[0]
    print "RSS Item:", url
    item_data = tc.getItem(item)
    # Download the article and save it under its feed title
    newsItem = urllib.urlopen(url)
    savedItem = open(item_data.get(RSS10_TITLE, "(none)"), 'w')
    savedItem.write(newsItem.read())
    newsItem.close()
    savedItem.close()
But as I haven't tested the above, there's no guarantee it'll work right off the bat.
posted by movicont at 6:07 PM on May 7, 2009
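For anyone without the IBM toolkit, the same parse-then-download idea can be done with only the Python 3 standard library. This is a sketch in the same spirit; the helper names are mine, and it assumes a plain RSS 2.0 feed shape (`<item><title>…</title><link>…</link></item>`):

```python
import re
import xml.etree.ElementTree as ET

def extract_items(rss_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [(item.findtext("title", default="(none)"),
             item.findtext("link", default=""))
            for item in root.iter("item")]

def safe_filename(title):
    """Turn an article title into a filesystem-safe name for the saved copy."""
    return re.sub(r"[^\w\- ]", "_", title).strip()[:100] + ".html"
```

Each link can then be fetched with `urllib.request.urlopen(link).read()` and written to `safe_filename(title)`, with the whole script scheduled from cron as suggested above.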
Best answer: There are two ways to do this. One is something that just sucks in all the things you've subscribed to. Gregarius is a web application that would let you do this, though there are others that vary in complexity and features.
Another option is to use software like DevonTHINK. I have my browser set up to automatically save a local copy of a page into DevonTHINK (which is fully searchable and has its own AI engine) with the keystroke Command + 2. (Command + 1 is the "blog this and quote the text I've highlighted" shortcut.)
(DevonTHINK can also subscribe to RSS feeds and let you just search everything. I do a combination of the above two methods.)
posted by Brian Puccio at 8:46 PM on May 10, 2009
posted by Jaltcoh at 3:40 PM on May 7, 2009
This thread is closed to new comments.