I'm trying to map protests in the United States, but I'm grappling with data sources (and will eventually tangle with data management). Any ideas?
I'd like to map out protests, riots, bombings, and other cheerful social outings - ideally in the United States, where I have the most contextual knowledge, but that's not a necessity.
My original plan was to scrape AP's US news RSS feed, store everything in some sort of XML database, and then query that for what I need. I just checked their RSS format, and it unfortunately doesn't include the full article. Nor does it include a separate tag for the location, which would make geocoding a bit/much nastier. NYT's feeds are basically the same story. I don't really know where to go from here.
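For reference, here is a minimal sketch of what a feed like that does give you, using only the Python standard library. The sample feed, its example.com URL, and the function name parse_rss_items are all made up for illustration; the real AP or NYT feeds may use different or additional tags.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical RSS 2.0 snippet standing in for a real news feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example US News</title>
    <item>
      <title>Hypothetical headline about a march</title>
      <link>http://example.com/story/1</link>
      <pubDate>Wed, 11 Jun 2008 18:59:00 GMT</pubDate>
      <description>Short summary only, not the full article.</description>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return one dict per <item>: title, link, publication time, summary."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            # RSS 2.0 dates use the RFC 822 style, which email.utils can parse.
            "published": parsedate_to_datetime(pub) if pub else None,
            "summary": item.findtext("description", default="").strip(),
        })
    return items


for entry in parse_rss_items(SAMPLE_FEED):
    print(entry["published"], entry["title"], entry["link"])
```

Note that this only yields a headline, link, date, and summary per item. The full text and the location would still have to come from fetching each link and extracting them separately.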
There are basically five steps, and I would love advice on any:
1. Scrape a database of news articles.
2. Store in a format that would allow querying by date or location. I'd like to keep all the articles, too, because... really, that would be an awesome dataset.
3. Tag protests (method: NLP, Mech Turk, or caffeinated McB).
4. Tag with date and location.
5. Make pretty maps.
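To make steps 2 and 4 concrete, here is a rough sketch of one possible storage layout. It swaps the XML database idea for SQLite (via Python's built-in sqlite3 module), purely as an assumption about what might be convenient. The table name, column names, and helper functions are hypothetical, and the sample rows are invented.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the article store. The schema is an illustrative guess."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS articles (
            id         INTEGER PRIMARY KEY,
            url        TEXT UNIQUE,        -- lets re-scrapes skip duplicates
            title      TEXT,
            published  TEXT,               -- ISO date, e.g. '2008-06-10'
            body       TEXT,               -- keep every article, tagged or not
            place      TEXT,               -- free-text place name
            lat        REAL,
            lon        REAL,
            is_protest INTEGER DEFAULT 0   -- set by NLP, Mech Turk, or by hand
        )""")
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_published ON articles(published)")
    return conn


def add_article(conn, url, title, published, body,
                place=None, lat=None, lon=None, is_protest=False):
    """Insert one article; a URL that is already stored is silently ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO articles "
        "(url, title, published, body, place, lat, lon, is_protest) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (url, title, published, body, place, lat, lon, int(is_protest)))
    conn.commit()


def find_protests(conn, start, end, bbox=None):
    """Tagged protests between two ISO dates, optionally inside a bounding box.

    bbox is (south, west, north, east) in decimal degrees.
    """
    sql = ("SELECT title, published, place, lat, lon FROM articles "
           "WHERE is_protest = 1 AND published BETWEEN ? AND ?")
    params = [start, end]
    if bbox is not None:
        south, west, north, east = bbox
        sql += " AND lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?"
        params += [south, north, west, east]
    sql += " ORDER BY published"
    return conn.execute(sql, params).fetchall()


conn = open_store()
add_article(conn, "http://example.com/a", "March downtown", "2008-06-10",
            "full text here", "Chicago, IL", 41.88, -87.63, is_protest=True)
add_article(conn, "http://example.com/b", "City budget vote", "2008-06-10",
            "full text here")
print(find_protests(conn, "2008-06-01", "2008-06-30"))
```

Storing dates as ISO strings keeps range queries simple, since they sort correctly as text. The rows that come back from find_protests carry lat and lon, so they could be handed straight to whatever does the mapping in step 5.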
Step 6 is going crazy with spatial stats, but I've got that part covered. I've been letting this project fester for too long, and it is now certifiably brain crack. Any advice on 1-5 would be greatly appreciated.
Aside: I really have thought about the ethical consequences of this. If you're concerned, MeFiMail me and I'll do my best to assuage your doubts.
posted by proj at 6:59 PM on June 11