

Suggestions for document analysis topics.
September 22, 2011 7:03 PM

Help me pick a topic, preferably in politics or political science, that lends itself to document analysis via XML-based techniques.

I'm stumped. I'm trying to devise a "digital humanities" project and can't come up with anything fitting. Since the project is in XML, structure is very important, because structured documents lend themselves to automated processing. However, short of poems and plays (which I have no experience with or real interest in), I can't think of material that would work.

I'd really like to do something in politics and have considered everything from speech analysis to poll analysis.

Speech analysis requires a bit too much Natural Language Processing, though that could work if I had a really interesting and usable topic.

Polling analysis is difficult because the polls are not conducted over the same time periods, so it's hard to know whether differing results come from the phrasing of the questions or from daily shifts in opinion driven by current events.

I need a moderate-to-large body of material from which to draw conclusions, based on the premise and the document analysis.

Any suggestions?

I'm also open to looking at aggregate sites like Metafilter, Reddit, etc., but once again, I'm not sure how well their structure will lend itself to analysis.
posted by Raichle to Technology (10 answers total)
 
Many documents from the US government are available in XML. For instance, the President's Budget (at whitehouse.gov/omb), the Code of Federal Regulations, and the U.S. Code (both at www.gpo.gov/fdsys) are available in XML. You could pull all sorts of interesting things out of there, though you'll have to come up with those ideas...
posted by massysett at 7:28 PM on September 22, 2011


Well, I'd need to be looking for a particular theme, and while those have interesting info, they don't really have a thematic structure or relation.
posted by Raichle at 7:30 PM on September 22, 2011


I'd also love any suggestions that involve conspiracy theories. I'm so all over the place with this and just can't narrow it down.
posted by Raichle at 7:35 PM on September 22, 2011


How complicated or serious does this have to be? A tagger that marks up bills submitted to Congress might be reasonably straightforward. Many of the bills have a formulaic, easily identifiable introduction that says "So and so (on behalf of him/herself, Ms. X of Texas, Mr. Y of Florida, ...) submitted this bill." And taking your tagged versions and turning them into a social network diagram of Congressional cliques who tend to work together on minor bills might be interesting, although I wouldn't be surprised if it's either already been done for every Congress or, for some reason to do with how bills are authored, isn't very meaningful.
posted by Monsieur Caution at 7:40 PM on September 22, 2011
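
The tagger and network idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample introductions, the bill numbers, the `<bill>`/`<member>` tag names, and the regular expression are all hypothetical, and real bill text would need a more forgiving pattern.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

# Hypothetical introductions imitating the formulaic opening described above.
SAMPLE_BILLS = {
    "H.R. 1": "Mr. Smith of Ohio (for himself, Ms. Jones of Texas, "
              "and Mr. Lee of Florida) introduced the following bill",
    "H.R. 2": "Ms. Jones of Texas (for herself and Mr. Lee of Florida) "
              "introduced the following bill",
    "H.R. 3": "Mr. Smith of Ohio introduced the following bill",
}

# Assumed pattern: title, surname, "of", then a capitalized state name.
NAME = re.compile(
    r"(?:Mr\.|Ms\.|Mrs\.) [A-Z][A-Za-z'\-]+ of [A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*"
)

def tag_bill(bill_id, intro):
    """Wrap each legislator named in the introduction in a <member> element."""
    bill = ET.Element("bill", id=bill_id)
    for i, name in enumerate(NAME.findall(intro)):
        # Assumption: the first name listed is the primary sponsor.
        role = "sponsor" if i == 0 else "cosponsor"
        ET.SubElement(bill, "member", role=role).text = name
    return bill

def cowork_edges(bills):
    """Count how often each pair of legislators appears on the same bill."""
    edges = Counter()
    for bill in bills:
        names = sorted(m.text for m in bill.findall("member"))
        edges.update(combinations(names, 2))
    return edges

bills = [tag_bill(bill_id, intro) for bill_id, intro in SAMPLE_BILLS.items()]
edges = cowork_edges(bills)

for (a, b), weight in edges.most_common():
    print(f"{a} -- {b}: {weight} shared bill(s)")
```

The pair counts are the edge weights of the social network; feeding them to a graph-drawing tool would be a separate step.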


Weirdly that's very similar to what was just suggested by my professor. That's a very possible suggestion, thanks!
posted by Raichle at 7:50 PM on September 22, 2011


Can you provide more detail? For example, you say "it's in XML": what is in XML? Is there a program you're feeding these documents to? If so, it's not likely to understand any old XML document; the documents will have to be in a certain format. Additionally, the following is technically valid XML content, as long as it sits inside an element:
<![CDATA[ whatever i want goes here ]]>

posted by sanko at 7:50 PM on September 22, 2011
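
A small check of the point above, using Python's standard-library parser. The `<note>` wrapper and the `<sponsor>` element are hypothetical names chosen for illustration. The sketch shows two things: a CDATA section needs an enclosing element before a parser will accept it as a document, and a document being well-formed says nothing about whether it has the structure a given analysis expects.

```python
import xml.etree.ElementTree as ET

BARE = "<![CDATA[ whatever i want goes here ]]>"
# Hypothetical wrapper element; any root element would do.
WRAPPED = "<note>" + BARE + "</note>"

def is_well_formed(text):
    """Return True if the parser accepts the text as an XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

bare_ok = is_well_formed(BARE)
wrapped_ok = is_well_formed(WRAPPED)

# The CDATA content comes back as the plain text of the wrapper element.
note_text = ET.fromstring(WRAPPED).text

# Well-formed is not the same as usable: an analysis that expects a
# (hypothetical) <sponsor> element finds nothing in this document.
has_sponsor = ET.fromstring(WRAPPED).find("sponsor") is not None

print(bare_ok, wrapped_ok, repr(note_text), has_sponsor)
```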


No, I'd be marking it up with my own tags and then submitting it to some kind of analysis via XQuery, XSLT, XPath, etc.
posted by Raichle at 7:54 PM on September 22, 2011
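
As a rough sketch of that workflow, here is a hand-tagged fragment queried with the limited XPath subset in Python's standard library. Everything in the sample is an assumption made for illustration: the `<corpus>`, `<speech>`, and `<term>` tags, the `party` and `theme` attributes, and the two invented speeches. A full XQuery or XSLT processor would allow richer queries than this.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical hand-tagged corpus; tag and attribute names are made up.
MARKED_UP = """
<corpus>
  <speech speaker="A" party="X" year="2010">
    We must cut <term theme="economy">taxes</term> and create
    <term theme="economy">jobs</term>.
  </speech>
  <speech speaker="B" party="Y" year="2010">
    Our <term theme="security">borders</term> matter as much as
    <term theme="economy">jobs</term>.
  </speech>
</corpus>
"""

root = ET.fromstring(MARKED_UP)

def theme_counts(tree_root, party):
    """Count tagged themes in all speeches given by one party."""
    # XPath: every <term> directly inside a <speech> with this party attribute.
    terms = tree_root.findall(f".//speech[@party='{party}']/term")
    return Counter(term.get("theme") for term in terms)

counts_x = theme_counts(root, "X")
counts_y = theme_counts(root, "Y")

for party, counts in (("X", counts_x), ("Y", counts_y)):
    print(party, dict(counts))
```

The interesting design work is in choosing the tags: whatever question the project asks has to be answerable from the attributes marked up by hand.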


I'm not so much worried about the method now as just brainstorming towards some interesting ideas that I can think about.
posted by Raichle at 7:56 PM on September 22, 2011


poke around xml.gov
and O'Reilly's Gov2.0 conference, for which the slides and videos are here: http://www.gov2summit.com/gov2010/public/schedule/proceedings
posted by at at 9:54 PM on September 22, 2011


I think your question is centered too much on the XML aspects and not enough on the Digital Humanities part. I suggest you look at some existing projects for inspiration (I worked on NINES, for example, a 19th-century English literature project). These projects may involve transcription, mapping, timeline analysis or data visualization. Is there some area in the humanities that really gets you asking questions? Interesting questions yield interesting projects. To that end, I would start there and let the issues about data formats become a distant secondary problem.
posted by dgran at 7:02 AM on September 23, 2011

