News analysis
January 28, 2013 11:11 AM

I need to do a rough and ready news analysis. I want to go to a specific news site and see the frequency of certain words and phrases before and after a certain date. Ideally, I'd be able to do this in a slightly more complex way, with combinations of words and phrases. I want to do this for free. Does such a tool exist?

Say a particular politician was always considered a bit of a clown. After he saves a small child from drowning, the way in which he was spoken of in a particular news outlet shifts so he's now spoken of as a serious candidate. I want to fiddle around with a tool which would let me see if there has been such a shift.

Is this something that exists?

(I know I saw something which did something similar developed by some American university. I requested a trial and never heard back. All memory of URL, search terms etc has subsequently been wiped, argh.)
posted by tavegyl to Computers & Internet (4 answers total)
Provalis has the data-mining software duo QDA Miner and WordStat, which seem right up your alley. I use them frequently for my content and framing analyses. You can download the 30-day trial without any sort of red tape, but it may take you that long to learn the software. I suggest you read the manuals thoroughly, though I learned mostly by diving in and fooling around. There is an option to download them separately or together, and I suggest you download them as a duo.

WordStat will mine your text files for the most frequent "keywords" (across all files) and even lets you uncover the most frequent phrases across all cases (strings of up to seven words). You can then save these words and phrases and use them to code your data in QDA Miner, and analyze the overall occurrence of these codes against variables such as the date of publication, the news outlet covering the story, or both. Basically, you can use it to conduct a full-fledged content analysis from start to finish.

Hope this is helpful!
posted by Young Kullervo at 11:27 AM on January 28, 2013

This would involve several steps.

You would need to scrape the website to get the text you want to analyze (wget can do this).

You will have to do some custom scripting to pull out the article dates.

Then you will need to analyze the files. I'd use a scripting language for full control: Python has an n-gram library, as does Ruby. Other than that you could use regex, which means you now have more problems.
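
For what it's worth, the date and analysis steps above might look something like the rough Python sketch below. It assumes you have already scraped the pages, and it uses only the standard library rather than any particular n-gram package. The sample articles, the <time datetime="..."> markup, and the names extract_date and phrase_rates are all made up for illustration; a real site will need its own date-extraction rule.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical sample data: (publication date, article text) pairs.
ARTICLES = [
    (date(2012, 11, 3), "Mayor Smith, the clown of city hall, joked again."),
    (date(2012, 12, 9), "Critics called Smith a clown after the budget gaffe."),
    (date(2013, 1, 20), "Smith is now seen as a serious candidate for governor."),
    (date(2013, 1, 25), "Polls show Smith is a serious candidate, not a clown."),
]


def extract_date(html):
    """Pull a date from a <time datetime="YYYY-MM-DD..."> tag.

    The markup is an assumption; adjust the pattern to the site you scrape.
    Returns None when no such tag is found.
    """
    match = re.search(r'<time[^>]*datetime="(\d{4})-(\d{2})-(\d{2})', html)
    return date(*map(int, match.groups())) if match else None


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def phrase_rates(articles, phrases, cutoff):
    """Occurrences per 1,000 words of each phrase, before vs. on/after cutoff."""
    counts = {"before": Counter(), "after": Counter()}
    words = {"before": 0, "after": 0}
    for published, text in articles:
        period = "before" if published < cutoff else "after"
        tokens = tokenize(text)
        words[period] += len(tokens)
        for phrase in phrases:
            phrase_tokens = tokenize(phrase)
            if not phrase_tokens:
                continue  # skip empty or punctuation-only phrases
            grams = ngrams(tokens, len(phrase_tokens))
            counts[period][phrase] += grams.count(" ".join(phrase_tokens))
    return {
        period: {p: 1000 * counts[period][p] / max(words[period], 1)
                 for p in phrases}
        for period in counts
    }


if __name__ == "__main__":
    rates = phrase_rates(ARTICLES, ["clown", "serious candidate"],
                         date(2013, 1, 1))
    for period, by_phrase in rates.items():
        for phrase, rate in by_phrase.items():
            print(f"{period:>6}  {phrase:<18} {rate:6.1f} per 1,000 words")
```

Rates are normalized per 1,000 words so that the two periods can be compared even if the outlet published a different amount of text before and after the cutoff date.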
posted by srboisvert at 11:57 AM on January 28, 2013

srboisvert and Young Kullervo both have good suggestions. Expanding a bit, what you're doing is close to sentiment analysis over time. This blog post has a good list of tools, and perhaps the university application you were looking at is Stanford's NLP tools.
posted by heliostatic at 12:47 PM on January 28, 2013

You could use the advanced search interface within LexisNexis or Factiva to do exactly what you describe.

Both are paid tools, but if you have access to a university library or similar, you may be able to use their academic license for free.
posted by bifter at 12:58 PM on January 28, 2013
