News analysis
January 28, 2013 11:11 AM
I need to do a rough and ready news analysis. I want to go to a specific news site and see the frequency of certain words and phrases before and after a certain date. Ideally, I'd be able to do this in a slightly more complex way, with combinations of words and phrases. I want to do this for free. Does such a tool exist?
Say a particular politician was always considered a bit of a clown. After he saves a small child from drowning, the way he's spoken of in a particular news outlet shifts, so that he's now treated as a serious candidate. I want to fiddle around with a tool that would let me see whether there has been such a shift.
Is this something that exists?
(I know I saw something which did something similar developed by some American university. I requested a trial and never heard back. All memory of URL, search terms etc has subsequently been wiped, argh.)
WordStat will mine your text files for the most frequent "keywords" (across all files) and even allows you to uncover the most frequent phrases across all cases (up to 7-word strings). You can then save these words and phrases and use them to code your data in QDA Miner, and analyze the overall occurrence of those codes against variables such as the date of publication or the news outlet covering the story, or both. Basically you can use it to conduct a full-fledged content analysis from start to finish.
Hope this is helpful!
posted by Young Kullervo at 11:27 AM on January 28, 2013
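If you wanted to approximate that workflow by hand rather than in WordStat/QDA Miner, the core of it is just counting phrase occurrences and cross-tabulating them against publication date. A minimal Python sketch, assuming the article text and dates are already sitting in a CSV; the filename, phrases and cutoff date are placeholders for illustration:

```python
# Minimal sketch: count target phrases per article and split the counts
# into before/after a cutoff date. Assumes a CSV with "date" and "text"
# columns; the phrases and cutoff are placeholders.
import csv
from collections import Counter
from datetime import date

PHRASES = ["serious candidate", "a bit of a clown"]   # phrases to track
CUTOFF = date(2013, 1, 1)                              # the "before/after" date

counts = {"before": Counter(), "after": Counter()}

with open("articles.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        published = date.fromisoformat(row["date"])    # e.g. "2012-11-05"
        period = "before" if published < CUTOFF else "after"
        text = row["text"].lower()
        for phrase in PHRASES:
            counts[period][phrase] += text.count(phrase)

for period, counter in counts.items():
    for phrase in PHRASES:
        print(f"{period:>6}  {phrase!r}: {counter[phrase]}")
```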
This would involve several steps.
You would need to scrape the websites to get the text you want to analyze. (wget)
You will have to do some custom scripting to pull out the article dates.
Then you will need to analyze the files - I'd use a scripting language for full control. Python has n-gram libraries (NLTK, for instance), as does Ruby. Other than that you could use regex - which means you now have more problems.
posted by srboisvert at 11:57 AM on January 28, 2013
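To make that concrete, here's a rough sketch of the pipeline srboisvert describes, in Python. It assumes the pages have already been mirrored with wget and that the site happens to put the publication date in a <time datetime="..."> attribute; both are assumptions about the target site, and the phrase being tracked is just an example.

```python
# Rough sketch of the scrape-then-count pipeline: walk the wget mirror,
# pull a date out of each page, strip tags, and count n-grams per day.
import re
from collections import Counter
from pathlib import Path

from nltk.util import ngrams   # pip install nltk; ngrams() needs no extra data

DATE_RE = re.compile(r'<time[^>]+datetime="(\d{4}-\d{2}-\d{2})')
TAG_RE = re.compile(r"<[^>]+>")           # crude tag stripper
WORD_RE = re.compile(r"[a-z']+")

bigrams_by_date = {}

for page in Path("site").rglob("*.html"):
    html = page.read_text(encoding="utf-8", errors="ignore")
    m = DATE_RE.search(html)
    if not m:
        continue                           # skip pages with no recognisable date
    text = TAG_RE.sub(" ", html).lower()
    tokens = WORD_RE.findall(text)
    counter = bigrams_by_date.setdefault(m.group(1), Counter())
    counter.update(ngrams(tokens, 2))      # 2-grams; change n for longer phrases

# e.g. how often "serious candidate" appears, per day
for day in sorted(bigrams_by_date):
    print(day, bigrams_by_date[day][("serious", "candidate")])
```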
srboisvert and Young Kullervo both have good suggestions. Expanding a bit, what you're doing is close to sentiment analysis over time. This blog post has a good list of tools, and perhaps the university application you were looking at is Stanford's NLP tools.
posted by heliostatic at 12:47 PM on January 28, 2013
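As a very rough illustration of the sentiment-over-time idea, here's a Python sketch using NLTK's VADER analyzer as a simple stand-in (the Stanford toolkit heliostatic points to is a separate Java package and works differently); the input format and the monthly averaging are assumptions for illustration.

```python
# Sketch: average sentiment of articles per month, using NLTK's VADER
# analyzer as a stand-in for a full NLP pipeline.
# Requires: pip install nltk, then nltk.download("vader_lexicon") once.
from collections import defaultdict

from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Assumed input: (iso_date, article_text) pairs from whatever scraper you use.
articles = [
    ("2012-11-05", "The senator's latest stunt drew ridicule from colleagues."),
    ("2013-01-20", "The senator's rescue has colleagues calling him a serious contender."),
]

sia = SentimentIntensityAnalyzer()
scores_by_month = defaultdict(list)

for iso_date, text in articles:
    month = iso_date[:7]                                  # "YYYY-MM"
    scores_by_month[month].append(sia.polarity_scores(text)["compound"])

for month in sorted(scores_by_month):
    scores = scores_by_month[month]
    print(month, round(sum(scores) / len(scores), 3))
```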
You could use the advanced search interface within Lexis Nexis or Factiva to do exactly what you describe.
Both paid tools, but if you have access to a University library or similar, you may be able to use their academic license for free.
posted by bifter at 12:58 PM on January 28, 2013
This thread is closed to new comments.