Mining news sites for data.
January 23, 2004 5:02 PM
Is there a way, without constant human intervention, to (1) mine either Google News, Yahoo News, or the AP for new obituaries and (2) drop the name, age, blurb, and URL into a database?
I've pondered this for a while. A really crude way would be to search headlines for ", [0-9][0-9], " and " dies at [0-9][0-9]." But I'm not sure this would pick up everything. For example, if I search Google News for "kangaroo", only two of the roughly 20 links identify Bob Keeshan's name, the reason for his fame, and his age. Most say simply "Captain Kangaroo Dies". And only the NYT headline has all the data elements separated by commas (and the NYT is probably not consistent about that from one obit to the next).
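For illustration, the two crude headline patterns could be sketched in Python like this. `parse_headline` and `Obit` are hypothetical names, not part of any news API. Two choices here are assumptions: the age group accepts three digits (for centenarians), and "Dies" matches in either case. The sketch only recovers name and age. It returns None for a headline like "Captain Kangaroo Dies", which is exactly the gap described above.

```python
import re
from typing import NamedTuple, Optional


class Obit(NamedTuple):
    name: str
    age: int


# NYT-style headline: "Name, 76, Description..." (hypothetical layout assumption)
COMMA_AGE = re.compile(r"^(?P<name>[^,]+), (?P<age>[0-9]{2,3}), ")
# Wire-style headline: "Name Dies at 76" or "Name, dies at 76"
DIES_AT = re.compile(r"^(?P<name>.+?),? [Dd]ies at (?P<age>[0-9]{2,3})\b")


def parse_headline(headline: str) -> Optional[Obit]:
    """Return (name, age) if a headline matches one of the crude patterns, else None."""
    for pattern in (COMMA_AGE, DIES_AT):
        m = pattern.search(headline)
        if m:
            return Obit(m.group("name").strip(), int(m.group("age")))
    # Headlines with no age, e.g. "Captain Kangaroo Dies", fall through here.
    return None
```

Unmatched headlines would still need a fallback, such as fetching the article body and running the same patterns over its first paragraph.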
Any cleaner ideas?
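Step (2), storing the name, age, blurb, and URL, is the easy part. Below is a minimal sketch using Python's built-in `sqlite3` module. The `obits` table, its columns, and the `open_db`/`store_obit` helpers are hypothetical. The table is keyed on the article URL so that polling the same feed repeatedly does not insert duplicates.

```python
import sqlite3
from typing import Optional


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a database with a hypothetical 'obits' table keyed on URL."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS obits (
               url   TEXT PRIMARY KEY,
               name  TEXT NOT NULL,
               age   INTEGER,
               blurb TEXT
           )"""
    )
    return conn


def store_obit(conn: sqlite3.Connection, name: str, age: Optional[int],
               blurb: str, url: str) -> bool:
    """Insert one obituary; return False if this URL was already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO obits (url, name, age, blurb) VALUES (?, ?, ?, ?)",
        (url, name, age, blurb),
    )
    conn.commit()
    # rowcount is 0 when INSERT OR IGNORE skipped a duplicate URL.
    return cur.rowcount == 1
```

Keying on URL rather than name avoids merging two different people with the same name. However, the same death reported by several outlets will still appear once per outlet.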
posted by PrinceValium to computers & internet (5 comments total)
posted by oissubke at 5:19 PM on January 23, 2004