

Number of Google hits
September 15, 2011 9:10 PM

I have a column of words in an Excel file, and I want to get the number of Google results for each entry. Is there a simple way to do this?

I remember finding a way to do this once before, but it only worked for words made of ASCII characters. My words contain Unicode characters, and it didn't work for them.
posted by AlexanderPetros to Computers & Internet (4 answers total) 2 users marked this as a favorite
 
Google Refine might get you there.

In particular, check out the "Fetching URLs from web services" and "Stripping HTML" docs.

Off the top of my head, I'd:
  1. put all the words in your list in the first column
  2. use "Add Column By Fetching URLs" with an expression that generates a valid Google search query URL for each word
  3. run regular expressions on the new column to extract the number of results for each
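
For anyone who would rather try steps 2 and 3 outside of Refine, here is a rough Python sketch of the same idea. It is not Refine's expression language, and it makes no network request. The search URL and the "About N results" wording are assumptions about what Google serves, the function names are made up for illustration, and the real page markup may differ or change. `urlencode` percent-encodes the query as UTF-8, which is what should let non-ASCII words through.

```python
import re
from typing import Optional
from urllib.parse import urlencode

# Assumed search endpoint; not taken from the thread.
SEARCH_BASE = "https://www.google.com/search"

# Assumed wording of the count line, e.g. "About 1,230,000 results".
_COUNT_RE = re.compile(r"(\d[\d,]*)\s+results?\b")


def build_search_url(phrase: str) -> str:
    """Step 2: build a query URL; urlencode percent-encodes as UTF-8."""
    return SEARCH_BASE + "?" + urlencode({"q": phrase})


def extract_result_count(page_text: str) -> Optional[int]:
    """Step 3: pull the first 'N results' number out of the page text."""
    match = _COUNT_RE.search(page_text)
    if match is None:
        return None  # wording not found; do not guess a count
    return int(match.group(1).replace(",", ""))


if __name__ == "__main__":
    print(build_search_url("naïve café"))
    print(extract_result_count("About 1,230,000 results (0.32 seconds)"))
```

Returning `None` when the pattern is missing keeps a changed page layout from silently turning into a count of zero.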

YMMV
posted by stoic at 12:18 AM on September 16, 2011


I found a shell script that will let you google from the command line... I suspect it's an easy jump from here to a loop that will query each term in a list.
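
The loop itself might look something like the sketch below, written in Python rather than shell. The `lookup` argument is a placeholder for whatever actually runs one search and returns a count (the linked script, or anything else); it is an assumption of this example, as are the function name and the delay parameter.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def count_hits_for_terms(
    terms: Iterable[str],
    lookup: Callable[[str], Optional[int]],
    delay_seconds: float = 0.0,
) -> Dict[str, Optional[int]]:
    """Run `lookup` once per non-blank term and collect the results."""
    results: Dict[str, Optional[int]] = {}
    for raw in terms:
        term = raw.strip()
        if not term:
            continue  # skip blank lines, e.g. empty spreadsheet cells
        results[term] = lookup(term)
        if delay_seconds > 0:
            time.sleep(delay_seconds)  # pause between queries
    return results


if __name__ == "__main__":
    # Stand-in lookup so the example runs offline: it returns the term length.
    words = ["λόγος\n", "\n", "café\n"]
    print(count_hits_for_terms(words, lookup=len))
```

To feed it the Excel column, one option is to save the column as a UTF-8 text file and pass `open("words.txt", encoding="utf-8")` as `terms` (the file name is hypothetical).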
posted by Wild_Eep at 6:46 AM on September 16, 2011


This script from eggheadcafe.com might also be a helpful lead.
posted by samsara at 6:58 AM on September 16, 2011


(MeFi's own!) XKCD wrote a blag post earlier this year about how he generated one of the comics. He links to this tool, which might be able to do what you want.

It looks like if you set the query to a single variable (say "<Q>"), set its type to "CEnum", and paste in your list of words separated by commas, you get your results and can download them as a CSV. It works for my (made-up) dataset, anyway. Depending on how tech-savvy you are, the source code is provided on the tool's page if you can't make the web version do precisely what you need.
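
To illustrate the general idea only (this is not the tool's source, and I am guessing at how it treats the variable), here is a small Python sketch that substitutes each comma-separated value into a query template and writes rows out as CSV. The function names and the column headers are invented for the example, and the hit counts have to come from somewhere else.

```python
import csv
import io
from typing import Iterable, List, Tuple


def expand_queries(
    template: str, comma_separated_values: str, placeholder: str = "<Q>"
) -> List[Tuple[str, str]]:
    """Return (value, query) pairs, one per non-empty value in the list."""
    values = [v.strip() for v in comma_separated_values.split(",")]
    return [(v, template.replace(placeholder, v)) for v in values if v]


def results_to_csv(rows: Iterable[Tuple[str, str, int]]) -> str:
    """Serialize (value, query, hits) rows as CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(["value", "query", "hits"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    pairs = expand_queries("<Q>", "α, β ,γ")
    # Placeholder hit counts; a real run would look each query up.
    rows = [(value, query, 0) for value, query in pairs]
    print(results_to_csv(rows))
```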

Take note that, as XKCD warns, the number of results listed on the first page of results is at best an estimate, and at worst a complete fabrication, so any method that simply scrapes the text from the search results won't be accurate.
posted by yuwtze at 4:16 PM on September 16, 2011 [1 favorite]


This thread is closed to new comments.