How do stars work in Google?
March 8, 2005 9:28 AM

How do stars work in Google? David Beaver describes a series of searches using multiple asterisks in this Language Log post; specifically, he says: "there is * * * * a house in New Orleans" gets, at time of writing, precisely the same number of hits, 3560, as "there is a house in New Orleans". But when I click on the former, I only get two hits. What's going on here?

On preview: when I click on the link (taken directly from the LL post) in the Preview box, I get the result he describes! What's going on here??
posted by languagehat to Computers & Internet (16 answers total)
 
shouldn't the first be "there * * * * New Orleans"? you're asking for 4 extra words but didn't drop any. actually, that gets too many hits, but "there * * house * New Orleans" is very close to "there is a house in New Orleans".
posted by andrew cooke at 9:33 AM on March 8, 2005


when i click the link directly i get very few hits. i think you've got confused somewhere, maybe? or not all links are the same?
posted by andrew cooke at 9:35 AM on March 8, 2005


maybe they had a bug for a while and have now fixed it?
posted by andrew cooke at 9:36 AM on March 8, 2005


I think you found a Google bug.

After clicking on that first link, try hitting the "Search" button again and again. My searches sometimes show 2 results and sometimes 3000.
posted by vacapinta at 9:54 AM on March 8, 2005


bleagh. now i get 3690 too.
posted by andrew cooke at 9:58 AM on March 8, 2005


Google is currently reindexing. As with the last few reindexing processes, their results will be flaky for a while. I'm not entirely certain why they break the index in production in order to do this, but it's not the first time results have been unpredictable.
posted by majick at 10:00 AM on March 8, 2005


Try this one ("there is * * * * house in New Orleans"). It seems to be working properly (one asterisk per missing word).
posted by nobody at 10:01 AM on March 8, 2005


Google is currently reindexing.

Could someone explain this more clearly? It's not that different *numbers* of results are coming back but that different *types* of results are coming back.

In other words, the * is sometimes allowed to represent a null and sometimes it's not.
posted by vacapinta at 10:38 AM on March 8, 2005


I can't seem to get it to work either.

This is a cool feature when it works. Like you, I found it via languagelog, except I first noticed it in this post, which I think most people here will appreciate. I've tried to construct a query to search for the Metafilter: blah blah blah in-joke, but with limited success so far.
posted by casu marzu at 11:02 AM on March 8, 2005


Best answer: Could someone explain this more clearly? It's not that different *numbers* of results are coming back but that different *types* of results are coming back.

There isn't just one "Google"; when you do a search on Google the results you get back come from any one of 59 datacenters operated by Google, and during update periods the different datacenters will often be out of synch with each other; hence, during updates you sometimes get strange results when you do the same search multiple times.

If you're interested, this page lets you run a search on specific datacenters, one at a time or simultaneously.
posted by ubernostrum at 12:27 PM on March 8, 2005
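
The out-of-sync explanation above can be pictured with a toy model. This is only a sketch under loud assumptions: the `Replica` class, the `dc-a`/`dc-b` names, the two tiny indexes, and the round-robin rotation are all hypothetical, and nothing in this thread says how Google actually routes a search to a datacenter. It only shows how the same query can return different counts on consecutive tries when the copies of an index disagree.

```python
import itertools


class Replica:
    """Hypothetical stand-in for one datacenter holding its own copy of the index."""

    def __init__(self, name, documents):
        self.name = name
        self.documents = documents

    def count_hits(self, phrase):
        # Naive substring search, case-insensitive; not how a real engine works.
        phrase = phrase.lower()
        return sum(phrase in doc.lower() for doc in self.documents)


def make_search(replicas):
    """Return a search function that sends each request to the next replica in turn.

    Round-robin is an assumption made purely for illustration.
    """
    rotation = itertools.cycle(replicas)

    def search(phrase):
        replica = next(rotation)
        return replica.name, replica.count_hits(phrase)

    return search


# Two made-up index versions: the second has already picked up a new page.
OLD_INDEX = ["There is a house in New Orleans"]
NEW_INDEX = OLD_INDEX + ["My mother saw a house in New Orleans once"]
```

Calling `search = make_search([Replica("dc-a", OLD_INDEX), Replica("dc-b", NEW_INDEX)])` and then repeating `search("house in new orleans")` alternates between the two replicas, so the hit count flips between the old and new index on each click.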


ubernostrum, I understand that there isn't one Google and, in general, how distributed processing works.

My question is that there seems to be an inconsistency in how different servers interpret the phrase "mercury * earth": either allowing "mercury earth" as an acceptable return or only allowing phrases like "mercury venus earth".

To me, that seems like an inconsistency in the parser, not in the dataset. But as I understand it, even featuresets get rolled out across datacenters (right?), so a certain syntax may return a valid result in one datacenter but an error (or unexpected behavior) in another.

If so, I understand, but that's not what the phrase "reindexing" means to me, which is usually a pure data operation.
posted by vacapinta at 12:44 PM on March 8, 2005


I think the intent of the * is to replace exactly one non-empty word. I wish Google did partial word matches like "there is a house in new *leans", but that doesn't work...
posted by knave at 1:04 PM on March 8, 2005
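
To make the two readings of * in this thread concrete, here is a minimal sketch. It is not Google's matcher; the function names and the `allow_empty` flag are made up for illustration. With `allow_empty=False` each * must stand for exactly one word (the reading knave and nobody describe); with `allow_empty=True` a * may also stand for nothing (the behavior vacapinta saw on some searches).

```python
def matches_at(pattern, words, allow_empty):
    """Return True if the token pattern matches a prefix of the word list."""
    if not pattern:
        return True
    head, rest = pattern[0], pattern[1:]
    if head == "*":
        # Lax reading: the star may match nothing at all.
        if allow_empty and matches_at(rest, words, allow_empty):
            return True
        # Strict reading: the star consumes exactly one word.
        return bool(words) and matches_at(rest, words[1:], allow_empty)
    # A literal token must equal the next word.
    return bool(words) and words[0] == head and matches_at(rest, words[1:], allow_empty)


def phrase_matches(query, text, allow_empty=False):
    """Check whether a quoted phrase query with * wildcards occurs anywhere in text.

    Words are split on whitespace and compared case-insensitively;
    punctuation is not handled in this sketch.
    """
    pattern = query.lower().split()
    words = text.lower().split()
    return any(
        matches_at(pattern, words[start:], allow_empty)
        for start in range(len(words) + 1)
    )
```

Under the strict reading, `phrase_matches("mercury * earth", "mercury earth")` is False while `"mercury venus earth"` matches; passing `allow_empty=True` makes both match, which is the inconsistency discussed above.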


doesn't local dns caching make this explanation unlikely? i had it changing back and forth on rapid clicks. it seems more like a problem within a single address (which i presume is less than or equal to a datacentre).
posted by andrew cooke at 1:05 PM on March 8, 2005


vacapinta: when Google is doing a large update, you never know what you're going to get. Searches at different datacenters will return different numbers of results in different order; it's not just a re-ordering of the existing index, it also involves pruning dead pages out of the index, inserting new ones and updating those which have changed.

It used to be that Google did one of those massive, earth-shaking updates every month, but I recall reading that they switched to a "rolling" model a while back, with the large updates happening less frequently.

As to whether it's the cause of this particular issue or not, I couldn't say. But if Google's in an update period, this behavior wouldn't surprise me (also, the different datacenters serve different HTML templates for results pages -- maybe they're not all running the same code).
posted by ubernostrum at 1:42 PM on March 8, 2005


Are there any decent search engines that allow wildcards within words, a la "*leans"?
posted by CunningLinguist at 5:20 PM on March 8, 2005


Best answer: I'm a little late to this thread, but the blog Google Blogoscoped posted about this just yesterday: Google Wildcard Broken.
posted by llamateur at 5:48 PM on March 8, 2005

