Let's say I need every Web page with the word "obstreperous" and "Minnesota" in it. You can't ask a cataloguer in advance to say "Well, that's going to be a useful category, we should encode that in advance." Instead, what the cataloguer is going to say is, "Obstreperous plus Minnesota! Forget it, we're not going to optimize for one-offs like that." Google, on the other hand, says, "Who cares? We're not going to tell the user what to do, because the link structure is more complex than we can read, except in response to a user query."

Seems to make sense, right? Wrong. Shirky confuses indexing words with indexing concepts. Roget's thesaurus lists seventeen synonyms for "obstreperous," which is about normal. Shirky's correct that an index does a poor job of free-text searching. That's sort of the point. What he misses is that "Let's say I need every Web page with the word 'obstreperous' and 'Minnesota' in it" is very different from "Let's say I need every Web page about both obstreperousness and Minnesota, or the ways in which Minnesota has demonstrated the quality of being obstreperous."
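To make the word-versus-concept distinction concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the three-page corpus, the tiny concept table, and the function names are all assumptions, not anything Shirky or Google actually uses. It only shows that a literal two-word match and a lookup through a cataloguer-style vocabulary can return different pages.

```python
import re
from collections import defaultdict

# Hypothetical three-page corpus, invented for illustration.
PAGES = {
    "p1": "An obstreperous crowd gathered in Minnesota.",
    "p2": "Unruly Minnesota legislators defied the governor.",
    "p3": "Obstreperous toddlers are common everywhere.",
}

# Hypothetical hand-built concept table, standing in for a cataloguer's
# controlled vocabulary (a real thesaurus entry would be much longer).
CONCEPTS = {
    "obstreperousness": {"obstreperous", "unruly", "boisterous"},
    "minnesota": {"minnesota", "minnesotan"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def build_word_index(pages):
    """Map each literal word to the set of page ids containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index


def word_search(index, *words):
    """Free-text search: pages containing every literal word."""
    hits = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*hits) if hits else set()


def concept_search(index, *concepts):
    """Concept search: pages with at least one term for every concept."""
    hits = []
    for concept in concepts:
        pages = set()
        for term in CONCEPTS[concept]:
            pages |= index.get(term, set())
        hits.append(pages)
    return set.intersection(*hits) if hits else set()


INDEX = build_word_index(PAGES)
```

With this toy data, `word_search(INDEX, "obstreperous", "Minnesota")` finds only `p1`, while `concept_search(INDEX, "obstreperousness", "minnesota")` also picks up `p2`, which never uses the word "obstreperous." Even the concept lookup is just synonym expansion, so it still falls short of judging what a page is actually about.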
posted by Gator at 12:25 PM on February 20, 2006