asterisk and the soothsayer
June 7, 2006 2:13 AM

Are there still search engines that accept an asterisk as a wildcard?

I'm trying to find a town whose name starts with an R and ends in "burg". I thought I could find it by typing "r*burg" into a search engine, but that doesn't work (with or without quotes). They all return results that feature "burg" as a standalone word. I tried Google, Yahoo, Excite and others. Even good old AltaVista can't do it anymore, apparently.
Thanks in advance.
posted by Silky Slim to Computers & Internet (10 answers total)
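
If you already have a list of place names on hand (a downloaded gazetteer, say), the "r*burg" pattern is easy to apply locally. A minimal Python sketch, assuming a hypothetical `towns` list and a hypothetical helper name `wildcard_matches`; it uses the standard library's `fnmatch`, whose `*` matches any run of characters, including none:

```python
import fnmatch

def wildcard_matches(names, pattern):
    """Return the names matching a shell-style pattern, ignoring case."""
    # fnmatchcase is case-sensitive, so lowercase both sides first.
    pattern = pattern.lower()
    return [name for name in names if fnmatch.fnmatchcase(name.lower(), pattern)]

# Hypothetical sample data; a real gazetteer would have thousands of entries.
towns = ["Rothenburg", "Regensburg", "Hamburg", "Rheinberg", "Ravensburg", "Burgdorf"]

print(wildcard_matches(towns, "r*burg"))
# ['Rothenburg', 'Regensburg', 'Ravensburg']
```

Note that "Hamburg" and "Burgdorf" are rejected: the pattern has to match the whole name, not just contain "burg".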
 
Best answer: You want this.
posted by rdr at 3:00 AM on June 7, 2006


Also this.
posted by blueshammer at 4:01 AM on June 7, 2006


(Those look like great solutions to Silky Slim's particular problem, but I'd still be curious to see a general answer to his question. Is there a web search engine out there that takes wildcards?)
posted by nebulawindphone at 4:05 AM on June 7, 2006


Response by poster: Thanks, rdr. That solved my problem.
And as for the larger question... does anybody know what the deal is? I'm pretty sure AltaVista offered this functionality back in the day. Why this step backward? Is it because they're trying to be Google?
posted by Silky Slim at 4:15 AM on June 7, 2006


I think it has something to do with anti-spam measures? I remember reading about it somewhere; sorry I can't recall more specific details.
posted by funambulist at 4:39 AM on June 7, 2006


Most fee-based databases (like ProQuest and LexisNexis) and library catalogs use * as a wildcard. Google also uses * as a wildcard, but for whole words (especially helpful inside quotes), not letters.
posted by unknowncommand at 5:03 AM on June 7, 2006
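
To make the difference between the two kinds of `*` concrete, here is a rough Python sketch that translates each into a regular expression. It is an illustration under assumptions (hypothetical helper names, and a simplified notion of a "word" as a run of non-space characters), not how Google or any database actually implements its operator:

```python
import re

def letter_wildcard(pattern):
    """Catalog-style: '*' stands for any run of letters inside a single word."""
    parts = [re.escape(p) for p in pattern.split("*")]
    # \w* cannot cross a space, so the match stays within one word.
    return re.compile(r"\b" + r"\w*".join(parts) + r"\b", re.IGNORECASE)

def word_wildcard(phrase):
    """Phrase-style (assumed behavior): '*' stands for one or more whole words."""
    pieces = [r"\S+(?:\s+\S+)*" if t == "*" else re.escape(t) for t in phrase.split()]
    return re.compile(r"\b" + r"\s+".join(pieces) + r"\b", re.IGNORECASE)

text = "The road to Regensburg runs past the castle at Rothenburg."

print(letter_wildcard("r*burg").findall(text))
# ['Regensburg', 'Rothenburg']
print(bool(word_wildcard("the * at Rothenburg").search(text)))
# True
```

The letter-level version is what the original question needs; the word-level version can fill in "road to Regensburg runs past the castle" but can never complete a partial word like "r*burg".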


does anybody know what the deal is?

Processing overhead. Normally, when you search for a keyword, the engine already has a pre-made list of every page that contains that word, so it has almost no work to do.

Whereas for wildcards, it pretty much has to look through all the words it knows, one at a time, and see if they match your query. Once it knows all the keywords that match your query, it has to merge the list of pages for each keyword into one master list of pages that match your query. Both of these operations are quite slow and very hard to speed up.
posted by cillit bang at 7:29 AM on June 7, 2006
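
The two steps above can be sketched with a toy inverted index in Python. This is a deliberately small model under assumptions (an in-memory dict, whitespace tokenizing, `fnmatch` for the pattern test, hypothetical function names), not how any real engine stores its index. The point is that the exact lookup is one dictionary access, while the wildcard lookup touches every word in the vocabulary and then merges several lists:

```python
import fnmatch
from collections import defaultdict

def build_index(docs):
    """Map each word to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}

def exact_lookup(index, word):
    # One dictionary access: the list was built ahead of time.
    return index.get(word.lower(), [])

def wildcard_lookup(index, pattern):
    pattern = pattern.lower()
    # Step 1: walk the whole vocabulary, testing every word against the pattern.
    matching_words = [w for w in index if fnmatch.fnmatchcase(w, pattern)]
    # Step 2: merge the page lists of all matching words into one master list.
    merged = set()
    for w in matching_words:
        merged.update(index[w])
    return matching_words, sorted(merged)

docs = {
    1: "trains to regensburg",
    2: "hotels in hamburg",
    3: "castle walls of rothenburg",
    4: "regensburg cathedral",
}
index = build_index(docs)
print(exact_lookup(index, "hamburg"))    # [2]
print(wildcard_lookup(index, "r*burg"))  # (['regensburg', 'rothenburg'], [1, 3, 4])
```

With four documents both calls are instant; with a vocabulary of many millions of words, step 1 alone becomes a full scan.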


What cillit bang said: the index of a search engine can itself be terabytes in size, essentially a complete list of all terms, the IDs of the documents that contain each term, the term's weight in each page, and each document's rank in overall value. Google (at least) allows up to 10 words in a search query, so the search engine only has to look up a maximum of 10 words in its index, and it can look them up quickly.

If you use a wildcard, you could not only be scouring terabytes of information (as opposed to directly looking up 1-10 words in the index), but once you've found all the matching words, the combined list of document IDs would be massive.

Now the search engine has to sort a much larger list of document IDs into a proper rank (which is hard, because what is the 'relevancy' of a wildcarded word?), then retrieve the page text of the first 10 or 20 results, find your wildcarded terms in it, and return those for the page abstract.

Most search engines automatically handle spell checking and word forms (plural and singular, for example), but this is a minor operation that adds only a couple of terms to the list of words to look up separately and combine, compared to the potentially thousands of words a wildcard after three letters can add to a search. When search engines indexed few enough documents that they could simply scour all of them in memory or on disk in a reasonable amount of time, this was easier to accomplish. With 5-10 billion documents (and growing) at the major search engines, the stored data is measured in petabytes or more (thousands of terabytes), spread across thousands of different servers.
posted by hincandenza at 8:48 AM on June 7, 2006
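
hincandenza's comparison between word-form expansion and wildcard expansion, and the ranking step that follows, can be sketched in Python too. Everything here is an assumption for illustration: the toy index stores a hypothetical (document ID, weight) pair per term, `doc_rank` stands in for a page's overall value, the naive "add an s" rule stands in for real word-form handling, and the scoring rule (weight times rank, keeping the best expanded word per page) is one arbitrary answer to the "relevancy of a wildcarded word" question:

```python
import fnmatch
import heapq

def expand(vocabulary, term):
    """Return the index terms a query term turns into before lookup."""
    if "*" in term:
        # Wildcard: every vocabulary word matching the pattern.
        return [w for w in vocabulary if fnmatch.fnmatchcase(w, term)]
    # Word forms (naive assumption): the term plus a plural, if indexed.
    return [w for w in (term, term + "s") if w in vocabulary]

def top_k(index, doc_rank, term, k=10):
    """Score every candidate page, then keep only the k best."""
    scores = {}
    for word in expand(index.keys(), term):
        for doc_id, weight in index[word]:
            # Assumed scoring rule: term weight times the page's overall rank,
            # keeping the best-matching expanded word for each page.
            score = weight * doc_rank[doc_id]
            scores[doc_id] = max(scores.get(doc_id, 0), score)
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])

# Hypothetical toy data: term -> [(doc_id, weight), ...]
index = {
    "burg": [(1, 9), (2, 4)],
    "burgs": [(3, 5)],
    "ravensburg": [(4, 8)],
    "regensburg": [(2, 7), (5, 6)],
    "rothenburg": [(6, 9)],
}
doc_rank = {1: 5, 2: 9, 3: 3, 4: 6, 5: 7, 6: 4}

print(expand(index.keys(), "burg"))           # ['burg', 'burgs']
print(expand(index.keys(), "r*burg"))         # ['ravensburg', 'regensburg', 'rothenburg']
print(top_k(index, doc_rank, "r*burg", k=2))  # [(2, 63), (4, 48)]
```

`heapq.nlargest` avoids fully sorting the candidates, but every candidate still has to be scored first, and in a real vocabulary a short wildcard prefix expands to far more than three words.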


Not that processing overhead isn't a good answer, but it's also that search engines are all "trying to be Google".

The leap forward Google made is that it figures out what you're looking for in a much more intelligent way than just "which documents contain these words". Or, at least, it tries. If you search for "Paris Hilton", Google "knows" that you're probably looking for the celebrity, not the hotel. They devote incalculable amounts of processing power to that kind of thing, next to which a few asterisks here and there would be nothing.
posted by AmbroseChapel at 4:44 PM on June 7, 2006


Response by poster: And yet...

OK, that was interesting. Thanks, everyone!
posted by Silky Slim at 2:19 AM on June 8, 2006

