Google search help.
June 25, 2005 3:37 PM

Can google (or any search engine) be made to search for a small portion of a word?

I need to find a foreign word. I have a small portion of the word which I would like to put into google and have it return the entire word. I have no idea what the whole word is from looking at this fragment. Wildcards seem to be designed for phrases only, not letters, and an internet search on wildcards and searching techniques yielded nothing. At the moment I am getting no results when I put the portion of the word in (with or without quotation marks) - either there are genuinely no results, or google won't search for a collection of letters that don't form a word.

To clarify: inputting "lhereusement" (minus inverted commas) returns a spelling correction for the word hereusement and no actual results, when ideally I want it to show every instance of the word malhereusement. Putting an asterisk at the beginning doesn't work. What can I do?

The partial word I am looking to complete is "amotemets," fwiw. It's possibly a Dutch word and that part probably comprises the middle and the end of the word. It doesn't look much like any language to me, which is why I hoped to use google.
posted by fire&wings to Computers & Internet (15 answers total)
 
Best answer: Nope. There are search tools for single databases that do this, but engines that search the entire Web do so by building a word-based index. Even if they could look into that index by prefix, they couldn't search it by arbitrary substring.
posted by nicwolff at 3:48 PM on June 25, 2005
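
A minimal sketch of the point above, assuming a toy in-memory index. The corpus, the function names, and the sorted-vocabulary trick are illustrative assumptions, not how any real engine is built. It shows why a whole-word lookup is cheap, why a prefix lookup is still feasible, and why a fragment from the middle or end of a word forces a check of every distinct word. The French word from the question is spelled "malheureusement" here.

```python
from bisect import bisect_left
from collections import defaultdict

# Toy corpus; document ids and texts are made up for illustration.
DOCS = {
    1: "malheureusement il pleut",
    2: "heureusement il fait beau",
    3: "il pleut souvent",
}

def build_index(docs):
    """Map each whole word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

INDEX = build_index(DOCS)
VOCAB = sorted(INDEX)  # sorted word list, which makes prefix lookups possible

def exact(word):
    """Whole-word lookup: a single dictionary access."""
    return INDEX.get(word, set())

def by_prefix(prefix):
    """Prefix lookup: binary search into the sorted vocabulary, then walk forward."""
    start = bisect_left(VOCAB, prefix)
    hits = set()
    for word in VOCAB[start:]:
        if not word.startswith(prefix):
            break
        hits |= INDEX[word]
    return hits

def by_substring(fragment):
    """Substring lookup: no shortcut here, every distinct word is checked."""
    hits = set()
    for word in VOCAB:
        if fragment in word:
            hits |= INDEX[word]
    return hits
```

In this sketch `exact("lheureusement")` finds nothing because the fragment is not itself an indexed word, while `by_substring("lheureusement")` finds document 1 only by walking the whole vocabulary.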


nicwolff, I'm not so sure about that. A friend who's starting at Google this August told me that on campus they can use regular expressions.
posted by Gyan at 4:16 PM on June 25, 2005


A friend who's starting at Google this August told me that on campus they can use regular expressions.

HOLY CRAP. GIMME NOW NOW NOW.
posted by Mo Nickels at 4:22 PM on June 25, 2005


I'm pretty sure it's not Dutch as written. Could it have typos? Might it help to give some context as to where this fragment is from? I mean, your question is not really "how do I find a fragment in Google" (though that is a good question) but "how do I find this particular word," whether or not via Google, right?
posted by languagehat at 4:23 PM on June 25, 2005


Exalead appears to support regular expression searching.
posted by vacapinta at 4:30 PM on June 25, 2005


fire&wings,

I don't have an answer to your word need, but I can clarify your question about searching. You may know that what you want is a wildcard search.

Google doesn't support wildcard searching within a single word (e.g. auto*). It does support it for phrases (e.g. "holy * crap" will return "holy freakin' crap," "holy friggin' crap," etc.), but this won't help you.

You might need to pursue a database other than google. If I think of anything I'll get back to the thread, but all of us librarians are busy with the ALA conference.
posted by ArcAm at 4:33 PM on June 25, 2005
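
A small sketch of the distinction described above, assuming for simplicity that `*` stands for exactly one whole word. The function name `phrase_match` is hypothetical and this is not Google's actual matching logic. The point is only that the wildcard operates on word tokens, so it cannot complete a partial word.

```python
def phrase_match(query, text):
    """Return True if text contains the query phrase, with '*' matching any one word."""
    pattern = query.lower().split()
    words = text.lower().split()
    # Slide a window of the query's length across the words of the text.
    for start in range(len(words) - len(pattern) + 1):
        window = words[start:start + len(pattern)]
        if all(p == "*" or p == w for p, w in zip(pattern, window)):
            return True
    return False
```

Under this assumption, `phrase_match("holy * crap", "well holy freakin' crap")` succeeds, but `phrase_match("auto*", "an automobile")` does not, because "auto*" is compared against whole words and never against the letters inside them.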


Response by poster: languagehat - Searching for this bunch of letters piqued my curiosity as to why google can't do what I want, and I became pretty interested in that quite apart from what I was actually looking for. That part has been answered thanks to nicwolff.

I am trying to identify a piece of artwork. My expertise lies in identifying the art and this looks very much like a turn-of-the-century Flemish etching. There is a building to the left of the picture with a sign above the door reading "..AMOTEMETS" with the first part out of frame to the left. It might be "AMOTEMETSI" but I can't tell if it's an "I" at the end or just a bunch of shading. It's a very traditional, representational picture - I can't see this being some sort of made up gibberish, I'm sure it means something. AMOTEMETS doesn't look typically Dutch, or like any other language to be honest - I'm just going on how the picture looks.
posted by fire&wings at 4:39 PM on June 25, 2005


AltaVista could use * for letters, I seem to recall. Not that I’ve used it in ages.
posted by signal at 4:40 PM on June 25, 2005


piqued my curiosity as to why google can't do what I want

I bet that the answer is that regular expressions (or any subword pattern matching scheme) are much more computationally complex than however they do whole word searches. Even if they can do it on campus, their hardware might not be able to handle it with sufficient speed if that capability was opened up to the world.
posted by advil at 6:07 PM on June 25, 2005


Proper regular expression searches on a database that size? I doubt it. It might be able to do some simple wildcard pattern matching and such, but I don't see how they could do things like backreferences without degenerating into doing a linear scan of their database.
posted by fvw at 6:18 PM on June 25, 2005
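
A rough illustration of the backreference concern above, using Python's standard `re` module. The page texts and the `scan` helper are made up for the example. A pattern with `\1` depends on what an earlier group happened to capture, so a word index offers no shortcut and this sketch simply runs the pattern over every page.

```python
import re

# Hypothetical stored pages; a real engine would hold vastly more than this.
PAGES = {
    "a": "the bonbon was in the tutu box",
    "b": "nothing repeated here",
    "c": "a couscous recipe",
}

# Matches a word made of some chunk (two or more characters) written twice in a row.
# The \1 backreference takes the pattern beyond formally regular languages.
DOUBLED = re.compile(r"\b(\w{2,})\1\b")

def scan(pages, pattern):
    """Linear scan: apply the pattern to the full text of every page."""
    results = {}
    for page_id, text in pages.items():
        words = [m.group(0) for m in pattern.finditer(text)]
        if words:
            results[page_id] = words
    return results
```

The cost of `scan` grows with the total amount of stored text, which is the degeneration into a linear scan that the comment describes.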


there's regexps and regexps - perl 5 regexps are not really regular expressions (they are less restrictive than type 3 grammars in the chomsky hierarchy). in other words, there are different algorithms that could be used depending on just how complex the regexp was.

it's also possible (for sufficiently simple regexps) that they use two separate stages of processing - expanding the regexp into a set of "normal" search terms using a dictionary, and then searching the full database for each term in the set (perhaps with some kind of heuristic to pre-order them so that the best/fastest searches are tried first).

(hey google, gimme a job!)
posted by andrew cooke at 7:02 PM on June 25, 2005
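
The two-stage idea above could look roughly like this. It is a sketch under stated assumptions, not a description of what Google does: the tiny `INDEX` dictionary, the function names `expand` and `search`, and the "rarest term first" ordering are all invented for illustration.

```python
import re

# Hypothetical word index: each known word maps to the ids of documents containing it.
INDEX = {
    "heureusement": {2},
    "malheureusement": {1},
    "il": {1, 2, 3},
    "pleut": {1, 3},
}

def expand(pattern, vocabulary):
    """Stage 1: turn the pattern into the sorted list of dictionary words it fully matches."""
    regex = re.compile(pattern)
    return sorted(w for w in vocabulary if regex.fullmatch(w))

def search(pattern, index):
    """Stage 2: run an ordinary whole-word lookup for each expanded term."""
    terms = expand(pattern, index.keys())
    # Assumed heuristic: try the rarest terms (fewest documents) first.
    terms.sort(key=lambda w: len(index[w]))
    return {term: sorted(index[term]) for term in terms}
```

Stage 1 only touches the vocabulary, which is far smaller than the document collection, and stage 2 reuses the existing word lookups. Something like `search(r".*lheureusement", INDEX)` would then complete the fragment from the original question, provided the full word is in the dictionary at all.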


Best answer: OK, now that you've explained the context, I think I have an answer for you: it's Latin, with no word breaks (as is usual in inscriptions). Amo temet means 'I love you' (with a slight emphasis on "you"); you can see the words in the reverse order (which makes no difference in Latin) in the third line of this bad patriotic poem (Mea patria amoena,/ ubi natus sum,/ temet amo valde/ et semper laudo te 'My nice country,/ where I was born,/ I love you a lot/ and always praise you'). If that's the case, the S might be an abbreviated signature. Good luck with the identification!
posted by languagehat at 7:07 AM on June 26, 2005


It's the truth: Google employees have access to the whole of the database, and can run any kind of query they please, including regexps. Got a buddy working down there. He takes great joy from it.

Of course, when you work as hard and long as the Google engineers do, you deserve perks like the free food and God mode on the master database. Good for them.
posted by symphonik at 11:25 AM on June 26, 2005


Response by poster: languagehat - thank you! It didn't occur to me that it could be two separate words. Looking at it now, it definitely is two. Although this doesn't help much with tracking down an artist, it's interesting nonetheless.
posted by fire&wings at 1:49 PM on June 26, 2005


Sorry, to clarify - sure, if you have a local copy of the index, you can do anything you want!
posted by nicwolff at 4:54 PM on June 28, 2005

