(How|Where) does one find Google's (guide|help) on (searching|querying) using regular expressions?
July 28, 2005 4:34 AM
Well, what do you know, er... what do I know: Google does support some sort of regular expressions. However, I can't find the usage guide. Anyone?
I know, I was going to post a comment there, but it's locked.
posted by Gyan at 7:48 AM on July 28, 2005
It's a big jump from "supports permutations" to "supports regular expressions" -- just like it supporting "*" doesn't mean it supports shell globbing. The "(a|b|c)" notation seems to be identical to the documented "(a OR b OR c)" notation.
posted by mendel at 7:59 AM on July 28, 2005
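To make mendel's point concrete: if "|" only ever separates literal words, a query template like the one in this thread's title is just shorthand for a finite list of phrasings, or equivalently for the documented (a OR b OR c) form. The sketch below assumes exactly that narrow reading; the helper names `expand_alternations` and `as_or_query` are hypothetical, and nothing here claims to describe how Google actually parses queries.

```python
import itertools
import re

# Matches one parenthesised group of "|"-separated alternatives, e.g. "(How|Where)".
GROUP = re.compile(r"\(([^()]*)\)")


def expand_alternations(template: str) -> list[str]:
    """Expand every (a|b|...) group in a query template into all concrete phrasings.

    Treats "|" purely as alternation between literal words (the narrow
    reading discussed above); this is not a general regular-expression engine.
    """
    pieces = GROUP.split(template)
    # re.split with one capturing group alternates: literal, group, literal, group, ...
    choices = [
        piece.split("|") if i % 2 else [piece]
        for i, piece in enumerate(pieces)
    ]
    return ["".join(combo) for combo in itertools.product(*choices)]


def as_or_query(template: str) -> str:
    """Rewrite each (a|b|c) group in the documented (a OR b OR c) form."""
    return GROUP.sub(lambda m: "(" + " OR ".join(m.group(1).split("|")) + ")", template)


title = "(How|Where) does one find Google's (guide|help) on (searching|querying)"
for phrasing in expand_alternations(title):
    print(phrasing)
print(as_or_query(title))
```

Three two-way groups give 2 × 2 × 2 = 8 phrasings, which is the "permutations" behaviour, not anything requiring a regex engine.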
I've played quite a bit with search syntax but, no, Google does not support some sort of regular expressions. Perhaps you've been misled as to what a regular expression is?
posted by majick at 8:04 AM on July 28, 2005
mendel : "It's a big jump from 'supports permutations' to 'supports regular expressions'"
I know, that's why I said 'some sort'. Clearly, escape sequences don't work. I know Google can support regex, because my friend, who's starting work there in August, has seen it on campus.
posted by Gyan at 8:24 AM on July 28, 2005
Just because it's something their internal interface supports doesn't mean you'll ever see it available on the external interface. The fact is that plain "keyword" searching is computationally about a million times easier to implement at Google's scale than a DFA/NFA regular-expression engine. The only possible way to make keyword searching efficient over hundreds of terabytes (or whatever their index is up to these days) is to precompute an index of words.
In fact a full regex engine is turing-complete, and you can write arbitrary regexps that will gobble up near infinite amounts of CPU time and memory. For all these reasons it would be technical insanity for them to offer regex searching to the general public.
posted by Rhomboid at 9:36 AM on July 28, 2005
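A toy illustration of why a precomputed word index favours keyword search: an AND query only intersects per-word sets built once up front, while an arbitrary regex can't be answered from that index, so every document has to be scanned. This is a minimal sketch with a deliberately naive tokenizer and made-up documents; the function names are hypothetical, and a real search engine's index is far more involved.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; a deliberately simple stand-in for real tokenisation."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Precompute, once, which documents contain each word."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def keyword_search(index: dict[str, set[int]], query: str) -> set[int]:
    """AND query: intersect the posting sets of the query words; no document is re-read."""
    postings = [index.get(word, set()) for word in tokenize(query)]
    if not postings:
        return set()
    return set.intersection(*postings)


def regex_search(docs: dict[int, str], pattern: str) -> set[int]:
    """A general regex can't be answered from the word index, so scan every document."""
    compiled = re.compile(pattern)
    return {doc_id for doc_id, text in docs.items() if compiled.search(text)}


# Hypothetical example documents.
docs = {
    1: "Google supports OR queries",
    2: "Regular expressions and finite automata",
    3: "Google regular expression support",
}
index = build_inverted_index(docs)
print(keyword_search(index, "google support"))              # {3}
print(regex_search(docs, r"(?i)regular expressions?"))      # {2, 3}
```

Note that "support" only finds document 3: the index stores exact words, so "supports" is a different key. That rigidity is the price of never re-reading the documents.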
Since you guys want to be so nitpicky about what a regular expression is, it's an expression that defines a regular language. Even a single character is a regular expression. You don't have to have wildcards to have a regular expression. So in some sense, Google has always supported "some sort of" regular expressions. In another, they do not support the full expressiveness of regular expressions.
In fact a full regex engine is turing-complete
Perl-compatible "regexes" have features that mean a Turing machine is necessary to accept them, but properly speaking the "regular" in regular expressions means that you can accept them with only a finite state automaton.
posted by grouse at 12:15 PM on July 28, 2005
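grouse's distinction in code: a true regular expression such as `ab*c` can be recognised by a finite table of states, one lookup per character and constant memory. A backreference in Python's `re` lets a pattern match { w w } (a word followed by an exact copy of itself), which is not a regular language, so no fixed-size automaton like the one below could recognise it for arbitrarily long w. The state numbering and the `dfa_accepts` helper are illustrative choices, not taken from any particular engine.

```python
import re

# A DFA for the regular expression  ab*c  over the alphabet {a, b, c}.
# States: 0 = start, 1 = seen "a" (plus any b's), 2 = accept.
# A missing (state, char) entry means the input is rejected.
DFA = {
    (0, "a"): 1,
    (1, "b"): 1,
    (1, "c"): 2,
}
ACCEPTING = {2}


def dfa_accepts(text: str) -> bool:
    """Run the automaton: one table lookup per character, constant memory."""
    state = 0
    for ch in text:
        state = DFA.get((state, ch))
        if state is None:
            return False
    return state in ACCEPTING


# With a backreference, Python's re matches { w w }: a word followed by a copy
# of itself. That language is not regular, so no finite table like DFA above
# can recognise it for arbitrarily long w.
DOUBLED = re.compile(r"(\w+)\1")

for s in ["ac", "abbbc", "abcb", ""]:
    print(repr(s), dfa_accepts(s), re.fullmatch(r"ab*c", s) is not None)
print(DOUBLED.fullmatch("abcabc") is not None)  # True
print(DOUBLED.fullmatch("abcabd") is not None)  # False
```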
ok, it's late, but it seems to me that the best description of Google is that it "supports regular expressions over the alphabet consisting of words in web pages, except for the implementation of a Kleene star".
Gyan's example shows both concatenation and alternation, as long as you remember that the alphabet is words, not letters.
Wikipedia
posted by andrew cooke at 2:00 PM on July 28, 2005
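andrew cooke's framing can be sketched directly: treat the text as a string whose symbols are words, and a pattern as a concatenation of alternations with no star, where each position is the set of words allowed there. The `Pattern` type and `phrase_matches` helper are hypothetical, and the contiguous-window match is only a rough stand-in for a quoted phrase search.

```python
import re

# Hypothetical word-level pattern: a concatenation of alternations, no Kleene star.
# Each position is the set of words allowed there, so {"how", "where"} means (how|where).
Pattern = list[set[str]]


def words(text: str) -> list[str]:
    """Treat the text as a string over the alphabet of (lower-cased) words."""
    return re.findall(r"[\w']+", text.lower())


def phrase_matches(pattern: Pattern, text: str) -> bool:
    """True if some contiguous run of words matches the pattern position by position."""
    tokens = words(text)
    n = len(pattern)
    return any(
        all(tokens[start + i] in allowed for i, allowed in enumerate(pattern))
        for start in range(len(tokens) - n + 1)
    )


query: Pattern = [{"how", "where"}, {"does"}, {"one"}, {"find"}]
print(phrase_matches(query, "So where does one find the help page?"))  # True
print(phrase_matches(query, "Where can one find the help page?"))      # False
```

Because the pattern has a fixed length, matching never needs more than one pass over a window of that many words, which is what makes the "no Kleene star" restriction cheap.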
(although the lack of ordering in most queries, which don't use quotes or plus, can't be described in regular expressions without a Kleene star!)
posted by andrew cooke at 2:02 PM on July 28, 2005
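To see why an unordered query needs the star, try writing "contains every term, in any order" as a plain regex: each ordering of the terms becomes its own alternative, and the unknown gaps between terms have to be filled with `.*`, which is exactly a Kleene star. This sketch simplifies by working over characters with `\b` word boundaries rather than over a true word alphabet; the function names are hypothetical.

```python
import itertools
import re


def unordered_query_regex(terms: list[str]) -> str:
    """Build a regex that accepts text containing every term, in any order.

    Each permutation of the terms becomes one alternative, and the gaps
    between terms are filled with ".*", i.e. a Kleene star over characters.
    """
    alternatives = []
    for order in itertools.permutations(terms):
        body = ".*".join(r"\b" + re.escape(term) + r"\b" for term in order)
        alternatives.append("(?:.*" + body + ".*)")
    return "|".join(alternatives)


def matches_all_terms(terms: list[str], text: str) -> bool:
    """Case-insensitive by lower-casing the text; terms are assumed lower-case."""
    return re.fullmatch(unordered_query_regex(terms), text.lower()) is not None


print(unordered_query_regex(["google", "regex"]))
print(matches_all_terms(["google", "regex"], "Does Google support regex?"))  # True
print(matches_all_terms(["google", "regex"], "Regex support in Google"))     # True
print(matches_all_terms(["google", "regex"], "Google search syntax"))        # False
```

The permutation approach also shows the cost: n terms produce n! alternatives, so an index lookup plus set intersection is the practical way to answer unordered queries.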
posted by jessamyn at 7:28 AM on July 28, 2005