<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel> 

	<title>Comments on: When Google just isn't enough.</title>
	<link>http://ask.metafilter.com/113554/When-Google-just-isnt-enough/</link>
	<description>Comments on Ask MetaFilter post When Google just isn't enough.</description>
	<pubDate>Fri, 06 Feb 2009 11:07:55 -0800</pubDate>
	<lastBuildDate>Fri, 06 Feb 2009 11:07:55 -0800</lastBuildDate>
	<language>en-us</language>
	<docs>http://blogs.law.harvard.edu/tech/rss</docs>
	<ttl>60</ttl>

	<item>
		<title>Question: When Google just isn&apos;t enough.</title>
		<link>http://ask.metafilter.com/113554/When-Google-just-isnt-enough</link>	
		<description>Can anyone recommend a powerful textual search tool I can use on anything I want? &lt;br /&gt;&lt;br /&gt; So I&apos;m looking for a powerful search tool for academic research. To clarify, I&apos;m &lt;i&gt;not&lt;/i&gt; having trouble finding sources. I want to be able to search &lt;i&gt;within&lt;/i&gt; sources for &lt;i&gt;inexact&lt;/i&gt; phrases.&lt;br&gt;
&lt;br&gt;
As it turns out, Google is powerful in the sense that it can find terms almost anywhere, but the search engines on Westlaw and LexisNexis are ridiculously powerful in the arguments they allow you to use. For example, x /s y finds x in the same sentence as y; x /p y finds x in the same paragraph as y; x /5 y finds x within five words of y; etc. &lt;br&gt;
&lt;br&gt;
This is incredibly useful, especially if a term is used in more than one way but I&apos;m only interested in one of them. I would like to be able to do this with arbitrary text documents from sources like Project Gutenberg, but I can&apos;t seem to get Google (or Google Desktop) to do this. Does anyone have any ideas for either improving my Google-fu or an alternative search tool? &lt;br&gt;
&lt;br&gt;
Web-based or Windows-compatible is fine, but I&apos;d like to avoid paying for it if at all possible. Help me, hive mind!</description>
		<guid isPermaLink="false">post:ask.metafilter.com,2009:site.113554</guid>
		<pubDate>Fri, 06 Feb 2009 10:58:49 -0800</pubDate>
		<dc:creator>valkyryn</dc:creator>
		
			<category>google</category>
		
			<category>search</category>
		
			<category>research</category>
		
			<category>text</category>
		
			<category>arguments</category>
		
	</item> <item>
		<title>By: trip and a half</title>
		<link>http://ask.metafilter.com/113554/When-Google-just-isnt-enough#1631182</link>	
		<description>Well, &lt;a href=&quot;http://www.dtsearch.com/&quot;&gt;dtSearch&lt;/a&gt; would probably be what you want, but it doesn&apos;t quite meet the &quot;avoid paying for it&quot; criterion.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2009:site.113554-1631182</guid>
		<pubDate>Fri, 06 Feb 2009 11:07:55 -0800</pubDate>
		<dc:creator>trip and a half</dc:creator>
	</item><item>
		<title>By: zippy</title>
		<link>http://ask.metafilter.com/113554/When-Google-just-isnt-enough#1631184</link>	
		<description>It&apos;s not LexisNexis, but the command line tool &apos;agrep&apos; does fuzzy text matches and might be useful for Gutenberg texts. I don&apos;t believe it will do the &quot;within 5 words of&quot; searches that you want, however.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2009:site.113554-1631184</guid>
		<pubDate>Fri, 06 Feb 2009 11:09:07 -0800</pubDate>
		<dc:creator>zippy</dc:creator>
	</item><item>
		<title>By: zippy</title>
		<link>http://ask.metafilter.com/113554/When-Google-just-isnt-enough#1631190</link>	
		<description>&lt;a href=&quot;http://lucene.apache.org/java/docs/features.html#Features&quot;&gt;Lucene&lt;/a&gt; is free and lets you do &quot;proximity queries&quot; (among other things).</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2009:site.113554-1631190</guid>
		<pubDate>Fri, 06 Feb 2009 11:10:23 -0800</pubDate>
		<dc:creator>zippy</dc:creator>
	</item><item>
		<title>By: gum</title>
		<link>http://ask.metafilter.com/113554/When-Google-just-isnt-enough#1631195</link>	
		<description>&lt;a href=&quot;http://en.wikipedia.org/wiki/Regular_expression&quot;&gt;Regular expressions&lt;/a&gt; are the tool you need, and many programmers&apos; text editors support them in a number of flavors. &lt;a href=&quot;http://www.regular-expressions.info/&quot;&gt;There is a learning curve&lt;/a&gt;, but if you want really powerful search tools, this is where you want to go.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2009:site.113554-1631195</guid>
		<pubDate>Fri, 06 Feb 2009 11:12:08 -0800</pubDate>
		<dc:creator>gum</dc:creator>
	</item><item>
		<title>By: Tawita</title>
		<link>http://ask.metafilter.com/113554/When-Google-just-isnt-enough#1631197</link>	
		<description>You might also want to have a look at &lt;a href=&quot;http://www.staggernation.com/cgi-bin/gaps.cgi&quot;&gt;Google API Proximity Search (GAPS)&lt;/a&gt;.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2009:site.113554-1631197</guid>
		<pubDate>Fri, 06 Feb 2009 11:13:41 -0800</pubDate>
		<dc:creator>Tawita</dc:creator>
	</item><item>
		<title>By: zippy</title>
		<link>http://ask.metafilter.com/113554/When-Google-just-isnt-enough#1631241</link>	
		<description>Also, you can hack proximity searches in regular Google using the &apos;*&apos; wildcard operator.&lt;br&gt;
&lt;br&gt;
For instance, if you&apos;re looking for &quot;hamster&quot; w/3 &quot;dance&quot; you can do these queries:&lt;br&gt;
&lt;br&gt;
&quot;hamster dance&quot;&lt;br&gt;
&quot;hamster * dance&quot; (separated by one word)&lt;br&gt;
&quot;hamster * * dance&quot; (sep. by two words)</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2009:site.113554-1631241</guid>
		<pubDate>Fri, 06 Feb 2009 11:47:39 -0800</pubDate>
		<dc:creator>zippy</dc:creator>
	</item><item>
		<title>By: scruss</title>
		<link>http://ask.metafilter.com/113554/When-Google-just-isnt-enough#1631249</link>	
		<description>Sounds like you want to set up a language corpus to search for collocates and other structures. It&apos;s been a while since I did this, so I don&apos;t know what the state of the art is.&lt;br&gt;
&lt;br&gt;
&lt;a href=&quot;http://www.lexically.net/wordsmith/index.html&quot;&gt;WordSmith Tools&lt;/a&gt; isn&apos;t free, but it was pretty decent for very complex searches a few years back, and is unlikely to have got worse. Version 3 is free for private use.&lt;br&gt;
&lt;br&gt;
You&apos;re probably looking for corpus linguistics software, maybe more specifically concordance analysis. A slightly dated list of resources to get you started is &lt;a href=&quot;http://nlp.stanford.edu/links/statnlp.html&quot;&gt;here&lt;/a&gt;.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2009:site.113554-1631249</guid>
		<pubDate>Fri, 06 Feb 2009 11:56:29 -0800</pubDate>
		<dc:creator>scruss</dc:creator>
	</item><item>
		<title>By: Zed</title>
		<link>http://ask.metafilter.com/113554/When-Google-just-isnt-enough#1631359</link>	
		<description>Do you want to search things on the web, or things after you&apos;ve downloaded them to your local machine? Big difference.&lt;br&gt;
&lt;br&gt;
If it&apos;s the latter, ditto gum, above. Regexps are the key. Combine them with the standard UNIX command line tools (available in Windows through Cygwin) and a scripting language like Perl (or Ruby, or sed and awk, but probably not Python for this application -- it&apos;s not meant for command line pipelines, but it&apos;d be fine if everything you were doing was complicated enough to justify its own script).&lt;br&gt;
&lt;br&gt;
For instance, two words in the same paragraph (file must have UNIXish line endings instead of DOS):&lt;br&gt;
&lt;br&gt;
&lt;tt&gt;perl -00 -ne &apos;print if /one/i and /two/i&apos; filename.txt&lt;/tt&gt;&lt;br&gt;
&lt;br&gt;
Looking within n words of each other, or looking within the same sentence, quickly gets much more complicated (especially the latter, as figuring out what you want to interpret as a sentence isn&apos;t trivial, and will produce undesirable results on some texts no matter what you do).</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2009:site.113554-1631359</guid>
		<pubDate>Fri, 06 Feb 2009 14:00:21 -0800</pubDate>
		<dc:creator>Zed</dc:creator>
	</item><item>
		<title>By: Zed</title>
		<link>http://ask.metafilter.com/113554/When-Google-just-isnt-enough#1631368</link>	
		<description>Actually, that&apos;s going to fail for short words that&apos;d be expected to occur within other words. And thinking about correcting for that in a way that recognizes words correctly regardless of proximity to punctuation... um, forget this route unless you&apos;re already a programmer or you want to be one.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2009:site.113554-1631368</guid>
		<pubDate>Fri, 06 Feb 2009 14:07:49 -0800</pubDate>
		<dc:creator>Zed</dc:creator>
	</item><item>
		<title>By: hattifattener</title>
		<link>http://ask.metafilter.com/113554/When-Google-just-isnt-enough#1631409</link>	
		<description>Regexps are very powerful, but I don&apos;t think they&apos;re the right tool for this job. You want some sort of specialized natural-language search engine.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2009:site.113554-1631409</guid>
		<pubDate>Fri, 06 Feb 2009 14:51:21 -0800</pubDate>
		<dc:creator>hattifattener</dc:creator>
	</item><item>
		<title>By: zengargoyle</title>
		<link>http://ask.metafilter.com/113554/When-Google-just-isnt-enough#1632194</link>	
		<description>You can still find the old &apos;htdig&apos;, which did remarkable things with text searches (it was built for web pages, but it handled plain text too). It&apos;s sort of the same thing as Lucene. Otherwise I&apos;d have to point you toward crafting your own specialized search engine. With a fast computer, regexps are fine if you know them; with large data, you craft your own. Still, it usually turns out to be more &quot;find things with these words and prune the ones you don&apos;t want,&quot; which means Lucene-type stuff with a regexp pass afterward.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2009:site.113554-1632194</guid>
		<pubDate>Sat, 07 Feb 2009 11:48:00 -0800</pubDate>
		<dc:creator>zengargoyle</dc:creator>
	</item>
	</channel>
</rss>
