I want a text tool that can search documents for phrases within X words of a target.
November 18, 2004 7:23 AM

Text search tool: I'm looking for a word processor or text tool that will let me search documents for phrases centred on particular words. I want to be able to specify a 'window', say of 15 words, and the tool will then return *all* the text within 15 words on either side of the search target. For example, if I searched this AskMe page for "fanboy" as the target and set my window to 6, I would expect to get "aren't trying to show off their fanboy Pitchfork-esque indie creds. The less politics" (from alidarbac's question below) as one of the results. Freeware, shareware and commercial software are all acceptable. Many thanks!
posted by carter to Computers & Internet (7 answers total)
 
My apologies if you already know this, but the librarian parlance for this is KWIC, which stands for "key word in context", and you see it all the time in big, big indices like Index Medicus. This thing definitely does what you are looking for, but it's an online tool, and an old one at that; I have no idea whether it could be customized for your purposes. I think this thing may do what you are looking for.
posted by jessamyn at 7:56 AM on November 18, 2004
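
The window search described in the question (every occurrence of a target word, with N words on either side) is simple enough to sketch in a few lines of shell and awk. The `kwic` function below is a hypothetical helper, not an existing tool. It assumes a POSIX shell and awk, treats any run of whitespace as a word boundary, matches the target case-insensitively while ignoring punctuation stuck to it, and reports every occurrence on its own line, even when two hits fall inside the same window (a single `grep -o` pass would swallow the second one). Because tokens from all input files go into one array, a window can run across the end of one file into the next.

```shell
#!/bin/sh
# kwic WORD N [FILE...]
# Hypothetical helper (not an existing tool): prints every occurrence of WORD
# with up to N words on each side. Reads stdin when no FILE is given.
kwic() {
  word=$1; n=$2; shift 2
  awk -v w="$word" -v n="$n" '
    # Collect every whitespace-separated token from all input into one array,
    # so a window can run across line breaks (and across files).
    { for (i = 1; i <= NF; i++) tok[++count] = $i }
    END {
      target = tolower(w); gsub(/[^a-z0-9]/, "", target)
      for (i = 1; i <= count; i++) {
        # Case-insensitive match that ignores punctuation attached to the token.
        t = tolower(tok[i]); gsub(/[^a-z0-9]/, "", t)
        if (t != target) continue
        # Clamp the window to the start and end of the input.
        lo = (i - n < 1) ? 1 : i - n
        hi = (i + n > count) ? count : i + n
        line = tok[lo]
        for (j = lo + 1; j <= hi; j++) line = line " " tok[j]
        print line
      }
    }' "$@"
}
```

For example, `kwic fanboy 6 page.txt` (where `page.txt` is a hypothetical saved copy of a page) prints one line per hit. Fed just the excerpt quoted in the question, it returns the whole excerpt, since "fanboy" sits exactly six words from each end.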


Response by poster: Yes! That's what I'm looking for, but I didn't know it was called "KWIC", which presumably is why it was hard to find anything out about it. So thanks, jessamyn!
posted by carter at 8:03 AM on November 18, 2004


You should also try searching for the word "concordance", which should turn up a lot of scripts and tools used by language geeks (like me).
posted by Mo Nickels at 8:19 AM on November 18, 2004
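
Concordance tools typically go a step beyond a plain keyword search: they line the hits up so the keyword sits in a single column, and sort them so recurring phrases cluster together. A rough sketch of that display follows, as a hypothetical shell/awk helper named `concord` (not an existing program). It assumes a fixed number of *characters* of context on each side rather than a word count, and sorts by the text that follows the keyword.

```shell
#!/bin/sh
# concord WORD WIDTH [FILE...]
# Hypothetical helper (not an existing program): one aligned line per
# occurrence of WORD, WIDTH characters of context on each side, sorted by
# the text after WORD. Reads stdin when no FILE is given.
concord() {
  word=$1; width=$2; shift 2
  awk -v w="$word" -v width="$width" '
    { for (i = 1; i <= NF; i++) tok[++count] = $i }
    END {
      target = tolower(w); gsub(/[^a-z0-9]/, "", target)
      fmt = "%s\t%" width "s%s%s\n"
      for (i = 1; i <= count; i++) {
        t = tolower(tok[i]); gsub(/[^a-z0-9]/, "", t)
        if (t != target) continue
        # Grow the context word by word until it is at least WIDTH characters.
        left = ""; right = ""
        for (j = i - 1; j >= 1 && length(left) < width; j--) left = tok[j] " " left
        for (j = i + 1; j <= count && length(right) < width; j++) right = right " " tok[j]
        # Trim to exactly WIDTH characters so the keyword column lines up.
        if (length(left) > width) left = substr(left, length(left) - width + 1)
        right = substr(right, 1, width)
        # Emit a sort key (lower-cased right context), a tab, then the display line.
        printf fmt, tolower(right), left, tok[i], right
      }
    }' "$@" |
    LC_ALL=C sort |  # byte order, so the result does not depend on locale
    cut -f 2-        # drop the sort key, keep the aligned line
}
```

Sorting on the left context instead, to group the words that *precede* the keyword, would only need a different sort key in the `printf`.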


Using grep, Unix style (replace "generation" with your target word and both 6s with your window size):

cat $files_to_search | tr -s '[:space:]' ' ' | grep -E -o '([^ ]+ ){0,6}[^ ]*generation[^ ]*( [^ ]+){0,6}'
posted by sfenders at 11:51 AM on November 18, 2004


The grep suggestion is awesome. I am in a sooper hurry, but go to http://slinkages.blogspot.com and scroll down for a link called something like... oh screw it

Linux Cookbook: Analyzing Text

Lots of cool stuff, and you can probably install Windows versions of these commands. They work in the OS X Terminal.
posted by mecran01 at 3:26 PM on November 18, 2004
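
A word-frequency list is a common first look at a corpus, and it can be built entirely from standard Unix text utilities (`tr`, `sed`, `sort`, `uniq`). A minimal sketch follows; `wordfreq` is a hypothetical name, and the crude tokenizer treats anything that isn't an ASCII letter as a separator, so "aren't" is counted as "aren" plus "t".

```shell
#!/bin/sh
# wordfreq [FILE...]
# Hypothetical helper: prints "count word" lines, most frequent first.
# Reads stdin when no FILE is given.
wordfreq() {
  cat "$@" |
    tr -cs 'A-Za-z' '\n' |  # each run of non-letters becomes a single newline
    tr 'A-Z' 'a-z' |        # fold case so "The" and "the" count as one word
    sed '/^$/d' |           # drop the empty line left by leading punctuation
    LC_ALL=C sort |         # bring identical words together
    uniq -c |               # collapse them, prefixing each with its count
    LC_ALL=C sort -k1,1nr -k2,2  # highest count first, ties alphabetically
}
```

Piping the output through `head -n 20` gives a quick "top 20" for a large corpus.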


I don't know what sort of geek quotient you possess, but this is the sort of thing Perl and regular expressions were made for.
posted by icey at 4:16 PM on November 18, 2004


Response by poster: My geek quotient is low; my colleague's, however, is not. I'm initially hoping to experiment with a few off-the-shelf tools, shifting inputs and outputs between them, in order to work towards a basic proof of concept for a particular way of analyzing and parsing spoken and written communication (the KWIC and concordance tools will be a very good start here). I have a big corpus I can play with.

I want to use the proof of concept to start spec'ing out a specific tool for a colleague to develop, who, fortunately, is *very* geeky ;) I think my relationship with him will go better if I can point to existing tools and say, "I like what this does here," or, "I wish this would do this rather than this here," rather than just saying, "Build me something neat."

Anyway, thanks, y'all! This has been very helpful.
posted by carter at 4:34 PM on November 18, 2004

