Advanced text searching (preferably for Mac)?
January 11, 2010 7:47 AM

Are there any products (preferably that work on OS X, but I'm flexible) that provide really intelligent text searching capabilities? For example, if there's a document that has the word "disociated", I'm looking for software which, if I search for "dissociation", will be able to recognize "disociated" as a misspelled variant of "dissociation", or better yet, of synonyms of "dissociation". Neither Spotlight nor products like Zotero or Devonthink do this. Anything like that out there?
posted by shivohum to Technology (7 answers total)
 
I'm not aware of any such products, but I can tell you that this sort of analysis of a search term belongs, in academic research, to the field of natural language processing (as it relates to searching and indexing). Perhaps google for products that advertise 'natural language processing' or NLP.

Also, there's a specific term for the way that "disassociated" is related to "disassociation" (cognate?). If you can find that term, it's likely used in the advertising for the product, since this is obviously a neat feature.

However, don't be surprised if you can't find a desktop product that will do it. NLP is a hard problem in artificial intelligence, in part because it involves either a lot of processing power or large dictionaries/lookup tables, or both. Google can experiment with doing it because a search engine environment has massive economies of scale for doing linguistic analysis; a desktop product doesn't.
posted by fatbird at 9:53 AM on January 11, 2010


There is agrep.
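
For a sense of what approximate matching does, here is a minimal Python sketch. It is not agrep's actual implementation; it uses a plain dynamic-programming edit distance, and the function names `edit_distance` and `fuzzy_find` are made up for illustration.

```python
import string


def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_find(query, text, max_errors=1):
    """Return the words in text that are within max_errors edits of query."""
    hits = []
    for raw in text.split():
        word = raw.strip(string.punctuation).lower()
        if word and edit_distance(query.lower(), word) <= max_errors:
            hits.append(word)
    return hits


print(fuzzy_find("dissociated", "She felt disociated, not dissociated."))
```

This catches "disociated" when you search for "dissociated" (one missing letter), but "dissociation" is more than one edit away from "disociated", so typo tolerance alone doesn't cover a different word ending.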
posted by paulg at 9:55 AM on January 11, 2010


Perhaps google for products that advertise 'natural language processing' or NLP.

Googling for "NLP" will get you lots of stupid neuro-linguistic programming hogwash, unfortunately.

What you want is a search tool with a stemmer. "Dissociated" and "dissociation" aren't "synonyms"--they have the same stem.
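
To show the idea, here is a toy suffix-stripping sketch in Python. The suffix list and the name `crude_stem` are my own assumptions for illustration; real stemmers (the Porter stemmer is the classic one) use far more careful rules.

```python
# Hypothetical, hand-picked suffix list, longest suffixes first.
SUFFIXES = ("ations", "ation", "ating", "ated", "ates", "ate",
            "ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


for w in ("dissociated", "dissociation", "dissociating"):
    print(w, "->", crude_stem(w))
```

If both the index and the query go through the same function, "dissociated" and "dissociation" land on the same key. Note that this does nothing for misspellings: "disociated" gets a different stem.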
posted by Sidhedevil at 10:14 AM on January 11, 2010


You might look into products that support "soundex" searches. This won't help at all with searches across synonyms, but it is more powerful than simple stem searches. In effect, you ask the search tool for "words that sound like dissociate". This will pick up all sorts of misspellings.

I have no specific product recommendation. Most of the higher end database systems include soundex support, but there may be simple/cheap/standalone tools out there.
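
If it helps to see what a soundex code is, here is a small Python sketch of the usual American Soundex rules (first letter plus three digits). The function name and layout are mine; database systems that offer soundex may differ in details.

```python
# Consonant groups and their Soundex digits; vowels, h, w and y have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the four-character Soundex code, or "" if word has no letters."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


for w in ("dissociate", "disociated", "dissociation"):
    print(w, soundex(w))
```

All three words come out with the same code, so a lookup keyed on the code would find the misspelled "disociated". The code is coarse, though, so plenty of unrelated words will share it too.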
posted by lex mercatoria at 10:43 AM on January 11, 2010


You need a soundex AND a stemmer, because if you're searching "bring" you want it to turn up both "brinng" and "brought".
posted by Sidhedevil at 11:06 AM on January 11, 2010


You need a soundex AND a stemmer, because if you're searching "bring" you want it to turn up both "brinng" and "brought".

And I want a synonym search too -- it would be great if it could find "carry" as well. Though maybe that's asking for too much.
posted by shivohum at 12:35 PM on January 11, 2010


And don't forget automatic acronym expansion; that is another neat trick. I know server-based products (for workgroups/departments/enterprises) are starting to include stemming and acronym support. The product I know best, Microsoft SharePoint, includes stemming out of the box, and you can train it with synonyms. I have no idea if this is trickling down to the desktop yet.

FWIW, automatic synonym searching for English is crazy hard because lots of words have multiple divergent meanings and thus divergent synonyms.
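
A tiny Python sketch of why: the synonym table below is invented purely for illustration, and `expand_query` is a hypothetical helper, not any product's API.

```python
# Hand-made, hypothetical synonym table. "carry" has two unrelated senses.
SYNONYMS = {
    "bring": {"carry", "fetch"},
    "carry": {"bring", "transport", "stock"},  # "stock" as in "the shop carries it"
}


def expand_query(term):
    """Return the search term plus any synonyms listed for it."""
    term = term.lower()
    return {term} | SYNONYMS.get(term, set())


print(sorted(expand_query("carry")))
```

Searching for "carry" now also searches for "stock", which is right for the retail sense and noise for every other sense. The tool has no way to know which one you meant.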
posted by mmascolino at 2:42 PM on January 11, 2010

