Natural Language Processing, relative basis for search.
August 8, 2009 4:05 PM
Natural Language Processing (NLP) Filter. I'm looking for a method (or category of methods) to judge the relevance of a small unit of text, 100-200 characters, against similar units of text.
I have a large set of these textual units. I'm trying to discover the "relatedness" of my query unit (an item drawn from that set) to any other unit in the set, judged relative to the set as a whole.
In other words, I'm not looking for just an ordered list of scores from applying my query to every document in the set. Rather, I want that ordered list of scores normalized by the magnitude of my query's score against the set as a whole (perhaps via some averaging function).
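To make the idea concrete, here is a rough sketch of the kind of thing I have in mind. It assumes TF-IDF weighting with cosine similarity as the raw score and the mean raw score as the normalizer; both are just illustrative choices on my part, and the function names (tokenize, tfidf_vectors, relative_relatedness) are made up for this example rather than taken from any library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (a dict of term -> weight) per document.
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    # Smoothed IDF (an assumption): avoids zero weights for very common terms.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({t: f * idf[t] for t, f in tf.items()})
    return vectors


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def relative_relatedness(docs, query_index):
    # Score the query unit against every other unit, then divide each raw
    # score by the query's mean score over the set (the assumed normalizer).
    vectors = tfidf_vectors(docs)
    query = vectors[query_index]
    raw = {i: cosine(query, v) for i, v in enumerate(vectors) if i != query_index}
    if not raw:
        return []
    mean = sum(raw.values()) / len(raw)
    if mean == 0.0:
        # The query shares no terms with anything; every relative score is 0.
        return [(i, 0.0) for i in raw]
    ranked = [(i, s / mean) for i, s in raw.items()]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked
```

A relative score above 1 would mean a unit is more related to the query than the query's average over the set, and a score below 1 would mean less related. The median or a z-score could be swapped in for the mean if the raw scores are heavily skewed.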
Do traditional search engines (open source, like Lucene or Xapian) do this already?
What I mean by relatedness is that two units are talking about the same things by some arbitrary, empirical measure. In other words, there is no supervised or unsupervised learning involved, just some off-the-shelf measure of 'relatedness'.
posted by kuatto to computers & internet (6 comments total)
posted by kuatto at 4:35 PM on August 8