RSS Mashup & Duplicate Removal
April 8, 2006 8:01 PM

Looking for a tool that can take multiple RSS feeds, strip duplicate entries and output a single feed. None of the RSS mashup tools I've seen offers duplicate removal. I'm subscribed to searches through 20+ feeds, courtesy of MonitorThis, to make sure I catch all available references to a URL on blogs, but the number of duplicates is becoming a big problem.
posted by will to Technology (3 answers total)
 
I agree, this is annoying. Also annoying is having multiple feeds in my subscriptions, each with their own POV, pointing to the same article, and yet not having an easy way to read them in a coherent way.

One of my little learning projects was/is to make a "feed condenser" that could remove dupes from search feeds and also create a summary feed when multiple blogs link the same article. I haven't done anything with it though.
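A rough sketch of the summary half of that idea, in Python and using only the standard library. It assumes the feeds have already been parsed into (blog, linked URL) pairs; the function name and the input shape are made up for illustration and are not taken from any existing tool.

```python
from collections import defaultdict


def summarize_by_link(posts):
    """Return {url: [blogs]} for every URL linked by two or more distinct blogs.

    `posts` is an iterable of (blog_name, linked_url) pairs. This input shape
    is an assumption for the sketch, not a real feed format.
    """
    blogs_by_url = defaultdict(list)
    for blog, url in posts:
        # Record each blog once per URL, in order of first appearance.
        if blog not in blogs_by_url[url]:
            blogs_by_url[url].append(blog)
    # Keep only the URLs that more than one blog points to.
    return {url: blogs for url, blogs in blogs_by_url.items() if len(blogs) > 1}
```

Each entry in the result could then become one item in a summary feed, listing the blogs that linked the article.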

I remain surprised that someone else hasn't solved the problem. Memeorandum kind of does the summarization, but it starts with the feeds that someone else finds important. That already has its own problems for tech news and current events. It doesn't work at all if you are interested in a niche subject.
posted by Good Brain at 11:57 AM on April 9, 2006


This would be a very simple programming task -- the reason no one has done it is probably that there's no demand.

For one thing, there's a difference between one person saying "Look at this article! How dare they say that!" and another saying "Check this article out, they are so right!" i.e. there's a context to the link which most people probably want.

For another, there might be a problem with deciding what exactly is the same URL -- the analysis and re-writing of New York Times URLs alone is a subject you could write a book about.

And thirdly, which one should "win" when you have two or more stories linking to the same URL? The first? The last? Or don't you care?

If you can give me a URL which will return all your many RSS feeds, I can write a Perl script which will strip out duplicates and return a single feed with only one item per URL, and I bet lots of other people could do the same in 20 other languages. What kind of a computer do you have, or would this be better as a CGI script you'd run via a browser?
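To give an idea of how small such a script is, here is a minimal sketch in Python rather than Perl, using only the standard library. It assumes plain RSS 2.0 input that has already been fetched into strings. The `normalize_url` rules (lowercase the scheme and host, drop the fragment and any trailing slash) are a guess at what "the same URL" might mean and would not cope with the New York Times case. The `keep` parameter is a hypothetical way to choose which duplicate "wins".

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Illustrative rule only: lowercase scheme and host, drop fragment and trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def merge_feeds(feed_xml_strings, keep="first"):
    """Merge RSS 2.0 documents into one feed with a single <item> per normalized <link>.

    keep="first" keeps the earliest item seen for a URL; keep="last" keeps the latest.
    Items without a <link> are skipped in this sketch.
    """
    chosen = {}  # normalized link -> <item> element
    for xml_text in feed_xml_strings:
        root = ET.fromstring(xml_text)
        for item in root.iter("item"):
            link = item.findtext("link")
            if not link:
                continue
            key = normalize_url(link)
            if keep == "last" or key not in chosen:
                chosen[key] = item

    # Build a bare-bones output feed; the channel title is a placeholder.
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Merged feed"
    for item in chosen.values():
        channel.append(item)
    return ET.tostring(rss, encoding="unicode")
```

Fetching the 20+ feeds and serving the result (as a CGI script or otherwise) is left out. Real-world feeds may also be Atom or malformed, which this does not handle.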
posted by AmbroseChapel at 4:11 PM on April 9, 2006


Best answer: What you are looking for is CaRP, an excellent piece of software that runs in PHP and MySQL. I am using it on my site here to aggregate 3 feeds into one. It has a bajillion options and works really well. Try the free version; the paid version can filter dupes.
posted by Chuck Cheeze at 8:22 PM on April 9, 2006

