RSS reader optimized for Craigslist and Kijiji
October 9, 2008 1:29 PM   Subscribe

Wanted, an RSS reader optimized for Craigslist and Kijiji. Willing to program myself, if necessary.

I'm looking to optimize my use of RSS feeds when perusing online classified ads. I've been using Thunderbird, and I've set a few message filters to save me some time, but it isn't enough. Some of the features I need:
  1. Sophisticated kill scripts, to avoid spam, duplicate listings, and items I'm simply not interested in
  2. Permanent storage of the listing and pictures for market research purposes. This would include monitoring changes, and recording the date of listing removal.
I'm sure there are a million features I haven't thought of, but those two are really key, and they are not at all typical for RSS readers, as far as I can tell.

I'm a junior programmer, but I'm happy to take this on as a project. Hopefully, as a plugin for existing software, or a modification of an existing open source RSS reader. Happier still if somebody has already made it, of course!

And as an adjunct question, I'm very interested in recommendations for eBay software. Again, looking for permanent caching and tracking of auctions. I've heard of some eBay market analysis services, but paying a subscription isn't really an option, unless it is very cheap.
posted by Chuckles to Computers & Internet (9 answers total) 2 users marked this as a favorite
Response by poster: I'm a junior programmer, but I'm happy to take this on as a project. Hopefully, as a plugin for existing software, or a modification of an existing open source RSS reader.

Which is to say, I'm looking for suggested approaches that would get me going quickly. Like, "make a Firefox plugin to do it", or whatever, but getting as specific as you want -- any thoughts should help move the idea along, I think.
posted by Chuckles at 1:32 PM on October 9, 2008

What I would write is an RSS filter rather than a reader, since implementing a whole reader UI doesn't move you towards your stated goals. I would use a basic HTTP server framework and serve the RSS feeds from there. When your program gets a request for the RSS, it would go and fetch the current online feed and kill the entries you don't want any way you like - probably with regular expressions or Bayesian filtering. For analysis, spit out your log of entries into big weekly or monthly RSS files (or a database). If you want to record the date of listing removal, that's a little trickier, since you probably won't see it in the RSS - I would throw together a little script that would run over your stored files or database and check what had gone 404.
posted by pocams at 1:44 PM on October 9, 2008
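The filter-not-reader approach described above can be sketched in a few lines. This is only an illustration of the kill step using plain regular expressions over item titles (the function name, sample feed, and patterns are my own, not from the comment); the HTTP-serving, Bayesian, and 404-checking pieces would wrap around something like this:

```python
import re
import xml.etree.ElementTree as ET

def filter_feed(rss_xml, kill_patterns):
    """Drop <item> elements whose <title> matches any kill pattern.

    Returns the filtered feed as an XML string, ready to re-serve.
    """
    killers = [re.compile(p, re.IGNORECASE) for p in kill_patterns]
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    for item in list(channel.findall("item")):
        title = item.findtext("title", default="")
        if any(k.search(title) for k in killers):
            channel.remove(item)
    return ET.tostring(root, encoding="unicode")
```

A small HTTP server (stdlib `http.server` would do) could call this on each request after fetching the live feed with `urllib.request`, so any ordinary reader can subscribe to the cleaned-up version.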

I played around with Yahoo Pipes and built my own RSS feed and put it in google reader. Works great.
posted by bleucube at 1:45 PM on October 9, 2008

Yahoo Pipes works great for this. Here's a pipe I did for Seattle rental searches. The only problem with Pipes is that the update frequency is slow in my experience. If you are in a very active market, this means that your item may be gone before it shows up on Pipes.
posted by grouse at 1:48 PM on October 9, 2008

I believe the update time that grouse refers to is a result of craigslist's slow RSS, not Pipes.
posted by emptyinside at 2:04 PM on October 9, 2008

If that's true, then my bad. Unfortunately that means you won't be able to speed things up by writing your own code instead of using Pipes.
posted by grouse at 2:07 PM on October 9, 2008

Response by poster: What I would write is an RSS filter rather than a reader

Sounds right, I think. The idea crossed my mind before, but now that you've got me thinking of it more seriously, it could be very effective. I can repackage the RSS feeds pretty arbitrarily as long as my server is alive. So, I could create feeds for favored search terms, or whatever.. That would help a lot.

Any suggestions for Bayesian filtering tools? Like, any libraries available, or whatever.. I don't think it is the right direction, but it would be interesting to play around with.
The first candidates for killfiling are postings that show up on craigslist and kijiji within a few minutes of each other with the same subject line, and repeated postings of the same subject line over a two or three week period. In either case it is a perfectly valid listing I want to see, but I only want to see it once. Actually.. Maybe that is an ideal application of Bayesian filtering. Just that I need to let the first one through.

For me, Craigslist's feeds are normally pretty reasonable.

I'll take a look into pipes in a few minutes..
posted by Chuckles at 4:26 PM on October 9, 2008
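The duplicate-killing rule described above (same subject line on both sites within minutes, or reposted over a few weeks, but let the first one through) doesn't actually need Bayesian machinery — a plain seen-before table with a time window covers it. A minimal sketch; the class name and the 21-day window are assumptions:

```python
from datetime import datetime, timedelta

class DupeKiller:
    """Remember subject lines already shown; let the first posting
    through, and kill repeats seen again inside `window` (cross-posts
    to both craigslist and kijiji, or weekly reposts)."""

    def __init__(self, window=timedelta(days=21)):
        self.window = window
        self.seen = {}  # normalized subject -> when we last let it through

    def should_show(self, subject, now=None):
        now = now or datetime.now()
        # Normalize whitespace and case so trivially reworded
        # cross-posts still collide on the same key.
        key = " ".join(subject.lower().split())
        last = self.seen.get(key)
        if last is not None and now - last < self.window:
            return False  # repeat inside the window: kill it
        self.seen[key] = now
        return True
```

Bayesian filtering would still be worth playing with for the spam side of the problem, but for exact-subject repeats this kind of lookup table is both simpler and more predictable.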

If I can pimp my own setup for a moment: I use rss2email for RSS delivery and procmail to do some filtering/routing of CL entries to different IMAP email folders. Presumably you could use procmail to add headers to make it easy for Thunderbird to process them as well.

Advantages that I can see for this approach are (1) everything is free, (2) archiving and searching is handled by Thunderbird, which you already have set up and know how to use, (3) you can probably use some other milter or roll your own in the language of your choice if you need something more sophisticated, (3a) AFAIK a milter could also handle killing things you didn't want, or other special delivery cases.
posted by turbodog at 4:52 PM on October 9, 2008
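For anyone unfamiliar with procmail, the routing described above looks roughly like the recipe below. The search term and folder name are placeholders of my own, and this assumes maildir-style folders (note the trailing slash) synced over IMAP:

```procmail
# File classified entries mentioning "thinkpad" into their own folder;
# rss2email has already turned each feed item into a mail message.
:0:
* ^Subject:.*thinkpad
.classifieds.thinkpad/
```

Adding custom headers for Thunderbird would be a similar recipe using procmail's `formail` filtering instead of a delivery line.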

This sounds neat. Right now I'm simply using Google Reader with a folder of feeds for different search terms. Since I'm primarily using it to watch for rare items (e.g. pa-risc), it works well enough, but I'd love to try anything you throw together.
posted by PueExMachina at 7:10 PM on October 10, 2008
