Help Me Build a Custom FriendFeed
June 10, 2010 11:13 AM

What's the best way, using PHP, to monitor an RSS feed, grab every new item posted, and then store parts of each item in a MySQL database?

Here's what I'm trying to do...

I have all of these various RSS feeds that show what I'm doing with my life: Netflix, Flickr, Twitter, Last.FM, Vimeo, etc. What I want to do is build a webpage that shows all of my posts to each of these sites, organized by day, listed chronologically. The page would simply read from a MySQL database, but the tricky part is getting the data into the database in the first place. I'm kinda wanting to build a custom FriendFeed, I guess.

I know how to scrape pages using PHP, but how can I monitor the RSS feeds in real time?

Or is there a cleaner way to do this altogether, without any page scraping involved?
posted by JPowers to Computers & Internet (10 answers total) 1 user marked this as a favorite
 
Best answer: As it's RSS you can create a SimpleXML object for each incoming feed, then interrogate them to extract whatever data you need.

To schedule the running of this script you could create a cron job, assuming you're running on a Linux server and have shell access.
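A minimal sketch of the SimpleXML approach, assuming an RSS 2.0 feed. The function name `parse_rss_items` and the sample feed are made up for illustration; a real run would fetch each feed's XML over HTTP first.

```php
<?php
// Hypothetical helper: turn an RSS 2.0 document into plain arrays.
function parse_rss_items(string $xml): array
{
    libxml_use_internal_errors(true); // collect parse errors instead of printing warnings
    $feed = simplexml_load_string($xml);
    if ($feed === false || !isset($feed->channel)) {
        return []; // malformed XML or not an RSS 2.0 document
    }

    $items = [];
    foreach ($feed->channel->item as $item) {
        // Fall back to the link when a feed omits <guid>.
        $guid = (string) $item->guid !== '' ? (string) $item->guid : (string) $item->link;
        $items[] = [
            'guid'      => $guid,
            'title'     => (string) $item->title,
            'link'      => (string) $item->link,
            // strtotime() parses the RFC 822 dates RSS uses; it returns false if the date is missing.
            'published' => strtotime((string) $item->pubDate),
        ];
    }
    return $items;
}

// Sample feed for illustration only.
$sample = <<<XML
<rss version="2.0"><channel>
  <title>Example</title>
  <item>
    <title>First post</title>
    <link>http://example.com/1</link>
    <guid>http://example.com/1</guid>
    <pubDate>Thu, 10 Jun 2010 11:13:00 GMT</pubDate>
  </item>
</channel></rss>
XML;

$items = parse_rss_items($sample);
```

Atom feeds use a different element layout, so a feed that is not RSS 2.0 would come back empty here rather than being parsed.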
posted by sandig at 11:32 AM on June 10, 2010


SimplePie + an aggregator-style theme with WordPress?
posted by ceri richard at 11:39 AM on June 10, 2010


In my mind, screen scraping is something one does to (potentially ugly, malformed) HTML. Since RSS is XML (and all those services are likely to give you valid RSS), no screen scraping is required. Just use one of PHP's XML extensions to process the data. SimpleXML is probably all you need.

Or maybe you are asking about setting up a cronjob to periodically fetch the data? (That's going to be dependent on your hosting setup.) Or a good way to separate what you've already pulled? (Key off the link or guid elements if you're sure you only want new stuff and aren't worried about updated entries. Using the channel's lastBuildDate would be better, but I'm not sure all your feeds will include that info.)
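One way to key off the guid is to let the database reject repeats. This is only a sketch: the table layout and the `store_items` name are hypothetical, and an in-memory SQLite database (via the pdo_sqlite extension) stands in for MySQL so the example runs without a server.

```php
<?php
// Assumption: in-memory SQLite stands in for MySQL here.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical schema: the UNIQUE constraint is what rejects already-stored items.
$db->exec('CREATE TABLE items (
    source    VARCHAR(50)  NOT NULL,
    guid      VARCHAR(255) NOT NULL,
    title     TEXT,
    link      TEXT,
    published INTEGER,
    UNIQUE (source, guid)
)');

// Inserts every item not seen before and returns how many rows were added.
function store_items(PDO $db, string $source, array $items): int
{
    // SQLite spells it INSERT OR IGNORE; the MySQL equivalent is INSERT IGNORE.
    $stmt = $db->prepare(
        'INSERT OR IGNORE INTO items (source, guid, title, link, published)
         VALUES (:source, :guid, :title, :link, :published)'
    );
    $added = 0;
    foreach ($items as $item) {
        $stmt->execute([
            ':source'    => $source,
            ':guid'      => $item['guid'],
            ':title'     => $item['title'],
            ':link'      => $item['link'],
            ':published' => $item['published'],
        ]);
        $added += $stmt->rowCount(); // 0 when the row already existed
    }
    return $added;
}

// Made-up items for illustration.
$items = [
    ['guid' => 'a1', 'title' => 'Photo', 'link' => 'http://example.com/a1', 'published' => 1276168380],
    ['guid' => 'a2', 'title' => 'Video', 'link' => 'http://example.com/a2', 'published' => 1276170000],
];
$firstRun  = store_items($db, 'flickr', $items);
$secondRun = store_items($db, 'flickr', $items); // same items again: nothing new to add
```

Keying on the guid ignores edits to an item that was already stored, which matches the "only want new stuff" case described above.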
posted by and hosted from Uranus at 11:41 AM on June 10, 2010


You can't really do this in real-time. Nobody does. You can set a polling period like every 5 minutes or something though. Keep the ID or timestamp or whatever of the last object you fetched from each feed, and then only store items newer than that.
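The "only store items newer than that" step could look roughly like this. The helper names are hypothetical, the items are assumed to be arrays with an integer `published` Unix timestamp, and the arrow function needs PHP 7.4 or later.

```php
<?php
// Hypothetical helper: keep only items published after the last one we stored.
function newer_than(array $items, int $lastSeen): array
{
    return array_values(array_filter(
        $items,
        fn (array $item): bool => $item['published'] > $lastSeen
    ));
}

// Hypothetical helper: the timestamp to remember for the next poll.
function latest_timestamp(array $items, int $lastSeen): int
{
    foreach ($items as $item) {
        $lastSeen = max($lastSeen, $item['published']);
    }
    return $lastSeen;
}

// Made-up data for illustration.
$items = [
    ['guid' => 'a1', 'published' => 100],
    ['guid' => 'a2', 'published' => 200],
    ['guid' => 'a3', 'published' => 300],
];
$fresh    = newer_than($items, 200);
$nextMark = latest_timestamp($items, 200);
```

A strict timestamp comparison drops an item that shares its timestamp with the last one stored, so comparing IDs is the safer choice for feeds that post several items in the same second.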

It looks like there are already several RSS libraries for PHP, so you probably want to take a look at those.

posted by kmz at 11:43 AM on June 10, 2010


This is just lifestreaming. There are several lifestream plugins for WP that purport to do this.

As for real time, see PubSubHubbub to learn about how to make this faster. The intro video is pretty good. I've noticed that Google Reader picks up new items from some of the feeds I read almost immediately this way.
posted by artlung at 11:50 AM on June 10, 2010


Note that many feeds specify a minimum refresh period, typically 45 minutes. It's kind of a dick move to hit a feed any more frequently than the admin wishes, because it would drain a lot of server resources if everyone did that.
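A rough sketch of honoring that period, assuming the feed declares it with the RSS 2.0 `<ttl>` element (a value in minutes). The function names and the 60-minute fallback are assumptions for illustration, not anything a feed mandates.

```php
<?php
// Returns the feed's <ttl> in minutes, or $default when none is declared.
// The 60-minute default is an arbitrary assumption.
function feed_ttl_minutes(string $xml, int $default = 60): int
{
    libxml_use_internal_errors(true); // suppress warnings on malformed XML
    $feed = simplexml_load_string($xml);
    if ($feed === false || !isset($feed->channel->ttl)) {
        return $default;
    }
    $ttl = (int) $feed->channel->ttl;
    return $ttl > 0 ? $ttl : $default;
}

// True once at least $ttlMinutes have passed since the last fetch.
// All times are Unix timestamps in seconds.
function is_due(int $lastFetched, int $ttlMinutes, int $now): bool
{
    return $now - $lastFetched >= $ttlMinutes * 60;
}

// Sample feed for illustration only.
$sample = '<rss version="2.0"><channel><title>Example</title><ttl>45</ttl></channel></rss>';
$ttl = feed_ttl_minutes($sample);
```

A cron job can then run as often as is convenient and simply skip any feed for which `is_due()` returns false.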

You don't necessarily need shell access to set up cron jobs. Shared hosting software like cPanel lets you set up cron jobs from the web interface, if the admin has allowed it. And if the command-line version of PHP is not available on the server, you can use wget or curl (or even a 'perl -e' one-liner with LWP in a pinch) in the cron job to hit a URL that executes the script. Note that you'd probably want to put some access control on the script (e.g. through .htaccess) so that it's only reachable from the local interface and can't be triggered remotely.
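As an alternative to .htaccess rules, the same restriction can be sketched inside the script itself. The function name `is_local_request` is hypothetical, and the check assumes the cron job requests the script via localhost, so that `REMOTE_ADDR` is a loopback address.

```php
<?php
// Hypothetical guard: true when started from the command line or by a loopback request.
function is_local_request(string $sapi, ?string $remoteAddr): bool
{
    if ($sapi === 'cli') {
        return true; // run directly by cron through the php binary
    }
    // Covers wget/curl from cron hitting http://localhost/...
    return in_array($remoteAddr, ['127.0.0.1', '::1'], true);
}

if (!is_local_request(PHP_SAPI, $_SERVER['REMOTE_ADDR'] ?? null)) {
    http_response_code(403);
    exit('Forbidden');
}

// ... fetch and store the feeds here ...
```

If the site sits behind a reverse proxy on the same machine, every request may appear to come from a loopback address, in which case this check is not enough on its own.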
posted by Rhomboid at 12:09 PM on June 10, 2010


I have been thinking about doing this exact thing with Drupal + Activity Stream (though I have not implemented it yet).
posted by Famous at 12:31 PM on June 10, 2010


I have done this for my website, and it does not need MySQL. What you are looking for is called Planet. I use the Planet Venus fork, which has seen a flurry of activity lately. The design is simple: you give it a configuration, and an output theme. There are several to choose from, but I chose to write my own using Django templates. I kept some notes on the design and ideas, as a sort of documentation for me and other interested parties.

You have to set up a cronjob to monitor this as there's no such thing as "realtime RSS". All clients must poll the source, and well-behaved ones respect the source's polling rate and cache results to reduce traffic. PubSubHubbub may be available on a case-by-case basis, but since your feeds are specific to you, I doubt you'll find anything. My site updates every hour. Most importantly, the output is static, so lots of traffic won't slow you down or consume tons of RAM.

As far as content goes, I drop my AskMeFi questions, blog posts, photo gallery posts, and positively rated Stack Overflow/Server Fault questions on there. I also have a feed for Xbox Live, but I need to make it less spammy.
posted by pwnguin at 2:42 PM on June 10, 2010 [1 favorite]


Yeah, if you're going for a 'planet-like thing' but want to use PHP, etc., there's Managing News (caution, caution, self-link, but it's good stuff)
posted by tmcw at 3:59 PM on June 10, 2010 [1 favorite]


You can build it with Drupal (or the Drupal-based Managing News), but for a simpler solution there's always Gregarius, which will do it out of the box but won't let you do anywhere near as much as you can with Drupal. Planet works the same way as Gregarius.
posted by Brian Puccio at 7:20 AM on June 12, 2010

