How to get an RSS feed from a website that doesn't have an RSS feed.
October 28, 2009 9:40 AM

Setting up an RSS feed for a job board that does not have an RSS feed subscription. Is there a way to do this?

I would like to add an RSS feed to my iGoogle pages for USAJOBS.GOV searches that I have set up. USAJOBS.GOV allows you to subscribe to searches and receive them by email, but does not have a way to subscribe via RSS.

I tried feedage.com's HTML2RSS tool, which is supposed to convert any HTML page to an RSS feed, but it doesn't work very well on USAJOBS.GOV.

Any ideas?
posted by gm2007 to Computers & Internet (9 answers total) 4 users marked this as a favorite
 
I think Yahoo Pipes might work for this.
posted by rdurbin at 9:54 AM on October 28, 2009


You may be able to use an email-to-RSS service like this one (I haven't used it personally).
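
Since USAJOBS.GOV already emails search results, an email-to-RSS service presumably parses each alert message into a feed item. I haven't checked how that particular service works; the following is only a hypothetical sketch using Python's standard `email` package, and the sample message and addresses are made up.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def email_to_item(raw_message: str) -> dict:
    """Turn one raw alert email into a dict of RSS-style item fields."""
    msg = message_from_string(raw_message)
    date_header = msg.get("Date")
    return {
        "title": msg.get("Subject", "(no subject)"),
        # RFC 2822 Date header -> timezone-aware datetime (None if missing).
        "pubDate": parsedate_to_datetime(date_header) if date_header else None,
        # Assumes a single-part, plain-text message for simplicity.
        "description": msg.get_payload().strip(),
        # Message-ID is unique per email, so it can double as the item GUID.
        "guid": msg.get("Message-ID", ""),
    }


sample = (
    "From: alerts@example.com\n"
    "Subject: 3 new jobs match your search\n"
    "Date: Wed, 28 Oct 2009 09:40:00 -0400\n"
    "Message-ID: <abc123@example.com>\n"
    "\n"
    "Job 1...\n"
)
item = email_to_item(sample)
```

Real alert emails are often multipart HTML, which would need `msg.walk()` to find the text part.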
posted by null terminated at 9:59 AM on October 28, 2009


Page2RSS could possibly work for you.
posted by deezil at 10:09 AM on October 28, 2009


X Fruits might work as well.
posted by chasles at 10:18 AM on October 28, 2009


In this day and age, when a database-driven page lacks RSS it's usually intentional, so automated tools like HTML2RSS generally fail (as they should). However, with government procurement, you never know why things are the way they are. Either way, you'll have to go beyond the easy one-click approaches. You'll likely need to know how to write a program; I imagine there are books on web scraping that would be useful to you, but I don't own any.

First, take a look at the page source and try to figure out where your data is coming from. Is it an embedded HTML table? A JavaScript request for XML or JSON? Once you've figured out where the data is, you need to write a filter to request only the data you need.
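
As a rough illustration of that first step, here is a hypothetical heuristic (not a real parser) that guesses how a response body carries its data. You would paste in the page source, or a response captured from the browser's network tab; the function name and categories are my own assumptions.

```python
import json
import xml.etree.ElementTree as ET


def sniff_source(body: str) -> str:
    """Rough guess at how a response carries its data (a heuristic only)."""
    text = body.lstrip()
    # JSON (typical of JavaScript-driven pages) parses cleanly with json.loads.
    try:
        json.loads(text)
        return "json"
    except ValueError:
        pass
    # A <table> tag suggests records embedded directly in the HTML.
    if "<table" in text.lower():
        return "html-table"
    # Well-formed markup without a table is probably an XML response.
    try:
        ET.fromstring(text)
        return "xml"
    except ET.ParseError:
        return "html"
```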

Once you have the filter, you need to figure out how to divide the data into individual records and subdivide those into fields. With HTML tables, that's usually table rows and table data tags (<tr> <td>field1...</td> </tr>). If your source is XML, it's usually straightforward.
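
A minimal sketch of splitting an HTML table into records (rows) and fields (cells). Beautiful Soup is the usual tool for this; the sketch uses the standard library's `html.parser` instead so it runs without extra installs. The sample markup is invented, not USAJOBS.GOV's actual layout.

```python
from html.parser import HTMLParser


class TableRecords(HTMLParser):
    """Collect each <tr> as a record and each <td>/<th> inside it as a field."""

    def __init__(self):
        super().__init__()
        self.records = []  # list of rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text chunks of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace inside the cell into single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.records.append(self._row)
            self._row, self._cell = None, None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_records(html: str) -> list:
    parser = TableRecords()
    parser.feed(html)
    parser.close()
    return parser.records


sample = """<table>
<tr><th>Title</th><th>Closes</th></tr>
<tr><td>IT Specialist</td><td>11/15/2009</td></tr>
</table>"""
rows = table_to_records(sample)
```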

The final step is to map the data you have onto the RSS schema. Decide which source data populates the title, the link, the date, the content, etc. Bonus points for a consistent GUID that stays the same when other fields are revised. You may end up merging multiple fields here, because RSS is fairly lightweight.
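
A sketch of that mapping step, assuming each record has already been turned into a dict with hypothetical `title`, `link`, `summary`, and `date` keys. It builds RSS 2.0 with the standard library rather than PyRSS2Gen, and derives the GUID from the link alone so editing other fields doesn't make readers see a "new" item. The example.com URLs are placeholders.

```python
import hashlib
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def records_to_rss(records, channel_title, channel_link):
    """Build an RSS 2.0 document from dicts with title/link/summary/date keys."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Scraped feed: " + channel_title
    for rec in records:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = rec["title"]
        ET.SubElement(item, "link").text = rec["link"]
        ET.SubElement(item, "description").text = rec["summary"]
        # RSS 2.0 uses RFC 822-style dates, e.g. "Wed, 28 Oct 2009 13:40:00 +0000".
        ET.SubElement(item, "pubDate").text = format_datetime(rec["date"])
        # Hash only the link, so revising the title or summary keeps the same GUID.
        guid = ET.SubElement(item, "guid", isPermaLink="false")
        guid.text = hashlib.sha1(rec["link"].encode("utf-8")).hexdigest()
    return ET.tostring(rss, encoding="unicode")


job = {
    "title": "IT Specialist",
    "link": "https://example.com/jobs/12345",
    "summary": "Dept. of Example, closes 11/15/2009",
    "date": datetime(2009, 10, 28, 13, 40, tzinfo=timezone.utc),
}
feed_xml = records_to_rss([job], "My saved search", "https://example.com/search")
```

Keying the GUID on the link assumes the site keeps a stable URL per posting; if it doesn't, hash whatever field is stable (a job announcement number, say).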

The tools you'll need for this are wget, HTML Tidy, Beautiful Soup, and PyRSS2Gen.
posted by pwnguin at 10:27 AM on October 28, 2009


Scrape 'N' Feed?

"Scrape 'N' Feed is a simple Python wrapper around the PyRSS2Gen module. It implements almost all of the code you need to create RSS feeds out of web pages. All you have to write is the code that actually does the screen-scraping (and Beautiful Soup makes that easy). It stores feed state in a pickle file between invocations, freeing you from having to worry about most of the minor problems that get in the way of scraping RSS feeds."
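
To show what "feed state in a pickle file between invocations" buys you, here is a hypothetical sketch of the idea. It is not Scrape 'N' Feed's actual API; `update_state` and its state layout are made up. Each run records when an item was first and last seen, reports which items are new, and expires items that have vanished from the page.

```python
import os
import pickle
import tempfile
import time


def update_state(path, current_ids, max_age_days=30, now=None):
    """Merge this run's item IDs into a pickled state file.

    State maps each ID to {"first": ts, "last": ts}. Returns (new_ids, state).
    """
    now = time.time() if now is None else now
    state = {}
    if os.path.exists(path):
        with open(path, "rb") as f:
            state = pickle.load(f)
    # IDs never recorded before are the ones to show as new feed items.
    new_ids = [i for i in current_ids if i not in state]
    for i in current_ids:
        # Keep the first-seen time (usable as pubDate); refresh the last-seen time.
        first = state.get(i, {}).get("first", now)
        state[i] = {"first": first, "last": now}
    # Expire items that have not appeared on the page for max_age_days.
    cutoff = now - max_age_days * 86400
    state = {i: s for i, s in state.items() if s["last"] >= cutoff}
    with open(path, "wb") as f:
        pickle.dump(state, f)
    return new_ids, state


# Usage: three simulated runs against a throwaway state file.
path = os.path.join(tempfile.mkdtemp(), "feed_state.pkl")
run1, _ = update_state(path, ["job-a", "job-b"], now=0)
run2, _ = update_state(path, ["job-b", "job-c"], now=86400)
run3, final_state = update_state(path, ["job-c"], now=40 * 86400)
```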
posted by brainwane at 10:45 AM on October 28, 2009 [1 favorite]


Much like Pipes, Feed43 will help you create a fairly reliable RSS feed from a site. Be forewarned that making it work well requires some programming knowledge (regexes, specifically) or a bit of persistence. Having said that, I've created a number of feeds from local bars' calendars to track local events, and they've held up for years.
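
Feed43 has its own pattern syntax; the sketch below uses plain Python regular expressions instead, just to show the kind of per-item pattern you end up writing. The `event` and `date` markup is a made-up example; a real calendar page needs its own pattern, and any layout change on the site will break it.

```python
import re

# Hypothetical markup for one calendar entry; adjust to the real page's HTML.
EVENT_RE = re.compile(
    r'<div class="event">\s*<h3>(?P<title>.*?)</h3>\s*'
    r'<span class="date">(?P<date>.*?)</span>',
    re.DOTALL,
)


def extract_events(html: str) -> list:
    """Return one dict per match, keyed by the pattern's named groups."""
    return [m.groupdict() for m in EVENT_RE.finditer(html)]


page = (
    '<div class="event">\n <h3>Trivia night</h3>\n <span class="date">Nov 3</span></div>'
    '<div class="event"><h3>Open mic</h3><span class="date">Nov 5</span></div>'
)
events = extract_events(page)
```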
posted by yerfatma at 10:52 AM on October 28, 2009


brainwane: "Scrape 'N' Feed is a simple Python wrapper around the PyRSS2Gen module. It implements almost all of the code you need to create RSS feeds out of web pages."

I don't know how I found Beautiful Soup but not Scrape 'N' Feed. It sounds like it implements exactly the features that kept me from writing more scripts: flexible archiving, caching, expiring items, date guessing, and GUIDs. Hot damn.
posted by pwnguin at 11:55 AM on October 28, 2009


ChangeDetection provides an RSS feed of all the pages you're following, as well as the ability to limit notifications to only sizable changes (no definition of "sizable" provided), only additions/deletions, or additions/deletions of specific text. You can also limit notifications to daily, weekly, or monthly.
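
I don't know how ChangeDetection implements its filters. As a hypothetical sketch of the same idea, Python's `difflib` can split two snapshots of a page's text into added and removed lines, and a simple rule can decide whether the change is worth a notification. Using `min_lines` as the "sizable" threshold is my own assumption.

```python
import difflib


def line_changes(old: str, new: str):
    """Return (added, removed) lines between two snapshots of a page's text."""
    added, removed = [], []
    for line in difflib.ndiff(old.splitlines(), new.splitlines()):
        if line.startswith("+ "):
            added.append(line[2:])
        elif line.startswith("- "):
            removed.append(line[2:])
        # Lines starting with "  " are unchanged; "? " lines are diff hints.
    return added, removed


def should_notify(old, new, keyword=None, min_lines=1):
    """Notify if enough lines changed and, optionally, one mentions keyword."""
    added, removed = line_changes(old, new)
    changed = added + removed
    if len(changed) < min_lines:
        return False
    if keyword is None:
        return True
    return any(keyword.lower() in line.lower() for line in changed)


old = "IT Specialist\nBudget Analyst"
new = "IT Specialist\nBudget Analyst\nGIS Analyst"
```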
posted by timepiece at 1:47 PM on October 28, 2009


This thread is closed to new comments.