Magical feed services, how do they work?
April 8, 2011 12:46 PM

How do services that create a full-text RSS feed when a site offers only partial feeds actually work? As an example,
posted by Chrysostom to Computers & Internet (3 answers total) 4 users marked this as a favorite
Best answer: Sites like that follow the links they find in the partial feed and grab the text from each page. To tell which part of the page is the article and which is ads, navigation, etc., they work much like Readability and Instapaper do.
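A rough Python sketch of the "which part is the article" step, assuming a simple text-density heuristic: credit the text of every <p> to the element that directly contains it, ignore paragraphs inside nav/header/footer/aside, and keep the element with the most paragraph text. This is a swapped-in simplification, not Readability's or Instapaper's actual algorithm, and `ArticleGuesser` and `extract_article` are hypothetical names.

```python
from html.parser import HTMLParser

# Paragraphs inside these containers are treated as boilerplate, not article text.
BOILERPLATE_TAGS = {"nav", "header", "footer", "aside", "script", "style"}
# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ArticleGuesser(HTMLParser):
    """Groups <p> text by the element that directly contains each <p>."""

    def __init__(self):
        super().__init__()
        self.stack = []       # open elements as (tag, node_id)
        self.next_id = 0
        self.paragraphs = {}  # node_id -> list of paragraph strings
        self.p_parent = None  # node_id that will be credited with the open <p>
        self.p_text = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if tag == "p":
            self.handle_endtag("p")  # a new <p> implicitly closes an open one
            in_boilerplate = any(t in BOILERPLATE_TAGS for t, _ in self.stack)
            if self.stack and not in_boilerplate:
                self.p_parent = self.stack[-1][1]
        self.stack.append((tag, self.next_id))
        self.next_id += 1

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy markup.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                if any(t == "p" for t, _ in self.stack[i:]):
                    self._close_paragraph()
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.p_parent is not None:
            self.p_text.append(data)

    def close(self):
        super().close()
        self._close_paragraph()  # flush a trailing <p> that was never closed

    def _close_paragraph(self):
        text = " ".join("".join(self.p_text).split())  # collapse whitespace
        if self.p_parent is not None and text:
            self.paragraphs.setdefault(self.p_parent, []).append(text)
        self.p_parent = None
        self.p_text = []


def extract_article(html):
    """Return the paragraphs of the element holding the most paragraph text."""
    parser = ArticleGuesser()
    parser.feed(html)
    parser.close()
    if not parser.paragraphs:
        return ""
    best = max(parser.paragraphs.values(), key=lambda ps: sum(map(len, ps)))
    return "\n\n".join(best)
```

Real extractors weigh many more signals (class names, link density, and so on); counting paragraph characters is just the smallest version of the idea.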
posted by michaelh at 12:52 PM on April 8, 2011

Best answer: You can kind of get a sense for this by using the RSS feed scraping feature (it's not called that; can't remember that module's name off the top of my head) in Yahoo! Pipes. It lets you specify an RSS feed or website link to search within, then add certain opening and closing tags to look for text in between (e.g., everything between <div class="blogpost"> and </div>), among other options.
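The between-two-markers idea can be sketched in a few lines of Python. `extract_between` is a hypothetical helper, not how Pipes implements its module internally; it is a plain string search with no HTML parsing.

```python
def extract_between(page, start_marker, end_marker):
    """Return the text between the first start_marker and the next end_marker.

    Returns None if either marker is missing from the page.
    """
    start = page.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)  # skip past the opening marker itself
    end = page.find(end_marker, start)
    if end == -1:
        return None
    return page[start:end]
```

Because the end marker is matched literally, a post that itself contains a nested `<div>` gets cut off at the first `</div>`, so markers sometimes have to be picked by hand for each site.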
posted by limeonaire at 1:49 PM on April 8, 2011

Yes, as limeonaire said, Yahoo! Pipes can do essentially the same thing if you need to fine-tune a feed that doesn't work with the automatic services. You want to use a Loop module with a Fetch Page module inside it, and output the result to the Description field.
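The Loop / Fetch Page / Description arrangement can be sketched outside Pipes as a small Python function. Assumptions: the feed is plain RSS 2.0 with <item>, <link> and <description> elements, pages decode as UTF-8, and `expand_feed`, `fetch_url` and the `extract` callback are hypothetical names; `extract` could be any marker-based or heuristic extractor that returns None when it finds nothing.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch_url(url):
    """Download a page, assuming UTF-8 (real pages may declare other charsets)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def expand_feed(rss_xml, extract, fetch=fetch_url):
    """Replace each item's <description> with text pulled from its <link>.

    `fetch` is injectable so the loop can be exercised without network access.
    """
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):          # the "Loop" step
        link = item.findtext("link")
        if not link:
            continue
        page = fetch(link.strip())          # the "Fetch Page" step
        text = extract(page)
        if text is None:
            continue                        # keep the original summary
        desc = item.find("description")
        if desc is None:
            desc = ET.SubElement(item, "description")
        desc.text = text                    # output to the Description field
    return ET.tostring(root, encoding="unicode")
```

Passing a fake `fetch` (a dictionary lookup, say) makes it easy to check the loop without hitting a live site. ElementTree escapes any markup in the extracted text when serializing, which is the usual way HTML is carried inside an RSS <description>.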

That sometimes works even better when you want to add something onto the URL (say, &pagenumber=all or &page=printable): you can just put that step earlier in the pipe. I have a ton of these.
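Tacking a parameter onto each link before it is fetched can be sketched with Python's standard urllib.parse. The parameter names are only the examples given above; whether a particular site honors `pagenumber=all`, `page=printable`, or anything else has to be checked by hand, and `with_query_param` is a hypothetical helper.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_param(url, key, value):
    """Return url with key=value in its query string, replacing any old value."""
    parts = urlsplit(url)
    # Drop an existing value for this key so repeated runs don't stack duplicates.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != key]
    query.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

In a pipeline this rewrite goes before the fetch step, e.g. `fetch(with_query_param(link, "page", "printable"))`.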

FYI: doing this with the New York Times appears to bypass the new paywall limits.
posted by timepiece at 7:18 AM on April 13, 2011
