HTML markup appearing in RSS feeds?
December 10, 2008 9:40 AM

How to prevent HTML markup from appearing in my RSS feeds?

I'm being driven absolutely mad by several RSS feeds (which I view through Google Reader) that contain HTML markup. Some feeds are affected, others not at all. It's pretty bad on the Best of Craigslist feed, but it happens on others as well.

Screenshot here.

Is there anything I can do about this? Bonus points if the answer isn't "stop using Google Reader."
posted by Brian James to Computers & Internet (5 answers total) 1 user marked this as a favorite
 
The problem is probably in the source document: the XML file provided by the source web site. You can complain to the source about the quality of the free service they are providing. You could also try a different reader. Maybe read the problem feeds in Thunderbird, as I believe you can specify in Tbird that the feed be displayed as text, which might strip out the offending tags.
posted by COD at 10:12 AM on December 10, 2008


It's not Google's fault. In this case, the Craigslist feed is converting the left and right tag brackets to their character entities, &lt; and &gt;.
posted by phrayzee at 10:19 AM on December 10, 2008
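
To illustrate the encoding described above, here is a minimal Python sketch. The sample string is made up for illustration and is not taken from the actual Craigslist feed. It only shows what "converting the tag brackets to character entities" looks like, and that decoding the entities brings the original markup back.

```python
import html

# Hypothetical fragment of post markup (an assumption, not real feed data).
raw = "<b>Best of</b> Craigslist"

# Escaping replaces the tag brackets with character entities.
encoded = html.escape(raw, quote=False)

# Decoding the entities restores the original markup.
decoded = html.unescape(encoded)

print(encoded)
print(decoded)
```

If a reader shows the tags as literal text, the description was presumably displayed without being treated as HTML after decoding, though the thread does not establish exactly where that goes wrong.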


Entity-encoded HTML is permitted in the <description> sub-elements of an RSS feed's <item> elements. If you want to clean out HTML tags from a feed, you could set up a Yahoo Pipes filter for that feed using the Regex operator to apply a rule like </*[^>]+> to the item:description element.
posted by nicwolff at 10:29 AM on December 10, 2008
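
For anyone who would rather do this outside Yahoo Pipes, the same idea can be sketched in Python. This is only a rough sketch under stated assumptions: the function name `strip_markup` and the sample description are hypothetical, and the pattern is the rule quoted above, applied after decoding the entities. A regex is not a real HTML parser, so ordinary text containing both "<" and ">" could be removed by mistake.

```python
import html
import re

# The rule quoted in the answer above: anything that looks like a tag.
TAG_RULE = re.compile(r"</*[^>]+>")


def strip_markup(description: str) -> str:
    """Decode entity-encoded markup in a description, then drop the tags.

    Hypothetical helper for illustration; not part of any feed library.
    """
    decoded = html.unescape(description)   # "&lt;b&gt;" becomes "<b>"
    text = TAG_RULE.sub("", decoded)       # remove anything tag-shaped
    return " ".join(text.split())          # collapse leftover whitespace


# Made-up description text, not taken from a real feed.
sample = "&lt;p&gt;Hello &lt;b&gt;world&lt;/b&gt;&lt;/p&gt;"
print(strip_markup(sample))
```

The cleaned text would still need to be written back into a feed that the reader can subscribe to, which this sketch does not cover.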


Or, maybe not - looks like Craigslist has blocked Yahoo Pipes. Assholes.
posted by nicwolff at 10:36 AM on December 10, 2008


That's okay. Pipes sucks and would probably not run your regex correctly anyways.
posted by pwnguin at 2:16 PM on December 13, 2008

