How do I stop my RSS feed from being abused?
March 19, 2007 3:48 PM

The RSS feed from my blog is being copied in full (pictures and all, hotlinked no less) to someone else's blog. The blog is clearly a spam blog, harvesting hundreds of feeds and republishing them in full. The whois for the site is not helpful. What, if anything, should I do?

The blog in question is livelonely.com (my site is blog.thesietch.org); it seems to harvest from many, many other blogs. I would really rather not have my content republished on such a crappy site, specifically because I run a full feed.

Should I make them stop? If I did want to make them stop how can I find out who this person is, and make them stop?
posted by stilgar to Computers & Internet (34 answers total) 12 users marked this as a favorite
 
Put a block on their IP so they can't grab the feed.
posted by phrontist at 3:52 PM on March 19, 2007


There are a ton of these spammy aggregator blogs out there. Finding and stopping the person may be hard-to-impossible; unless you think you stand to lose significant money on this, trying to track them down is likely not worth your while.

Like phrontist says: cut them off from your end. It's a reactive game, and you're pretty much stuck on defense.
posted by cortex at 3:56 PM on March 19, 2007


You could set up your site to redirect them, based on their IP address to another file. I'd imagine you could create a fake newsfeed that has entries like "livelonely.com is a spam blog that steals other people's content" or worse.
posted by i love cheese at 3:59 PM on March 19, 2007 [1 favorite]
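
The redirect-by-IP idea above could be sketched roughly as follows. This is only an illustration: the address 203.0.113.7 is a documentation placeholder rather than the scraper's real IP, and SCRAPER_IPS, REAL_FEED, DECOY_FEED and feedFor are hypothetical names. It also assumes the scraper requests the feed directly from your server.

```javascript
// Hypothetical blocklist: the scraper's address as seen in your own logs.
// 203.0.113.7 is a documentation-only placeholder, not the real address.
const SCRAPER_IPS = new Set(["203.0.113.7"]);

// Hypothetical feed bodies; a real blog would generate these.
const REAL_FEED = "<rss><!-- full posts --></rss>";
const DECOY_FEED =
  "<rss><channel><item><title>This site republishes feeds without permission</title></item></channel></rss>";

// Pick which feed body to return for a given requester address.
function feedFor(remoteIp) {
  return SCRAPER_IPS.has(remoteIp) ? DECOY_FEED : REAL_FEED;
}

console.log(feedFor("203.0.113.7") === DECOY_FEED); // the listed address gets the decoy
console.log(feedFor("198.51.100.20") === REAL_FEED); // everyone else gets the real feed
```

Keeping the decision in one small function makes it easy to add or remove addresses if the scraper moves.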


If they're indiscriminately taking code from your site, you can subjugate theirs! Put this in script tags and you can replace their page content!
// Runs only when the post is displayed on the scraper's domain
if (window.location.hostname.indexOf("livelonely.com") !== -1) {
  document.write("All Your Base Are Belong To Us");
}
Or you could just redirect to your page:
if (window.location.hostname.indexOf("livelonely.com") !== -1) {
  window.location = "http://blog.thesietch.org/";
}

posted by phrontist at 4:07 PM on March 19, 2007 [2 favorites]


Oh, and in case it's not obvious, you need to put this in a post - it shouldn't affect your site, only theirs.
posted by phrontist at 4:14 PM on March 19, 2007


Isn't the whole point of RSS (Really Simple Syndication) to have people grab your feed and syndicate your content however they want? If you don't want your content syndicated, you probably shouldn't RSS it. :)
posted by afx114 at 4:24 PM on March 19, 2007


Huh? IP blocking? That almost certainly won't work. I mean, what makes you think they are pulling the RSS from an IP that is even remotely similar to livelonely.com's? Are you reading Ask Metafilter from the same machine your blog is hosted at? Probably not.
posted by dendrite at 4:29 PM on March 19, 2007


dendrite: Um, it's highly likely that the software they are using is running on the same server that hosts the page. They certainly aren't sitting there with a feed reader copying and pasting!

Even so, it's not hard to figure out the IP address they're culling from (however unlikely that may be). Just slide some code into your feed-generating page to put an HTML comment in every post:

<!-- 234.34.55.78 -->

Where the IP is that of the requester. Then wait...
posted by phrontist at 4:34 PM on March 19, 2007
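
The IP-tagging trick above might look something like this in a feed generator. It is a minimal sketch under assumptions: tagWithRequesterIp is a hypothetical helper, and how you obtain the requester's address depends on your blog software.

```javascript
// Append the requester's address to a post body as an HTML comment.
// Only digits, dots, colons and hex letters are kept, so the value
// cannot close the comment early or inject markup.
function tagWithRequesterIp(postHtml, remoteIp) {
  const safeIp = String(remoteIp).replace(/[^0-9a-fA-F:.]/g, "");
  return `${postHtml}\n<!-- requested by ${safeIp} -->`;
}

console.log(tagWithRequesterIp("<p>This is my great new article.</p>", "203.0.113.7"));
```

Once the copied post shows up on the other site, viewing its source should reveal which address fetched the feed.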


Most likely if their site is dynamic and pulling directly from stilgar's blog, IP blocking would work and would be very simple -- they probably are pulling it directly from their site. If they're statically publishing it, then it'd be a matter of watching for weird regular traffic by turning on full logging. You'd accidentally ban anyone who has an RSS reader running at an interval, but it might be worth it in the short term.
posted by mikeh at 4:36 PM on March 19, 2007


dendrite: tacking the requesting IP onto the post body for a few hours will soon track down the offending client.

phrontist: I personally love the script idea, but it won't help with search engine indexing, and it's easily circumvented.
posted by Leon at 4:38 PM on March 19, 2007


Good point about the requesting IP on the post body, but I'm still interested to find out if the offending client matches the offending host.
posted by dendrite at 4:43 PM on March 19, 2007


Response by poster: Update: they are using a WordPress blog with the wp-autopost plugin. Basically, it takes RSS feeds (for instance, mine from FeedBurner) and converts them to posts.

I don't need to track them down: every single post I make generates a trackback ping within seconds, which shows up in the comments section of my blog (with the IP). I am thinking about using mod_rewrite in .htaccess to simply block the IP, but I need to make sure it's static first. I was hoping for a way to figure out who these people are.

As for RSS being for syndication, this is not what I would consider syndication. This is a blog that simply takes hundreds of feeds, strips the ads out of them, and then posts them on livelonely.com surrounded by lots and lots of ad spam. I have no problem with people highlighting my stories and linking back to me, but fully reposting them and then making money off them seems wrong.
posted by stilgar at 4:47 PM on March 19, 2007


@afx114

This is very obviously different. As a blogger, you provide RSS for the convenience of your readers.

Using your content as a mechanism to make money or manipulate search engine rankings is not legitimate. The choice to provide RSS content does not invite or authorize such usage.
posted by sindark at 4:48 PM on March 19, 2007


Response by poster: Not to mention he has the gall to hotlink all my images, costing me bandwidth... bah.
posted by stilgar at 4:49 PM on March 19, 2007


I get this problem all the time with my blog and despite what afx114 suggests, publishing an RSS feed is not an invitation for someone to scrape your entire blog and wrap their own ads around it. I use my feed for an e-mail newsletter, to aggregate into content networks, and am happy for people to excerpt, but unfortunately others just take the whole thing.

I always whois the domain, look at the name servers to figure out who the host is, then e-mail the sales department (since it's the most checked) of the host to tell them that a site they're hosting is scraping my content. If there's a for-real e-mail in the whois, I contact the domain owner as well. It's worked every time for me.

Another thing to try is to automatically add a line at the end of each post in your feed that says something like "Read the rest of my blog at x.com". If you're using WordPress 2.0, the Feedvertising plugin will do the trick.
posted by ukdanae at 4:50 PM on March 19, 2007
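
For anyone not using a plugin, the "add a line at the end of each post" idea above can be sketched in a few lines. This is an illustration only: SITE_NAME, SITE_URL and addAttribution are hypothetical, and it assumes feed items are plain objects with a description field holding the post HTML.

```javascript
// Hypothetical site details, used only for illustration.
const SITE_NAME = "my blog";
const SITE_URL = "http://blog.example.com/";

// Return a copy of each feed item with a "read the rest" line appended.
function addAttribution(items) {
  return items.map((item) => ({
    ...item,
    description:
      `${item.description}\n<p>Read the rest of ${SITE_NAME} at ` +
      `<a href="${SITE_URL}">${SITE_URL}</a></p>`,
  }));
}

const feedItems = [{ title: "First post", description: "<p>Hello.</p>" }];
console.log(addAttribution(feedItems)[0].description);
```

Because the line travels inside the post body, a scraper that republishes the feed verbatim ends up publishing the link back as well.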


goatse.cx

perhaps throw in some tubgirl
posted by caddis at 4:54 PM on March 19, 2007 [1 favorite]


When this happened to me, I filed a DMCA notice with the ISP that was hosting the spam blog (which incidentally, also was The Planet, the folks who host livelonely.com). My content was removed from the offending site within 48 hours (though the spam blog is sadly still there).

The DMCA is fairly specific about what constitutes a correct notice (though Wikipedia's article on the OCILLA is a good resource). My email is in my profile; if you'd like, I'll send you a copy of the notice I sent (which was based on the samples found here).
posted by toxic at 4:55 PM on March 19, 2007


The standard redirect for spammers or other offensive wankers (as opposed to the clueless who get something less offensive) hotlinking your images is goatse.cx
posted by Mitheral at 4:57 PM on March 19, 2007


This is my great new article.

<!-- <% $remote_addr %> -->

It is not a trap! I am not planning to use this IP to make a fake RSS feed that is nothing but links to White Power organizations.

Note: adjust exact syntax depending on what silly web application language your blog uses.
posted by PEAK OIL at 5:29 PM on March 19, 2007 [1 favorite]


I've had this happen. I found out about it when I found a blog that was linking to one of my entries... except the entry was on an RSS scraper blog. The blog itself had no contact info, but I found its host (which had plenty of warnings about copyright) and told them about it. They shut the blog down.

So just ask their host and they'll do something, hopefully.
posted by divabat at 5:38 PM on March 19, 2007


So just ask their host and they'll do something, hopefully.

I can tell you from experience that the host in question (The Planet, AKA Everyones Internet) will not do anything (except ask you if you intend to file a formal DMCA complaint).

In their eyes, a spam blog is a paying customer.
posted by toxic at 5:45 PM on March 19, 2007


While it might be emotionally satisfying to send them the goatse.cx picture, or to put "all your base" onto their site, it's also an empty victory.

Their spam-blog doesn't exist for humans to look at. They're stealing your content so that their site seems to update regularly with material that the googlebot will think is real -- as, indeed, it is. The spam-blog is there, and only there, to be read by the googlebot and the ask-com spider and the msn-bot and the like.

I don't know exactly how it would work, but the real revenge would be to put a "META" into the site that told crawlers to ignore it. Perhaps others watching this thread would know more about that. If that could be made to work, it would deprive the owner of the spam blog of the benefit they seek from it.
posted by Steven C. Den Beste at 5:48 PM on March 19, 2007


FWIW, here's some quick .htaccess code to help with the image hotlinking. Stick this in an .htaccess file in the directory you want protected:

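# Keep the .htaccess file itself from being served
<Files .htaccess>
Order Allow,Deny
Deny from all
</Files>

RewriteEngine On
# Match requests whose referrer is one of the offending domains
RewriteCond %{HTTP_REFERER} ^(http://)?(www\.)?their-domain-name.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^(http://)?(www\.)?any-other-domain-name.*$ [NC]
# Assumed completion: refuse image requests from those referrers (403 Forbidden)
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]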

posted by Hankins at 5:55 PM on March 19, 2007


Response by poster: Steven, your idea gave me an idea: I contacted Google about an abuse of their ad policy, since running Google ads on a spam blog is a no-no. It wouldn't make me as mad if I knew this person was not going to make money off anyone else's hard work.
posted by stilgar at 5:56 PM on March 19, 2007


A different possibility is to send them a long file full of words like "viagra" and "xanax" and "texas-holdem" and all the other wonderful spam terms we've all gotten to know and hate. By now I would guess that the googlebot et al. are sensitized to ignore pages which contain excess amounts of those words, given how much they're abused.

If you did that, then instead of your RSS feed being used to convince the googlebot that the spam-blog is "normal", your RSS feed would convince the googlebot that the spam-blog is, indeed, spam. And thus you would deprive the spam-blog owner of the benefit he seeks.
posted by Steven C. Den Beste at 5:56 PM on March 19, 2007


Agreed with SCDB - sending them dirty pictures is only a temporary annoyance. If you really want to screw them, use mod_rewrite to serve them a ton of links to sleazy link farms and black-hat SEOs. They will be quickly blacklisted by Google, which is a tar pit they won't easily get out of. Just be careful you don't accidentally serve the same content on your own page!
posted by pocams at 5:57 PM on March 19, 2007 [1 favorite]


It would be better to include those things into a bogus RSS feed than to muck with mod_rewrite, because the bogus RSS feed would look native to the googlebot.
posted by Steven C. Den Beste at 6:16 PM on March 19, 2007


Isn't the whole point of RSS (Really Simple Syndication) to have people grab your feed and syndicate your content however they want?

No. You may think that the spirit of the general RSS revolution is such, but each site owner is free to publish terms of use for their content feeds. This may include attribution, links to the publisher's site, etc. I can't think of much of anyone who publishes an RSS feed so that any spammer under the sun can create a "content site" with no attribution, no linking, nothing to credit or acknowledge the originator of the content.
Not everyone is hung up on how their feed gets used, but it's hard to defend "re-blogging" of content in this way. It just adds noise to the web.

Truth be told, most people publish RSS feeds in the hope that people may plug them into their readers and become daily visitors to the originating site. But the downfall of RSS is that it's just as "really simple" for a re-blogger to abuse the content as it is for any single reader to subscribe on a daily basis.

I'd consider publishing terms of use in the feed itself and on your site. You can then file a copyright or DMCA complaint against the abuser, involving their ISP if necessary. You should also consider some of the solutions offered here, which will allow you to block or otherwise fuck with the offending site.

Try doing what SomethingAwful does when someone hotlinks to one of their images: swap it out with a giant picture of a hermaphrodite taking a shit on a coffee table, headline: "I LIKE TO STEAL BANDWIDTH."
posted by ahilal at 7:24 PM on March 19, 2007


As evil as the DMCA can be when misused, this is exactly what it's for. Here's PlagiarismToday's guide to combatting exactly this sort of thing, complete with sample notifications.

(They also regularly run articles about splogs and Internet plagiarism concerns in their main blog. Good reading that might interest you.)

On preview: You don't need to publish terms of use before filing a DMCA notification. Unless you've granted a license, the splogger has no right to republish the material. Copyright defaults to "not allowed".
posted by mendel at 8:35 PM on March 19, 2007


I see that your blog is hosted on an Apache server. Here is an entry I blogged about how I fought off an RSS scraper using Apache mod_rewrite: http://www.unicom.com/chrome/a/001233.html
posted by chipr at 6:55 AM on March 20, 2007


A DMCA takedown notice to the ISP will be a good start, even if it goes against your principles. Why should the RIAA have all the fun?
posted by KRS at 10:52 AM on March 20, 2007


Try doing what SomethingAwful does when someone hotlinks to one of their images: swap it out with a giant picture of a hermaphrodite taking a shit on a coffee table, headline: "I LIKE TO STEAL BANDWIDTH."
Then you hit things like del.icio.us too, where people are saving things because they’re great, and then when they come back to look at the same interesting thing a few days later, they get a picture of a leech captioned in Dutch. Even when they specify referers on the same server. If bookmarking an image because you like it isn't a legitimate thing to do, I’m not sure any bookmarking is.
posted by Aidan Kehoe at 11:23 AM on March 20, 2007


I'm a lot more selective than that. Specific misbehaving referrers get the image, not everything without my referrer.
posted by Mitheral at 8:38 PM on March 20, 2007


Then you hit things like del.icio.us too, where people are saving things because they’re great, and then when they come back to look at the same interesting thing a few days later, they get a picture of a leech captioned in Dutch.

It depends on how you set up your .htaccess. You can set things up so that direct links to jpgs work fine, but hosting them on other pages does not.
posted by delmoi at 12:10 PM on August 21, 2007
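
The selective approach discussed in the last few comments (let direct links and bookmarks through, refuse only specific misbehaving referrers) can be expressed as a small decision function. This is a sketch under assumptions, not anyone's actual configuration: BLOCKED_REFERRER_HOSTS and shouldServeImage are hypothetical names, and a real setup would more likely live in .htaccess.

```javascript
// Hypothetical list of hosts known to hotlink; adjust to taste.
const BLOCKED_REFERRER_HOSTS = ["livelonely.com"];

// Decide whether an image request should be served, based on its Referer header.
function shouldServeImage(refererHeader) {
  // No referrer: a direct link, a bookmark, or a privacy-minded browser. Allow it.
  if (!refererHeader) return true;

  let host;
  try {
    host = new URL(refererHeader).hostname.toLowerCase();
  } catch (err) {
    // Unparseable referrer: give the visitor the benefit of the doubt.
    return true;
  }

  // Block the listed hosts and any of their subdomains (www., etc.).
  return !BLOCKED_REFERRER_HOSTS.some(
    (blocked) => host === blocked || host.endsWith("." + blocked)
  );
}

console.log(shouldServeImage("http://www.livelonely.com/some-post"));
console.log(shouldServeImage("http://del.icio.us/someone"));
```

Blocking by named host rather than by "anything that isn't my own site" is what keeps bookmarking services and direct links working.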

