Prevent hotlinking without killing images in my RSS feed?
January 11, 2010 7:51 PM

Help! I'm being hosed! How do I prevent hotlinking without killing the pictures in my RSS feed?

Sorry if this is a little too geeky for Meta, but googling has (mostly) failed me, and the coder and web design forums where people ask similar questions all seem to be pretty harsh places.

You and I may have had our (minor) differences, MetaFilter, but I still think you're nice people!

So... I have a website. Some lovely people on a load of Chinese forums have started hotlinking entire, rather large galleries on said website. While my host allegedly allows unlimited bandwidth, I fear that a couple of hundred GB a day may be enough for them to start getting arsey with me.

I've added some rules to my .htaccess file. Now, in place of a hotlinked image, visitors are served a picture with a snarky message asking them to visit the site instead. The problem is that my RSS subscribers are now also served this image in place of the pictures in my feed.

I know how to allow specific sites to hotlink (say, just Google Reader), but what I'd really like to do is add a rule using wildcards saying that any referrer with "reader", "feed" or "RSS" somewhere in the URL can get images. Then, instead of having to work out the specific addresses every reader uses, I could perhaps just add "yahoo", "google" and "netnewswire" with some wildcards around them and go with that.

Does anyone have experience with this and know what the syntax is? Should I avoid this because excessive wildcard usage will slow things down? Is there a more elegant solution? Blocking specific URLs unfortunately won't work for me, because the hotlinking is coming from a bewildering array of domains.

For your reference, here's the code I'm using in my .htaccess right now:

RewriteEngine On
RewriteCond %{HTTP_REFERER} !^http://(.+\.)?mydomain\.com/ [NC]
RewriteCond %{HTTP_REFERER} !^$
RewriteRule .*\.(jpe?g|gif|bmp|png)$ http://mydomain.com/images/nohotlink.jpe [L]

Thanks!
posted by TheTorns to Computers & Internet (20 answers total) 6 users marked this as a favorite
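A minimal sketch of the keyword whitelist described in the question, keeping the mydomain.com placeholder and the nohotlink.jpe image from the code above; the keyword list is illustrative only. mod_rewrite patterns are regular expressions rather than shell-style wildcards: a pattern without ^ or $ already matches anywhere in the referer, so no wildcards are needed around each word, and alternation with "|" lets one condition stand in for a whole list.

```apache
RewriteEngine On

# Serve the real image to your own pages and to requests with no referer at all.
RewriteCond %{HTTP_REFERER} !^https?://(.+\.)?mydomain\.com/ [NC]
RewriteCond %{HTTP_REFERER} !^$
# Also serve it when the referer contains any of these keywords anywhere.
# "|" separates alternatives; [NC] makes the match case-insensitive.
RewriteCond %{HTTP_REFERER} !(reader|feed|rss|google|yahoo|netnewswire) [NC]
# Everyone else gets the "please visit the site" image instead.
RewriteRule \.(jpe?g|gif|bmp|png)$ http://mydomain.com/images/nohotlink.jpe [L]
```

All four conditions must hold (they are ANDed by default) before the image is swapped.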
 
I don't know a better solution, but I think you will definitely miss some reader applications and block legit users, just so you're aware of that.

Don't forget to add whatever referers Firefox and Safari use.
posted by jjwiseman at 8:01 PM on January 11, 2010


Response by poster: I should perhaps clarify that what I'd really like is for someone to help me out with the syntax for using wildcards here. Whenever I try, it seems to break things. D~:

Any help super appreciated!
posted by TheTorns at 8:05 PM on January 11, 2010


Instead of matching the referrer, you can match the image URL: add, say, '?via_rss' to the ends of image URLs in your RSS feed, and let any image request carrying that marker through. Someone intent on hotlinking can easily bypass this, but it should be enough to stop casual forum posters who found your image on Google, and no RSS readers get blocked.
posted by domnit at 8:26 PM on January 11, 2010


Response by poster: That sounds great, domnit, but I must have given the impression that I'm smarter than I am, because I really don't know how to do that. X~D

Any chance you could explain in more detail?
posted by TheTorns at 8:31 PM on January 11, 2010


This may be an opportunity to use blacklisting rather than whitelisting.

Which is to say, rather than "for anything that *doesn't* match this, send the hotlink image", change it to "for anything that *does* match this, send the hotlink image". You say that they are coming from a bewildering number of domains, but the people reading those forums (whose browsers actually request the images, so theirs is the address in %{REMOTE_ADDR}) may come from a more limited range of IPs. At the very least you may be able to confine it to just a few ranges. So then you can write code like this:

RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^123\.(12|34|56)\.[0-9]+\.[0-9]+$ [OR]
RewriteCond %{REMOTE_ADDR} ^121\.[0-9]+\.[0-9]+\.[0-9]+$ [OR]
RewriteCond %{REMOTE_ADDR} ^120\.(56|78|90|101)\.[0-9]+\.[0-9]+$
RewriteRule .*\.(jpe?g|gif|bmp|png)$ http://mydomain.com/images/nohotlink.jpe [L]

posted by Deathalicious at 8:41 PM on January 11, 2010


Response by poster: Thanks Deathalicious!

So am I right in thinking that the first line is saying "block 123.(12 or 34 or 56).*.*", and the next one is more like 121.*.*.*?

Whatever variety of wildcarding is being used here isn't one that's familiar to me, I'm afraid.
posted by TheTorns at 8:49 PM on January 11, 2010


Sorry, Torns, my idea would require you to modify or extend your blog software--you'd have to somehow append some tag to image URLs, but only in the feed.

And if you're up for a headache, the wildcard things you refer to are called regular expressions.
posted by domnit at 9:12 PM on January 11, 2010


Response by poster: Oh! I do have some experience with regex, actually; I just didn't recognize that was what was being used there. Knowing that it's regex, I might be alright, but it's still my preferred solution; new domains are popping up all the time as we speak.

Oddly enough, I'm using regex via Yahoo Pipes to strip some proprietary image markup out of my feed and replace it with common-or-garden HTML, so I could perhaps quite easily append something there if I had to. I just don't quite understand how I would match the image URL in .htaccess, and then how I would get .htaccess to recognize the ?via_rss bit and let it through...
posted by TheTorns at 9:28 PM on January 11, 2010


Response by poster: Whoops! I meant to say above that it's still /not/ my preferred solution, in reference to blacklisting IPs or domain names. Sorry about that.
posted by TheTorns at 9:36 PM on January 11, 2010


Everything you need is documented thoroughly on the Apache site: mod_rewrite. Everything after the ? in the URL is part of %{QUERY_STRING}, and the RewriteRule pattern is never matched against it, so you have to test %{QUERY_STRING} in a RewriteCond if you want to write such a condition.
posted by Rhomboid at 9:37 PM on January 11, 2010
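A minimal sketch of the .htaccess side of domnit's ?via_rss idea, assuming the image URLs in the feed have already been given that marker (for example http://mydomain.com/gallery/photo.jpg?via_rss, a hypothetical path) and keeping the referer conditions from the question. Since the RewriteRule pattern never sees the query string, the marker is tested in its own RewriteCond:

```apache
RewriteEngine On

# Outside referers only: not your own site, and not empty.
RewriteCond %{HTTP_REFERER} !^https?://(.+\.)?mydomain\.com/ [NC]
RewriteCond %{HTTP_REFERER} !^$
# Let through any image requested with "via_rss" in its query string (the part after "?").
RewriteCond %{QUERY_STRING} !via_rss
RewriteRule \.(jpe?g|gif|bmp|png)$ http://mydomain.com/images/nohotlink.jpe [L]
```

An image is only swapped for nohotlink.jpe when it has an outside referer and lacks the marker. Anyone can append ?via_rss by hand, so this deters casual hotlinking rather than preventing it.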


RewriteEngine on
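RewriteCond %{HTTP_REFERER} ^http://(forum\.)?badsite\.com/ [NC]
# don't redirect the replacement image itself, or the browser loops on hello.jpg
RewriteCond %{REQUEST_URI} !/hello\.jpg$
RewriteRule \.(gif|jpg|png)$ http://www.yoursite.com/hello.jpg [R,L]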

posted by Nameless at 10:11 PM on January 11, 2010


Response by poster: Here's what I'm going with for now:

[code]
RewriteEngine On

RewriteCond %{HTTP_REFERER} !^http(s)?://(.+\.)?mydomain\.com/ [NC]

RewriteCond %{HTTP_REFERER} !^.feed. [NC]
RewriteCond %{HTTP_REFERER} !^.google. [NC]
RewriteCond %{HTTP_REFERER} !^.read. [NC]
RewriteCond %{HTTP_REFERER} !^.rss. [NC]
RewriteCond %{HTTP_REFERER} !^.zilla. [NC]
RewriteCond %{HTTP_REFERER} !^.yahoo. [NC]
RewriteCond %{HTTP_REFERER} !^.news. [NC]
RewriteCond %{HTTP_REFERER} !^.opera. [NC]
RewriteCond %{HTTP_REFERER} !^.pipes. [NC]
RewriteCond %{HTTP_REFERER} !^.space. [NC]

RewriteCond %{HTTP_REFERER} !^$

RewriteRule .*\.(jpe?g|gif|bmp|png)$ http://mydomain.com/images/nohotlink.jpe [L]
[/code]

I'm thinking that might have me covered for the moment. Domnit's solution sounds a lot more elegant though!
posted by TheTorns at 12:02 AM on January 12, 2010


That doesn't really make sense. "." means any single character, and ^ means anchor to the beginning of the field, so those would not match e.g. www.google.com. If you intend to match a literal period you need to use "\." and not anchor the match to the beginning; if you intend to match one or more characters you need ".+".
posted by Rhomboid at 12:08 AM on January 12, 2010


Response by poster: Crikey, yes. Thanks for spotting that Rhomboid.

Here's what I replaced that with:

RewriteCond %{HTTP_REFERER} !^.?feed.? [NC]
RewriteCond %{HTTP_REFERER} !^.?google.? [NC]
RewriteCond %{HTTP_REFERER} !^.?read.? [NC]
RewriteCond %{HTTP_REFERER} !^.?rss.? [NC]
RewriteCond %{HTTP_REFERER} !^.?zilla.? [NC]
RewriteCond %{HTTP_REFERER} !^.?yahoo.? [NC]
RewriteCond %{HTTP_REFERER} !^.?news.? [NC]
RewriteCond %{HTTP_REFERER} !^.?opera.? [NC]
RewriteCond %{HTTP_REFERER} !^.?pipes.? [NC]
RewriteCond %{HTTP_REFERER} !^.?space.? [NC]
posted by TheTorns at 12:41 AM on January 12, 2010


Response by poster: Actually, on reflection:

RewriteCond %{HTTP_REFERER} !^.+feed.+ [NC]
RewriteCond %{HTTP_REFERER} !^.+google.+ [NC]
RewriteCond %{HTTP_REFERER} !^.+read.+ [NC]
RewriteCond %{HTTP_REFERER} !^.+rss.+ [NC]
RewriteCond %{HTTP_REFERER} !^.+zilla.+ [NC]
RewriteCond %{HTTP_REFERER} !^.+yahoo.+ [NC]
RewriteCond %{HTTP_REFERER} !^.+news.+ [NC]
RewriteCond %{HTTP_REFERER} !^.+opera.+ [NC]
RewriteCond %{HTTP_REFERER} !^.+pipes.+ [NC]
RewriteCond %{HTTP_REFERER} !^.+torn.+ [NC]
RewriteCond %{HTTP_REFERER} !^.+space.+ [NC]
posted by TheTorns at 12:46 AM on January 12, 2010


"?" means "zero or one" so ".?" means either nothing or any one character. I still don't think that's what you want, as that still wouldn't match www.whatever. If your intent is to match "any referrer that contains the string .google." then you want "\.google\.", and the ! in front inverts the logic, so "RewriteCond %{HTTP_REFERER} !\.google\. [NC]" means "the following RewriteRule applies unless the referer contains the string '.google.', non-case sensitively."
posted by Rhomboid at 12:48 AM on January 12, 2010


Oh, and if you just want to match any string with "google" then there's no need for anything else: RewriteCond %{HTTP_REFERER} !google [NC]
posted by Rhomboid at 12:50 AM on January 12, 2010
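A side-by-side summary of the referer patterns tried above, written as .htaccess comments (lines starting with # are ignored). Each pattern is described without the leading ! and with [NC], as used in the thread:

```apache
# What each pattern matches in %{HTTP_REFERER}:
#   ^.feed.    "feed" starting at the 2nd character; never true for "http://..."
#   ^.?feed.?  "feed" starting at the 1st or 2nd character; same problem
#   ^.+feed.+  "feed" anywhere after the 1st character, with at least one character after it
#   \.feed\.   the literal string ".feed." anywhere
#   feed       the string "feed" anywhere; usually all you need
RewriteCond %{HTTP_REFERER} !feed [NC]
```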


You want to display the correct image to users with an empty referrer or a referrer that matches your site's URL.
posted by devnull at 12:54 AM on January 12, 2010


Response by poster: Thanks for the help guys!

Here's what I'm rolling with:

RewriteEngine On

RewriteCond %{HTTP_REFERER} !^http(s)?://(.+\.)?mydomain\.com/ [NC]

RewriteCond %{HTTP_REFERER} !feed [NC]
RewriteCond %{HTTP_REFERER} !google [NC]
RewriteCond %{HTTP_REFERER} !read [NC]
RewriteCond %{HTTP_REFERER} !rss [NC]
RewriteCond %{HTTP_REFERER} !zilla [NC]
RewriteCond %{HTTP_REFERER} !yahoo [NC]
RewriteCond %{HTTP_REFERER} !news [NC]
RewriteCond %{HTTP_REFERER} !opera [NC]
RewriteCond %{HTTP_REFERER} !pipes [NC]
RewriteCond %{HTTP_REFERER} !space [NC]

RewriteCond %{HTTP_REFERER} !^$
RewriteRule .*\.(jpe?g|gif|bmp|png)$ http://thetorns.com/images/nohotlink.jpe [L]

Seems to be working!
posted by TheTorns at 1:03 AM on January 12, 2010
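One caution about the keyword list above: each word is matched anywhere in the referer, so short words can whitelist more than intended. "read" also matches "thread", which shows up in many forum URLs (vBulletin's showthread.php, Discuz!'s viewthread.php), and "news" matches any news site. A sketch that only looks for keywords in the host name, assuming the readers you care about carry the keyword there; it would not catch a keyword that appears only in the path, such as the "reader" in google.com/reader (though "google" is in that host name). The mydomain.com placeholder and the keyword list are illustrative:

```apache
# Keep your own site and empty referers as before.
RewriteCond %{HTTP_REFERER} !^https?://(.+\.)?mydomain\.com/ [NC]
RewriteCond %{HTTP_REFERER} !^$
# Match the keywords only in the host name: [^/]* cannot cross the first "/" after "://".
RewriteCond %{HTTP_REFERER} !^https?://[^/]*(feed|google|rss|yahoo|opera|pipes) [NC]
RewriteRule \.(jpe?g|gif|bmp|png)$ http://mydomain.com/images/nohotlink.jpe [L]
```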


OK, so for my solution the htaccess part should be:

RewriteCond %{QUERY_STRING} !via_rss

after the RewriteCond lines you already have.

As for rewriting the URLs, that would be an actual programming task--maybe Yahoo Pipes would work.

Periodically check for sites scraping your feed and individually block those.
posted by domnit at 9:57 AM on January 12, 2010
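A sketch of blocking individual scrapers once you have found them, in the same style as Nameless's blacklist above; the domain names are hypothetical placeholders. Conditions are ANDed unless marked [OR], and this rule is evaluated on its own, so it can sit after the whitelist rules and still catch a scraper whose domain happens to contain one of the whitelisted keywords:

```apache
# Hypothetical scraper domains; replace them with the ones found in your logs.
# [OR] joins a condition to the next one; ([^/]+\.)? allows any subdomain.
RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?scraper-one\.example/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?scraper-two\.example/ [NC]
# Same target as before; "nohotlink.jpe" does not match this pattern, so it cannot loop.
RewriteRule \.(jpe?g|gif|bmp|png)$ http://mydomain.com/images/nohotlink.jpe [L]
```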


This thread is closed to new comments.