How can I stop referrer log spam?
December 4, 2004 11:09 PM

"Reffy is a Windows-based mass referrer spammer.... Reffy comes with a pre-generated list of 3047 active blog websites...." My blog is on that pre-generated list. These people are spamming my logs and filling my referrer page with crap. Paypal has, unsurprisingly, not responded to reports that they're being used to transfer money to spammers, and .htaccess deny blacklists are no good when your URL is being distributed to spammers all over the place. So, (1) how do I stop them, and (2) what legal recourse would I possibly have to get my cut of the money they're making by putting my website URL in their application?
posted by anonymous to Computers & Internet (11 answers total)
 
They say they use a "custom http header" to avoid having to download the pages. If it's different enough that you can be fairly certain a regular user wouldn't send it, you could probably use mod_rewrite to check for this header and return a 403 Forbidden. How this would affect your logs depends on how they're generated. If they're server logs, the requests will still show up, but they'll show up as errors. If you use a script to generate logs, this should prevent them from showing up entirely.
posted by Nothing at 12:16 AM on December 5, 2004
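
A minimal sketch of the check Nothing describes above, written as Python WSGI middleware instead of mod_rewrite. The header name is a placeholder: the thread never says what the tool actually sends, so `HTTP_X_EXAMPLE_SPAM_TOOL` and both function names are assumptions for illustration only.

```python
# Hypothetical WSGI environ key. A request header "X-Example-Spam-Tool"
# would appear in the WSGI environ under this name. The real header the
# spam tool sends is not given in the thread.
SUSPECT_ENVIRON_KEY = "HTTP_X_EXAMPLE_SPAM_TOOL"


def is_suspect(environ, key=SUSPECT_ENVIRON_KEY):
    """Return True if the request carries the suspect header."""
    return key in environ


def block_suspect_header(app, key=SUSPECT_ENVIRON_KEY):
    """Wrap a WSGI app so that flagged requests get a 403 Forbidden."""
    def middleware(environ, start_response):
        if is_suspect(environ, key):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Anything else is passed through to the real application.
        return app(environ, start_response)
    return middleware
```

As the comment says, this only keeps the request away from the application (and from script-generated stats). A server access log would still record the hit, just with a 403 status.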


Good question, anon.
Recently two places I visit have been spammed by this type of comment:

Cash Advance Loan - email - url
WE MUST ALL HEAR THE UNIVERSAL CALL TO LIKE YOUR NEIGHBOR JUST LIKE YOU LIKE TO BE LIKED YOURSELF.
- George W Bush, Cash Advance Loan http://www.cashadvance.be
24.11.04 @ 14:52:37
Direct TV - email - url
Hummingbirds never remember the words to songs.
Direct TV http://www.direct-tv-com.com
25.11.04 @ 06:43:15
Video Poker - email - url
I CALL UPON ALL NATIONS TO DO EVERYTHING THEY CAN TO STOP THESE TERRORIST KILLERS. THANK YOU. NOW WATCH THIS DRIVE.
- George W Bush,AUGUST 4, 2002, ON VIOLENCE IN THE MIDDLE EAST... AND HIS GOLF GAME Video Poker http://www.video-poker-com.com
27.11.04 @ 10:57:09
Phentermine - email - url
May a Misguided Platypus lay its Eggs in your Jockey Shorts
Phentermine http://online-prescription-pharmacy.com/Phentermine.htm
29.11.04 @ 05:33:45
poker tables - email - url
Let us beware of saying that death is the opposite of life. The living being is only a species of the dead, and a very rare species. by online poker
30.11.04 @ 03:23:12
online poker rooms - email - url
Science is built up with facts, as a house is with stones. But a collection of facts is no more a science than a heap of stones is a house. by texas holdem poker
30.11.04 @ 08:00:00
empirepoker - email - url
It is necessary to the happiness of man that he be mentally faithful to himself. Infidelity does not consist in believing, or in disbelieving, it consists in professing to believe what one does not believe. by empirepoker
30.11.04 @ 08:11:13
gambling - email - url
He who is unable to live in society, or who has no need because he is sufficient for himself, must be either a beast or a god. by online blackjack
30.11.04 @ 12:13:07
poker chips - email - url
Society is indeed a contract...it becomes a partnership not only between those who are living, but between those who are living, those who are dead, and those who are to be born. by world series of poker
30.11.04 @ 17:12:24
texas holdem poker - email - url
I wish to propose for the reader's favourable consideration a doctrine which may, I fear, appear wildly paradoxical and subversive. The doctrine in question is this: that it is undesirable to believe a proposition when there is no ground whatever for supposing it true. by partypoker
30.11.04 @ 17:16:21


Pity you asked this anonymously; I would have been interested in knowing whether this inventive "spam-as-comments" is what your site is suffering from.
posted by ruelle at 12:18 AM on December 5, 2004


My God, that's evil. It also explains the sudden rise in referrer spam I've noticed in my logs.

The "custom HTTP header" is likely nothing more than a HEAD request, which is a standard HTTP request that asks the server for just the response headers (for example, when something was last modified) without the page body. Blocking HEAD would be bad, since most HEAD requests are legitimate.

The only thing I can think of is some logic in an Apache module to detect repeated requests from the same host with referrers that aren't from your site. Unfortunately, that would be error-prone and clunky, and basically we'd end up in the same arms race we have with email spam.

I'm afraid that REFERER is just broken now - and these wankers (and people like them) broke it. So no, there is no answer to your first question. I am very interested in the second...
posted by i_am_joe's_spleen at 12:35 AM on December 5, 2004
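
The repeated-request heuristic i_am_joe's_spleen mentions above could look roughly like this. It is a sketch in plain Python, not an Apache module, and the class name, thresholds, and time window are all assumptions picked for illustration. It also shows why the approach is error-prone: a busy proxy or a popular linking site would trip the same counter.

```python
from collections import defaultdict, deque
from urllib.parse import urlparse


class ReferrerRateWatcher:
    """Flag hosts that send many requests with off-site referrers."""

    def __init__(self, own_host, max_hits=5, window_seconds=60.0):
        self.own_host = own_host            # e.g. "myblog.example"
        self.max_hits = max_hits            # allowed off-site hits per window
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)     # client IP -> recent hit times

    def is_suspicious(self, client_ip, referrer, now):
        """Record one request; return True once the client exceeds the limit."""
        host = urlparse(referrer).hostname if referrer else None
        # Requests with no referrer, or one from our own site, are ignored.
        if host is None or host == self.own_host:
            return False
        hits = self._hits[client_ip]
        hits.append(now)
        # Forget hits that have fallen out of the time window.
        while hits and now - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_hits
```

The caller passes `now` explicitly (for example `time.time()`), which keeps the logic easy to test.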


PS: some helpful hints here, but no long term hope in my view.
posted by i_am_joe's_spleen at 12:39 AM on December 5, 2004


PPS: I do quite like the idea of the "checkback" described on that page. I might look into that...
posted by i_am_joe's_spleen at 12:43 AM on December 5, 2004
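
The linked page is not quoted in this thread, so the following is only a guess at what a "checkback" involves: fetch the page named in the referrer and accept the referrer only if that page really contains a link to your site. The sketch covers just the link test on already-fetched HTML; `links_back` and `_LinkCollector` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class _LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_back(referrer_html, own_host):
    """Return True if the referring page links to own_host."""
    collector = _LinkCollector()
    collector.feed(referrer_html)
    return any(urlparse(href).hostname == own_host for href in collector.hrefs)
```

Fetching the referring page (with `urllib.request.urlopen`, say) is left to the caller. It costs one outbound request per new referrer, and the URL comes from an untrusted source, so timeouts and size limits would be sensible.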


Perhaps someone could be kind enough to update your software so that a human-readable, machine-unreadable image with a few letters must be entered to submit a comment?

I wish I could. :-D
posted by shepd at 12:49 AM on December 5, 2004


I still hate the spam I get in my blog (not so active), but it is great fun to find out the source of the copied text. Once I received an Emma Goldman quotation.

Along with a link to black jack poker...bleh.
posted by Gnatcho at 5:02 AM on December 5, 2004


Just to be clear: my understanding of the question is that this is not comment spam, but referrer log spam, so suggestions for getting rid of comment spam are not going to be helpful.

I know someone looking into this, and he led me to this page. They are suggesting a blacklist-style approach, which doesn't sound very practical unless it's automated, but there is lots of info.
posted by frykitty at 7:18 AM on December 5, 2004
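
For what an automated version of the blacklist approach might look like, here is a small sketch that tests a referrer against a list of blocked terms. The terms are examples lifted from the spam quoted earlier in the thread; the list, the function name, and the choice to match on host plus path are assumptions, and a real list would need regular updating.

```python
import re
from urllib.parse import urlparse

# Example terms only, taken from the spam quoted in this thread.
BLOCKED_TERMS = ["poker", "phentermine", "blackjack"]
_BLOCKED = re.compile(
    "|".join(re.escape(term) for term in BLOCKED_TERMS), re.IGNORECASE
)


def is_blacklisted(referrer):
    """Return True if the referrer's host or path contains a blocked term."""
    parsed = urlparse(referrer)
    return bool(_BLOCKED.search((parsed.hostname or "") + parsed.path))
```

Substring matching like this will produce false positives (a legitimate post about poker, for instance), which is part of why the approach is hard to keep practical.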


Just an FYI for some people who seem to be confusing referral spam with blog comment spam: they are two different beasts.

Referral spam consists of making a request to one's webserver and having the 'Referer' (sic) header of the request be the spamming URL. This value is ostensibly used to keep track of how people get to your site--if I was on MeFi and clicked a link to anonymous.org/blog, then the referrer URL logged on anonymous.org's site would be the URL of that MeFi thread (or the front page if I clicked from there).

Some blog engines keep lists of referrals on each page as a pseudo-trackback sorta thing; and also, some more savvy web hosts / individuals with servers list web stats online, which also will include the referral URLs.

So, the point of this referral spam is to get your URL on more and more webpages--either blog referral lists, or webserver stats pages--and thus increase your search engine rankings.

Blog comment spam is done for the same reason, but is accomplished by spamming comment pages on blogs (making actual comments), instead of just accessing the base blog URL with a modified Referer header in the request. Same objective, different method.

On preview: beaten by frykitty. But my explanation's longer! :D
posted by cyrusdogstar at 7:25 AM on December 5, 2004
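
To make cyrusdogstar's explanation concrete, this sketch pulls the referrer out of a single server log line. It assumes the server writes Apache's "combined" log format, which the thread does not confirm, and the sample addresses and URLs in the usage note are made up.

```python
import re

# Assumed layout (Apache "combined" format):
# host ident user [time] "request" status size "referrer" "user-agent"
COMBINED_LOG = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def extract_referrer(line):
    """Return the referrer from a combined-format log line, or None."""
    match = COMBINED_LOG.match(line.strip())
    if match is None:
        return None
    referrer = match.group("referrer")
    # Apache writes "-" when the client sent no Referer header.
    return None if referrer in ("", "-") else referrer
```

For a line such as `192.0.2.7 - - [05/Dec/2004:07:25:00 -0500] "GET /blog/ HTTP/1.1" 200 5120 "http://spam.example/poker" "Mozilla/4.0"`, the function returns `http://spam.example/poker`. The client controls that value entirely, which is all a referrer spammer needs.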


How about setting a session cookie when the user visits your page to enter a comment, and then checking that it exists on submit? It's not perfect, but it will mean their single HTTP request fails. If your response doesn't indicate either way, then as long as they don't check every site they hit, you'll be okay.
posted by ralawrence at 11:40 AM on December 5, 2004
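
A rough sketch of ralawrence's cookie check, for comment forms rather than referrer logs. Tokens live in memory here purely for illustration; the class and method names are hypothetical, and a real site would tie this into whatever session storage the blog software already has.

```python
import secrets


class CommentFormGuard:
    """Issue a token with the comment form and require it on submit."""

    def __init__(self):
        self._issued = set()

    def issue_cookie(self):
        """Call when serving the comment form; returns the cookie value."""
        token = secrets.token_hex(16)
        self._issued.add(token)
        return token

    def accept_submission(self, cookie_value):
        """Return True only for a cookie we issued and have not yet used."""
        if cookie_value in self._issued:
            self._issued.discard(cookie_value)  # each token works once
            return True
        return False
```

To follow the "don't indicate either way" advice, the submit handler would send the same response whether `accept_submission` returned True or False, and silently drop the rejected comments.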


Well, TypePad uses this little bit of HTML at the bottom to track referrers:

<script type="text/javascript">
<!--
document.write('<img src="http://www.typepad.com/t/stats?blog_id=37618&amp;page=' + escape(location.href) + '&amp;referrer=' + escape(document.referrer) + '" width="1" height="1" alt="" />');
// -->
</script>

Sure, it's a web bug, but I doubt the referrer spammers are a) checking for this or b) have JS engines running to execute it. You could probably code up the stats CGI in about ten minutes.
posted by sbutler at 1:41 PM on December 5, 2004
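
The server side of sbutler's idea might be as small as the sketch below. The `page` and `referrer` parameter names come from the TypePad snippet above; the function name, the in-memory list, and the choice to answer with 204 No Content instead of an image are assumptions.

```python
from urllib.parse import parse_qs


def record_hit(query_string, log):
    """Record (page, referrer) from the tracking request's query string.

    Only browsers that ran the JavaScript ever call this URL, so requests
    that merely forge a Referer header on the blog page never get recorded.
    """
    params = parse_qs(query_string)  # also decodes %XX escapes
    page = params.get("page", [""])[0]
    referrer = params.get("referrer", [""])[0]
    if page:
        log.append((page, referrer))
    # The tracking image needs no body, so an empty response is enough.
    return "204 No Content"
```

A spammer who notices the stats URL could still call it directly, so this raises the bar without closing the hole.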

