How to prevent crawlers from harvesting your email on a website?
September 5, 2009 9:10 AM


This question was asked before in '06. I'd like an '09 update.
posted by paulinsanjuan to Technology (15 answers total) 4 users marked this as a favorite
 
Probably the simplest robust method is to use reCAPTCHA Mailhide.

If a client isn't particularly bothered about email harvesting, I just HTML-encode all the @ symbols to give a minimal level of protection.
Next step up from that is usually to replace the @ with an image and make the mailto: link use JavaScript instead.
If email harvesting is a big concern, I usually recommend having a contact form instead.
posted by malevolent at 9:56 AM on September 5, 2009
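
The lightest option malevolent describes, HTML-encoding the address, can be sketched in a few lines of Python. This is only an illustration: the function name `encode_email` and the address are made up, and the sketch encodes every character as a numeric entity rather than just the @.

```python
import html


def encode_email(address: str) -> str:
    """Return the address with every character written as a decimal HTML entity."""
    return "".join(f"&#{ord(ch)};" for ch in address)


encoded = encode_email("username@example.com")

# Browsers render the entities as the original text, but a naive scraper
# looking for a literal "@" in the page source will not find one.
link = f'<a href="mailto:{encoded}">{encoded}</a>'
print(link)

# Decoding is a one-liner, which is why this is only minimal protection.
print(html.unescape(encoded))
```

Any harvester that decodes entities before pattern matching sees the plain address again, so this sits at the bottom of the ladder described above.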


I usually just write it as "username at example dot com"
posted by EndsOfInvention at 10:10 AM on September 5, 2009


If you use WordPress, there's a plugin that converts all email addresses to JavaScript, making them largely invisible to crawlers. I'm sure there's similar code you can use on any site.
posted by Aanidaani at 10:15 AM on September 5, 2009


I usually just write it as "username at example dot com"

I really wouldn't be surprised if harvesters could work this sort of thing out by now. It's fairly trivial to search for 'at (something) dot com' as well as '@' or '.com', then reconstruct the address.

I use a JavaScript obfuscator along with an email address that I can happily ditch if it ever starts to attract spam.
posted by le morte de bea arthur at 10:18 AM on September 5, 2009
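
To show how little work the "at ... dot com" reconstruction mentioned above would take, here is a rough Python sketch. The pattern and the function name `deobfuscate` are assumptions for illustration, not taken from any real harvester, and the pattern will produce false positives on ordinary sentences that happen to contain "at" and "dot".

```python
import re

# Matches "name at domain dot tld", allowing more than one "dot" label.
PATTERN = re.compile(
    r"\b([\w.+-]+)\s+at\s+([\w-]+(?:\s+dot\s+[\w-]+)+)\b",
    re.IGNORECASE,
)


def deobfuscate(text: str) -> list[str]:
    """Rebuild addresses written in the 'user at example dot com' style."""
    found = []
    for user, domain in PATTERN.findall(text):
        # Turn each spelled-out " dot " back into a real period.
        domain = re.sub(r"\s+dot\s+", ".", domain, flags=re.IGNORECASE)
        found.append(f"{user}@{domain}")
    return found


print(deobfuscate("Write to username at example dot com for details."))
```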


I just tried out that reCAPTCHA link malevolent sent. It's working out quite well and I think I'll probably replace my WordPress plugin with it. You can see a live example on my site here: http://dcg.materials.drexel.edu/?page_id=12
posted by Aanidaani at 10:38 AM on September 5, 2009


Just make a GIF image of the email address, like Facebook does?
posted by yoyoceramic at 10:53 AM on September 5, 2009


On my business website, I saw pretty clear evidence back around 2004 that the text obfuscators are useless. Spam started coming in anyway.

So I switched to a PHP form on my website where people can send a message to me. Spammers don't use this because of course they're SMTP-based and have no way of dealing with little custom forms like this. The e-mail address is hidden in the PHP code. When I write back, they get a regular e-mail and no longer have to go through the form.

But it's kind of a losing battle as I've seen very clear evidence that a lot of harvested addresses come from zombie computers, where Outlook Express's address books are compromised. So if Uncle Fred is infected due to all that porn he looks at or all those shady web offers he visits, AND he happens to send e-mails to you regularly, then you're gonna get spam anyway. Nothing that can be done about that. For that reason I send all e-mails to a gmail account to get cleansed, which forwards back to a secret e-mail address used only to receive incoming messages. That gets rid of most of the spam. I've been nothing but pleased with this setup.
posted by crapmatic at 10:54 AM on September 5, 2009
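
The contact-form idea above (address kept in server-side code, reply goes out as regular email) might look roughly like this. crapmatic's form is PHP; this is a Python sketch with made-up addresses and a hypothetical function name, and it only builds the message rather than sending it.

```python
from email.message import EmailMessage

# Hypothetical private address; it exists only in server-side code
# and never appears in the HTML sent to visitors.
HIDDEN_RECIPIENT = "owner@example.com"


def build_contact_message(visitor_email: str, subject: str, body: str) -> EmailMessage:
    """Build the email a contact-form handler would send to the site owner."""
    for value in (visitor_email, subject):
        # Reject line breaks so form input cannot inject extra headers.
        if "\r" in value or "\n" in value:
            raise ValueError("line breaks are not allowed in header fields")
    msg = EmailMessage()
    msg["To"] = HIDDEN_RECIPIENT
    msg["From"] = "webform@example.com"  # assumed sending address
    msg["Reply-To"] = visitor_email      # replying goes straight to the visitor
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


msg = build_contact_message("visitor@example.org", "Question", "Hello there")
print(msg["Reply-To"])
```

Sending would be a separate step, for example handing `msg` to `smtplib.SMTP(...).send_message(msg)` on whatever mail server the site uses.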


PHP and other forms are really a terrible route, IMO. I hate filling them out rather than using my normal email, I hate not having a copy of the email in my sent-mailbox, and I hate not knowing if it really was sent or is just some corporate self-serve memory hole.
posted by Rumple at 11:36 AM on September 5, 2009


A trick I use is to replace each of the periods with a • (&bull;) in a super-small font size so that it looks like a period.
posted by XMLicious at 11:42 AM on September 5, 2009 [1 favorite]


I use the Hivelogic enkoder.

I used to go through the whole 'use different addresses for different situations' method. In other words, use metafilter@example.org here and amazon@example.org elsewhere, but it's sort of a pain.

I have no idea which is doing the job or if I'm just lucky, but between the enkoder and regular spam guards I don't ever get spam. Ever.
posted by justgary at 12:50 PM on September 5, 2009 [1 favorite]


I also use the Hivelogic enkoder. It works like a charm.
posted by The Michael The at 1:15 PM on September 5, 2009 [1 favorite]


I usually just use a quick JavaScript function that prints the email address, broken up in pieces. One email address is split across five or so variables, each holding a piece of it, and the function prints them together. In the web page I just call the function.
posted by cgg at 1:39 PM on September 5, 2009
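
One way to produce the kind of snippet cgg describes is to generate it on the server. The Python below is a guess at the general shape, not cgg's actual code: the helper names, the piece count, and the use of `document.write` are all assumptions.

```python
import json


def split_pieces(address: str, pieces: int = 5) -> list[str]:
    """Cut the address into roughly `pieces` chunks of equal size."""
    size = max(1, -(-len(address) // pieces))  # ceiling division, at least 1
    return [address[i:i + size] for i in range(0, len(address), size)]


def split_email_script(address: str, pieces: int = 5) -> str:
    """Return a <script> tag that reassembles the address in the browser."""
    chunks = split_pieces(address, pieces)
    # json.dumps produces a valid JavaScript array of string literals.
    return f"<script>document.write({json.dumps(chunks)}.join(''));</script>"


print(split_email_script("username@example.com"))
```

The full address never appears contiguously in the page source, but anyone without JavaScript sees nothing, which is the usability cost raised elsewhere in this thread.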


Just to toss out an alternate non-solution: don't bother trying to prevent it.

Anything that successfully hides your email address from a bot carries some usability cost. Maybe people have to re-type your email address, struggle with a captcha, or not see your address at all (e.g., a disabled user with JavaScript turned off).

Your email address is likely to get out regardless, so you'll need spam detection anyway.
posted by samsm at 1:51 PM on September 5, 2009


I'm not sure it's just luck, but I've been using the email protector link from Iconico with a pretty good success rate.
posted by spacelux at 8:44 PM on September 5, 2009


Response by poster: I don't know which one to mark as best answer, but thanks for all the recommendations. I'll try them out and post the results back here.
posted by paulinsanjuan at 10:32 PM on September 5, 2009

