I've stopped my web form spam. What am I missing?
April 3, 2008 1:23 PM

I seem to have stopped form spammers from cluttering a business inbox. Have I set myself up for some more mysterious problem down the road?

The website for my wife's dog training business has some "contact us" and "schedule an appointment" forms, which send email through the venerable cgiemail. Shortly after we added these sometime last fall, spammers began sending messages by POSTing directly to the emailer. I was able to stop these robots by changing the name of a required field on the form. After half a dozen name changes over six months or so, we were finally found by a robot smart enough to check the form first, and began getting fifty or sixty junk messages a day.

The standard approach to this problem seems to be the addition of a "captcha." I decided against a captcha for three reasons:
  1. I'd have to change my backend to something that can validate a captcha. This probably isn't much of a hurdle.
  2. I'd have to trust the new backend. This takes a little thought, depending on how much of it I would have to write.
  3. Some potential customers might, like me, dislike squinting at captchas. We don't want to lose their business.
I think the third of these reflects a different goal from the one most people have. I don't want "no spam ever" or "fanciest web doohickey ever"; I want "no missed messages" and "minimum work." An occasional bit of junk isn't a problem; the problem arises when there's enough junk to make my wife overlook the handful of legitimate messages. So, just before the final required field, I included the output of
echo "<!--"
head -c$((128*1024)) /dev/urandom | uuencode - | tr "<>" " "
echo "--&gt"
a few times. Now the HTML making up the form is about 700 kB long. This makes viewing the form once no more painful than viewing a graphics-heavy page, but my fifty-a-day form spammer has decided I'm not worth a 30 MB daily download and has gone away. Success! Now: how could someone break this?
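For anyone who would rather generate the padding from a script than paste shell output into the page, here is a rough Python equivalent of the pipeline above. It is a sketch, not the poster's actual setup: the function name `comment_padding` is hypothetical, and unlike `uuencode` it emits no `begin`/`end` lines. Since every `<` and `>` in the filler becomes a space, the sequence `-->` cannot appear inside it.

```python
import binascii
import os


def comment_padding(n_bytes: int = 128 * 1024) -> str:
    """Return an HTML comment filled with uuencoded random bytes.

    Mirrors the shell pipeline: random bytes, uuencoded, with "<" and ">"
    replaced by spaces, wrapped in <!-- and -->.
    """
    raw = os.urandom(n_bytes)
    # binascii.b2a_uu encodes at most 45 bytes per call, so work in chunks;
    # each call returns one uuencoded line ending in "\n".
    lines = [
        binascii.b2a_uu(raw[i:i + 45]).decode("ascii")
        for i in range(0, len(raw), 45)
    ]
    # With no ">" left in the filler, "-->" cannot close the comment early.
    body = "".join(lines).replace("<", " ").replace(">", " ")
    return "<!--\n" + body + "-->\n"


if __name__ == "__main__":
    print(len(comment_padding()))
```

With the default 128 KiB of random input, each call returns a little over 180,000 characters, so four copies land roughly at the 700 kB figure above.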

Obviously, sending extra junk counts toward our bandwidth limit. But our limit right now is generous enough that getting close to it would take a distributed effort, and I think we're a sufficiently low-value target that this is unlikely. The same logic keeps me from worrying much about loading the server: the loop waits for the output to go out over the network before generating more. I'm not parsing any input. I can't think of any other server-side problems.

So, hive mind: what problems should I expect on the client side? Now that so many people have broadband, do I need to worry about sending a 700 kB HTML comment? Are there browsers that won't display the top of the form (where you type things) while the bottom loads? This trick does what I want very effectively; what does it also do that I don't want?

(there's a link in my profile if you want to see the real deal)
posted by fantabulous timewaster to Computers & Internet (12 answers total)
 
Thought: If your randomish string of 700k contains the closing comment string "-->" then your form will be all kinds of screwed, yeah?
posted by mrnutty at 1:59 PM on April 3, 2008


Response by poster: mrnutty: I turn "<" and ">" into spaces.
posted by fantabulous timewaster at 2:05 PM on April 3, 2008


um; I've never used cgiemail before, so pardon me if I've misunderstood exactly how it operates, but at a cursory glance it looks as though it is built to immediately process a POST request.

So, I, the spammer, notice that you've got a ton of bs in your form that I don't want to download. I take note of the required fields you have, and then I skip the whole form step and just POST directly to your ACTION URL. Effectively, the 700k thing is only a deterrent to unmodified spambots, but it's also a sorta pain in the ass for you and your clients.

You might be interested in a couple lightweight methods people have been using to defeat spambots:

1. An obvious 'captcha': a form field labeled "Type orange in the following field". Guess what? Spambots can't parse that sentence, but your users can. (Jeff Atwood of Coding Horror uses this.) You get the same protection as above without sending everyone a 700 kB comment.

2. Hide a form field using the CSS property "display:none". If someone puts something in that field (spambots often stick shit in every field they can), toss the message out; it's a bot. This one has the added benefit of not requiring any extra work from your clients.
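Both checks take only a few lines wherever the submission is actually processed. A minimal Python sketch, assuming a backend that hands over the POSTed fields as a dict; the field names `website` and `color_check` are made up for illustration:

```python
from typing import Mapping

# Hypothetical field names; use whatever the real form calls them.
HONEYPOT_FIELD = "website"      # hidden with display:none, so humans leave it blank
QUESTION_FIELD = "color_check"  # labeled 'Type orange in the following field'
EXPECTED_ANSWER = "orange"


def looks_like_bot(form: Mapping[str, str]) -> bool:
    """Return True if a submission fails either lightweight check."""
    # Honeypot: a human never sees this field, so any content means a bot filled it.
    if form.get(HONEYPOT_FIELD, "").strip():
        return True
    # Plain-language question: tolerate capitalization and stray whitespace.
    if form.get(QUESTION_FIELD, "").strip().lower() != EXPECTED_ANSWER:
        return True
    return False
```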

Now, both of these are trivial to defeat if someone bothers to look at your form, but it's often reported that even the most trivial roadblock is enough to discourage a form spammer: why not just move on to the next guy, who has a wide-open form?
posted by fishfucker at 2:17 PM on April 3, 2008


I had no idea spammers had gotten so obnoxious (or desperate). The robots probably think the form might allow actual posting on the web.

If I were in your situation, I'd use JavaScript to change the value of a hidden form element, and check that the element is set properly before handling the submission, since I doubt most spammers would want to risk running JavaScript.

Sure, you'll miss people who don't have JavaScript enabled, but that might be fewer people than those on modems.
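One way to make this concrete, sketched in Python under hypothetical assumptions: the server writes a random number into a hidden `challenge` field, and a one-line script in the page (not shown) computes `challenge * 7 + 3` and writes the result into a hidden `answer` field. The transform and both field names are invented for illustration; a bot that never runs the script leaves `answer` empty and gets rejected.

```python
import secrets


def make_challenge() -> str:
    """Random number the server writes into the hidden 'challenge' field."""
    return str(secrets.randbelow(10**6))


def expected_answer(challenge: int) -> int:
    # Hypothetical transform; the page's script must apply exactly the same one.
    return challenge * 7 + 3


def ran_javascript(challenge: str, answer: str) -> bool:
    """True only if 'answer' holds what the page's script would have written."""
    try:
        return int(answer) == expected_answer(int(challenge))
    except ValueError:
        # Empty or non-numeric values mean the script never ran.
        return False
```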
posted by delmoi at 2:23 PM on April 3, 2008


Response by poster: fishfucker: Changing the list of required fields typically buys me a few weeks, and takes maybe fifteen minutes a month. Adding a test would be trivial if I were parsing the form myself already, but I'm using a black box to put it in a readable email.
posted by fantabulous timewaster at 2:53 PM on April 3, 2008


ah, i see. you're unable/don't want to change what it posts to. (aaaaand, after downloading the source, I can see why. Not very straightforward).

hmm. well, after looking at it a little more, i suspect it is working like this:

1. user gets form.
2. user submits form.
3. cgiemail loads email template, parses it, uses it to process POST.
4. if any required fields are missing, rejects email.
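Those four steps can be modeled in a few lines of Python. This is a sketch, not cgiemail's code: the `[field]` placeholder syntax and the `REQUIRED` tuple are hypothetical stand-ins for whatever the real template specifies. It shows why a single missing required field is enough to drop a POST.

```python
import re
from typing import Mapping, Optional

# Hypothetical template and required-field list standing in for the real ones.
TEMPLATE = "Name: [name]\nPhone: [phone]\nMessage: [message]\n"
REQUIRED = ("name", "phone")


def process_post(form: Mapping[str, str]) -> Optional[str]:
    """Return the filled-in email body, or None if the POST is rejected."""
    # Step 4 (checked first here): an absent or blank required field rejects it.
    if any(not form.get(field, "").strip() for field in REQUIRED):
        return None
    # Step 3: replace each [field] placeholder with the POSTed value, or nothing.
    return re.sub(r"\[(\w+)\]", lambda m: form.get(m.group(1), ""), TEMPLATE)
```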

looks like in this case delmoi's JavaScript method is a good bet: don't put a required field (one mentioned in the template) into the HTML at all; add it with JavaScript after the page loads.

How they could break it is: load up your form, copy/paste it to their own server without the extra junk, point spambot at it. Would they bother? Doubtful.
posted by fishfucker at 3:37 PM on April 3, 2008


Was it just you (i.e., the intended target of the form) receiving the spam, or was the spammer using your form to send to other addresses? You may need to look at mail server logs to know the answer. If they were successfully sending to others, then your form is probably susceptible to header injection.

Header injection occurs when someone submits their own form directly to your action URL and manages to send in a multi-line header where you were expecting one line. Typically this can happen if your form sends a "From" address or subject. For "From" they'll submit something like [fake address][line break]To: another@address.com. Depending on the mail server and how the form works, that may well cause it to send to the other address, which could be a big comma-delimited list of addresses.
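A minimal guard against this, sketched in Python with hypothetical helper names: refuse any form value bound for a header if it contains a carriage return or line feed, since a line break is exactly what lets a second header such as `To:` sneak in.

```python
def safe_header_value(value: str) -> str:
    """Return value unchanged, refusing anything that could start a new header line."""
    if "\r" in value or "\n" in value:
        raise ValueError("header value contains a line break")
    return value


def header_line(name: str, value: str) -> str:
    """Format one header line from form input, e.g. a visitor's From address."""
    return f"{name}: {safe_header_value(value)}\r\n"
```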

I think CGI email scripts have had a bad reputation for this sort of thing. Hopefully cgiemail has been fixed, so make sure you're using the latest version.
posted by tetranz at 4:44 PM on April 3, 2008


I was thinking about this kind of problem the other day -- what if you just change the address of your form mail script regularly?

For instance, what if the script is called sendemail-<random string>.php today, and sendemail-<different random string>.php tomorrow?

A cron job could run at midnight and make the change, then update any pages which refer to it.

It wouldn't stop a bot which first parsed your page for the new URL, of course, but it would definitely stop one which just returned to the same URL.
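The naming half of that cron job might look like the following Python sketch. Instead of storing a fresh random string each night, it derives the name from a server-side secret and the date with HMAC-SHA256, so outsiders can't predict it while any script on the server can recompute today's name. `SECRET` and `script_name` are hypothetical; renaming the file and rewriting the pages is left to the cron job.

```python
import hashlib
import hmac
from datetime import date

SECRET = b"replace-with-a-long-random-secret"  # hypothetical; stays on the server


def script_name(day: date, secret: bytes = SECRET) -> str:
    """Form-mail script name for a given day, e.g. sendemail-<16 hex chars>.php."""
    digest = hmac.new(secret, day.isoformat().encode("ascii"), hashlib.sha256).hexdigest()
    return f"sendemail-{digest[:16]}.php"


if __name__ == "__main__":
    # A nightly cron job could rename the script to this and update the pages.
    print(script_name(date.today()))
```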
posted by AmbroseChapel at 4:57 PM on April 3, 2008


Response by poster: tetranz: I don't let form input change the mail headers. Anxiety about becoming a spam robot myself is part of my reason for not just whipping up my own mailer.

AmbroseChapel: That's basically what I'm doing. I was trying to think about the problem economically:
  1. Form spammers are businesspeople trying to sell something --- mostly, it seems, ad impressions at garbage URLs.
  2. The cost of sending me a message is somehow proportional to its size: bandwidth spent trying to get a message to me can't be used to send a message to anyone else.
  3. If my form changes (its action changes, or its required fields change, or whatever), the spammer has to get the form before posting their message. This approximately doubles the cost of sending me a message, and it kept the volume bearable for half a year or so.
  4. If the form is big, the cost goes way up. Spammers who might automatically load and process a short form will, at some point, give up on a long one.
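The argument in that list can be put in numbers. A rough Python cost model, counted in bytes the spammer must move per message; the POST and plain-form sizes are assumptions (set equal, which is what "approximately doubles" implies), and only the 700 kB padded form comes from this thread.

```python
# Assumed sizes in bytes; only the padded form size comes from the thread.
POST_BYTES = 3_000
PLAIN_FORM_BYTES = 3_000
PADDED_FORM_BYTES = 700_000
MESSAGES_PER_DAY = 50


def bytes_per_message(form_bytes: int, must_fetch_form: bool) -> int:
    """Bytes moved per spam message: the POST, plus the form if it must be read first."""
    return POST_BYTES + (form_bytes if must_fetch_form else 0)


blind = bytes_per_message(PLAIN_FORM_BYTES, must_fetch_form=False)   # post blindly
rotated = bytes_per_message(PLAIN_FORM_BYTES, must_fetch_form=True)  # point 3: fetch, then post
padded = bytes_per_message(PADDED_FORM_BYTES, must_fetch_form=True)  # point 4: fetch a big form

if __name__ == "__main__":
    print(f"rotated/blind: {rotated / blind:.1f}x")
    print(f"padded/blind: {padded / blind:.0f}x")
    print(f"padded, bytes per day: {padded * MESSAGES_PER_DAY}")
```

Under these assumptions, point 3 doubles the cost and point 4 raises it more than two-hundredfold, about 35 MB a day at fifty messages.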
What moved me to post was the astonishing effectiveness of #4: we just got our first automated message in three or four weeks. I figure that if it works that well, and I've never heard of anyone else doing it, there must be some catch I haven't thought of.
posted by fantabulous timewaster at 10:58 PM on April 3, 2008


Best answer: well, the catch is
1. it doesn't scale; no developer would recommend this tactic, because you're essentially throwing away bandwidth. In large aggregates, bandwidth costs very real money. It's not completely out of the question that someone would request this page repeatedly just to be a dick.

2. There's a high user cost. The user has to wait for their browser to download and process the comments. While captchas and the like are annoying, they are a largely accepted part of forms these days; waiting for a form to render? not so much.

3. It doesn't stop anyone from posting to your form processor; it just makes it difficult for them to figure out what the required fields are. The easiest way to pull off this security-by-obscurity trick is the half-assed captcha or blank form field that i mentioned above. These are lightweight and work on the same principle as your tactic: they make it difficult for a spambot to get the correct fields to submit for your form. Most 'modern' forms use some sort of session token to 'prove' that a user rendered the form, so this entire issue is moot for them anyway.
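A session token can be as small as a signed timestamp. The Python sketch below is a stateless variant using HMAC-SHA256: the server signs the render time into a hidden field and later accepts only fresh tokens it signed itself. `SECRET`, `MAX_AGE`, and the function names are hypothetical. Because nothing is stored, a bot that fetches the form once can reuse its token until it expires; a per-session token kept on the server would close that gap.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"replace-with-a-long-random-secret"  # hypothetical; never sent to browsers
MAX_AGE = 60 * 60  # assumption: a rendered form stays usable for one hour


def _sign(ts: str) -> str:
    return hmac.new(SECRET, ts.encode("ascii"), hashlib.sha256).hexdigest()


def issue_token(now: Optional[float] = None) -> str:
    """Token placed in a hidden field each time the form is rendered."""
    ts = str(int(time.time() if now is None else now))
    return f"{ts}.{_sign(ts)}"


def token_is_valid(token: str, now: Optional[float] = None) -> bool:
    """Accept only tokens this server signed, and only while they are fresh."""
    ts, _, sig = token.partition(".")
    if not (ts.isascii() and ts.isdigit()):
        return False
    # Constant-time comparison on bytes, so odd input can't raise a TypeError.
    if not hmac.compare_digest(sig.encode(), _sign(ts).encode()):
        return False
    age = (time.time() if now is None else now) - int(ts)
    return 0 <= age <= MAX_AGE
```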

Now, that's not to say that this solution (although incredibly clever) doesn't work for you, where you're dealing with a website that, I'm guessing, has a predictably small number of visitors. It's just that it's possible to use a similar security-by-obscurity tactic without passing the cost on to the user.
posted by fishfucker at 9:15 AM on April 4, 2008


and btw, i hope you do not find that a harsh judgement, because it's not my intent. I love seeing outside-the-box solutions like this. thanks for sharing with us.
posted by fishfucker at 9:23 AM on April 4, 2008


Response by poster: Thanks for the advice.

You're quite right that this wouldn't work for a situation where additional outgoing bandwidth is the limiting expense.

I'm not sure I agree with you about user cost. To me, "why is this slow? oh, it's encouraging me to type while it loads" is less irritating than "hmmm, I have to solve this puzzle." I haven't looked over many shoulders while people have used this, though.

I've been wondering whether this trick would work over SMTP: if the sending host is unknown, make the response to HELO long. I don't have a busy mail server where I can experiment, though.
posted by fantabulous timewaster at 6:22 PM on April 12, 2008

