I seem to have stopped form spammers from cluttering a business inbox. Have I set myself up for some more mysterious problem down the road?
The website for my wife's dog training business has some "contact us" and "schedule an appointment" forms, which send email through the venerable cgiemail. Shortly after we added these sometime last fall, spammers began sending messages by POSTing directly to the emailer. I was able to stop these robots by changing the name of a required field on the form. After half a dozen name changes over six months or so, we finally got found by a robot smart enough to check the form first, and we began getting fifty or sixty junk messages a day.
The standard approach to this problem seems to be the addition of a "captcha." I decided against a captcha for three reasons:
- I'd have to change my backend to something that can validate a captcha. This probably isn't much of a hurdle.
- I'd have to trust the new backend. This takes a little thought, depending on how much of it I would have to write.
- Some potential customers might, like me, dislike squinting at captchas. We don't want to lose their business.
I think the third of these is a different goal than most people have. I don't want "no spam ever" or "fanciest web doohickey ever"; I want "no missed messages" and "minimum work." An occasional bit of junk doesn't pose a problem; the problem arises when there's enough junk to make my wife overlook the handful of legitimate messages. So before the final required field, I included the output of
echo "<!--"
head -c$((128*1024)) /dev/urandom | uuencode - | tr "<>" " "
echo "-->"
a few times. Now the HTML making up the form is about 700 KB long. This makes viewing the form once no more painful than viewing a graphics-heavy page, but my fifty-a-day form spammer has decided I'm not worth a 30 MB download and gone away. Success! Now: how could someone break this?
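For anyone who wants to check the size figures, here is a rough sketch of the arithmetic. It assumes classic uuencode framing (45 input bytes become one 62-byte output line, plus a "begin 644 -" header and a short trailer) and takes "a few times" to mean four copies; both are assumptions for illustration, and the function name uu_size is made up.

```shell
#!/bin/sh
# Estimate the size of one uuencoded comment block and of the padded page.
# Assumption: classic uuencode framing (45 input bytes per 62-byte output line).

uu_size() {
    n=$1                                  # raw input bytes
    full=$((n / 45))                      # complete 45-byte input lines
    rest=$((n % 45))                      # bytes left over for a final short line
    size=$((full * 62))                   # length char + 60 encoded chars + newline
    if [ "$rest" -gt 0 ]; then
        # length char + 4 chars per 3 input bytes (rounded up) + newline
        size=$((size + 1 + (rest + 2) / 3 * 4 + 1))
    fi
    # "begin 644 -" line (12 bytes) plus the zero-length line and "end" (6 bytes)
    echo $((size + 12 + 6))
}

block=$(( $(uu_size $((128 * 1024))) + 9 ))  # 9 bytes for the "<!--" and "-->" lines
copies=4                                     # assumption: "a few times" taken as four
page=$((block * copies))

echo "one comment block: $block bytes"
echo "padding per page:  $page bytes"
echo "fifty page fetches: $((page * 50)) bytes"
```

Under those assumptions each block comes to roughly 180 KB, four of them to a bit over 700 KB, and fifty fetches a day to something in the mid-30s of MB, which is the same ballpark as the numbers above.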
Obviously, sending extra junk counts towards our bandwidth limit. But our limit right now is generous enough that getting close to the limit would take a distributed effort; I think we're a sufficiently low-value target that this is unlikely. The same logic makes me not too worried about loading the server --- the loop waits for the output to go out the network before making more. I'm not parsing any input. I can't think of any other server-side problems.
So, hive mind: what problems should I expect on the client side? Now that so many people have broadband, do I need to worry about sending a 700k HTML comment? Are there browsers that won't display the top of the form (where you type things) while the bottom loads? This trick does what I want very effectively --- what does it also do that I don't want?
(there's a link in my profile if you want to see the real deal)
posted by mrnutty at 1:59 PM on April 3, 2008