Challenge based spam filtering
December 8, 2005 9:27 AM

Can anyone recommend a challenge-based spam filtering system with good usability? mailblocks.com recently shut down.

I've got a brand new domain name and I'm interested in being more aggressive about spam filtering. procmail and its Bayesian filter aren't good enough. I was all set to go with mailblocks: the first time you mail someone at mailblocks, you're challenged to prove you're a human, and after that your messages go through. A sort of automated whitelist. A friend loved it, but alas, they've shut the service down.
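The mailblocks-style behaviour described above (challenge on first contact, then an automatic whitelist) can be sketched roughly as follows. This is a hypothetical illustration, not the code of mailblocks or any other product; the class name, the in-memory storage, and the token format are assumptions, and actually emailing the challenge is left out.

```python
from __future__ import annotations

import secrets


class ChallengeResponseFilter:
    """Hypothetical challenge-response whitelist: hold mail from unknown senders until they confirm."""

    def __init__(self) -> None:
        self.whitelist: set[str] = set()
        # token -> (sender, held message body)
        self.pending: dict[str, tuple[str, str]] = {}

    def receive(self, sender: str, body: str) -> str:
        sender = sender.lower()
        if sender in self.whitelist:
            return "deliver"
        token = secrets.token_hex(8)
        self.pending[token] = (sender, body)
        # A real system would email a challenge containing this token to the sender here.
        return f"challenge:{token}"

    def confirm(self, token: str) -> str | None:
        """Called when the sender answers the challenge; whitelists them and releases the held message."""
        entry = self.pending.pop(token, None)
        if entry is None:
            return None
        sender, body = entry
        self.whitelist.add(sender)
        return body
```

After a sender confirms once, later calls to `receive` for that address return `"deliver"` without a new challenge, which is the "one-time response" property people describe below.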

I run my own Debian server with Postfix and procmail. I'm happy to host the software myself. What I care about is automation and ease of use, both for me and the poor bastards who try to email me. A quick search turned up ASK and TMDA. Any experiences with those? Any alternatives? Something that can install easily on Debian is preferred.

posted by Nelson to Computers & Internet (16 answers total)

I used ASK a few years ago and loved it. People who had to respond to it loved it (they only had to respond once). Zealots I talked to on the internet (why oh why do I bother) would tell me I'm a rude/obnoxious/insane person. I figure it made a good idiot filter. :^D
posted by shepd at 9:39 AM on December 8, 2005

among the "zealots" who don't appreciate challenge-response systems are those whose addresses appear in places that make it likely that the many Outlook-borne viruses will forge mail from them.

for example, license@php.net gets inundated with crap because it appears in the phpinfo() output that often ends up in users' web caches, which some of those lovely viruses plunder for addresses to spread themselves to.

i have email addresses that have been active for at least a decade (and thus deeply embedded in the lists that spammers trade), and i find that using a few DNS blackhole lists knocks out the vast majority of the spam i receive. spamassassin picks up a few more, and some other qpsmtpd plugins catch a few more. i get little enough spam that i don't see the need to contribute to other people's problems by using a challenge-response system.
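For readers unfamiliar with DNS blackhole lists: the mail server reverses the octets of the connecting client's IPv4 address, appends the list's DNS zone, and looks the name up; an answer means "listed", NXDOMAIN means "not listed". A minimal Python sketch of that lookup, assuming the standard resolver is good enough; `dnsbl.example.org` is a placeholder, not a real list, and the function names are hypothetical.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build a DNSBL query name: the IPv4 octets reversed, followed by the list's zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Treat any A record as 'listed'; NXDOMAIN or any other lookup error counts as 'not listed'."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except OSError:  # socket.gaierror (e.g. NXDOMAIN) is a subclass of OSError
        return False
    return True


def listed_on(ip: str, zones: list) -> list:
    """Return the subset of the given zones that list this IP."""
    return [zone for zone in zones if is_listed(ip, zone)]
```

For example, checking 192.0.2.99 queries `99.2.0.192.dnsbl.example.org`; substitute the zones of whichever lists you actually use. In practice Postfix does this itself via its `reject_rbl_client` restriction, so the sketch only shows what happens on the wire.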
posted by jimw at 10:06 AM on December 8, 2005

I don't like challenge authentication because it has to be managed for mailing lists, one-time emails from online stores, stuff like that. I run Debian with Postfix too, and I solved my problem by using SpamAssassin and greylisting. Set up Postgrey alone and you should see a huge reduction in spam; I did. I also found that SpamAssassin alone wasn't enough; to make it really effective, you need to use some of the custom rule sets that people make available. SARE has lots of these, as well as a nice script you can run weekly from cron to keep the rule sets up to date. I use these rule sets:

70_sare_bayes_poison_nxm
70_sare_genlsubj
70_sare_header
70_sare_html0
70_sare_html1
70_sare_random
70_sare_specific
72_sare_redirect_post3.0.0
99_FVGT_Tripwire
antidrug
backhair
chickenpox
evilnumbers
tripwire
weeds

With this, my spam system is totally automated and I can ignore it and let it do its thing. People emailing me don't even know what's happening.
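Greylisting, which Postgrey implements for Postfix, temporarily rejects the first delivery attempt for an unknown (client IP, sender, recipient) triplet and accepts retries after a delay; real mail servers retry, while much spam-sending software does not. Below is a minimal Python sketch of that idea, not Postgrey's code: the class name, the 300-second delay, and the in-memory dictionary are assumptions. The returned strings follow Postfix access(5) action syntax (`DUNNO`, `DEFER_IF_PERMIT`). Real implementations also expire old entries and may group client addresses by subnet.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Greylist:
    delay: float = 300.0  # seconds a new triplet must wait before a retry is accepted (assumed value)
    first_seen: dict = field(default_factory=dict)  # (ip, sender, recipient) -> first attempt time

    def check(self, client_ip: str, sender: str, recipient: str,
              now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        triplet = (client_ip, sender.lower(), recipient.lower())
        seen = self.first_seen.setdefault(triplet, now)  # remember the first attempt only
        if now - seen >= self.delay:
            return "DUNNO"  # waited long enough: let the remaining checks decide
        return "DEFER_IF_PERMIT Greylisted, please try again later"
```

A legitimate server that retries after five minutes gets through; a spam tool that fires once and moves on never does.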
posted by autojack at 10:30 AM on December 8, 2005

I use SpamArrest, which works well - I've had my account for several years.
posted by mark7570 at 10:39 AM on December 8, 2005

Please for the love of god don't set up another challenge-response system. These need to die, because they are fundamentally broken. Those of us who have had our email addresses spoofed by a spammer or the email virus du jour, and then had our inboxes flooded by your oh-so-helpful "confirm this spam" bullshit, will thank you.

Please just think about this: virtually every piece of spam, and nearly all malware that spreads by email, spoofs the From address. If you are sending out one of these moronic challenge emails in response to every piece of spam, you are just multiplying the spam problem and making it the problem of some innocent, unrelated third party. THE FROM ADDRESS IS NOT WHO SENT THE SPAM, DO NOT SEND ANY CRAP TO HIM.

SpamAssassin with some additional SARE rules will cut down your spam very effectively, and will be much more responsible and considerate than this "spam everybody else in response" crap.

And finally, if I legitimately tried to contact someone and was greeted with one of these moronic autoreplies then I would delete it and consider that person uncontactable, because I'm certainly not going to jump through YOUR hoops to filter your spam. I have my own spam to filter, thanks.

I am not alone in this belief.
posted by Rhomboid at 10:58 AM on December 8, 2005

I'm sorry, that came off sounding very bitter. I know my reply was not very helpful or the kind of thing you wanted to hear. But I think a lot of people don't realize the damage that these CR systems cause.
posted by Rhomboid at 11:05 AM on December 8, 2005

autojack: Please take a look at recent spamassassin-users mailing list postings. I believe that the author of "antidrug" has requested that people stop using it, and the same with 'backhair' and 'weeds' as they're quite outdated now.

I'd replace all of them with RulesDuJour and the URIBL black/greylists. Some of the rule sets you listed have now been incorporated into the DNS-based blacklists.
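URIBL-style lists work like the IP blacklists, but they are keyed on the domains that appear in links inside the message body rather than on the connecting server's IP. A rough Python sketch of building those lookups follows; the regular expression, the naive "keep the last two labels" domain guess (which gets domains like example.co.uk wrong), and the placeholder zone are all assumptions. Each resulting name would then be resolved just like an IP blacklist query.

```python
from __future__ import annotations

import re
from urllib.parse import urlparse

# Crude URL matcher: scheme plus everything up to whitespace, quotes, or angle brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def uribl_query_names(body: str, zone: str) -> list[str]:
    """Extract link domains from a message body and build one URIBL query name per domain."""
    names: list[str] = []
    for url in URL_RE.findall(body):
        host = urlparse(url).hostname  # lower-cased, with any port stripped
        if host is None:
            continue
        # Naive registered-domain guess: keep only the last two labels.
        domain = ".".join(host.split(".")[-2:])
        query = f"{domain}.{zone}"
        if query not in names:  # look up each domain only once
            names.append(query)
    return names
```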

I also agree with other posters - the challenge-response systems really are annoying. I've had my domain and the same email address since 1999 and I've cut down the flow of spam quite a bit by just using SpamAssassin and the add-on rulesets.
posted by drstein at 11:35 AM on December 8, 2005

I thought about including "and is challenge/response a good idea?" in my question, but left it out. So answers saying "for the love of god don't do a challenge system, here's why" are helpful, thanks!

What I really need is something better than SpamAssassin + razor + pyzor + SpamAssassin bayes. It's a brand new domain; I'm hoping it never gets polluted.
posted by Nelson at 12:01 PM on December 8, 2005

Well, I was going to bite my tongue, but since you said it's helpful:

For the love of god don't do a challenge/response system. They annoy me to no end, and when I get a "challenge" message from someone I delete it and consider them unreachable. I also get lots of spam/virus challenge messages, which are just as bad as real spam.

Also, it's just not a reliable system - half of the challenge messages you send out are going to get lost in someone else's spam filter, so people will just assume you're not answering your mail. And the misdirected challenge messages might get your IP reported as a spammer, too.

Finally, it bogs down your mail server - when you get actual spam, a few of the challenge messages will go to legitimate addresses of people like me who didn't send it, but the vast majority will be nonexistent addresses, and your server will spend its days trying to send thousands of challenge messages to addresses that bounce.

Okay, I'm done. Now some practical tips for your new domain:

The only thing I can think of better than spamassassin + razor + pyzor + bayes is spamassassin + razor + pyzor + bayes + dcc + RulesDuJour, which takes care of 99.9% of my spam including some 15-year-old addresses that are on every possible list.

For a brand new domain, you can be pretty safe if you do a few things:

1. Don't publish email addresses (specifically, don't include them in web pages).

2. When you sign up for anything online, use a special address - either a junk@mydomain address that you change regularly, or a custom address for each one. I have my domain set up so that myname-anything@mydomain goes to me, and I would sign up here with myname-metafilter@mydomain for example. That way, if anyone does start spamming, you can block a specific address.

3. Avoid using any easily guessed addresses at your new domain. john@mydomain or smith@mydomain are going to get spammed; myprettypony@mydomain won't.

4. Configure your mail server to reject mail to nonexistent addresses at the SMTP level. You will get spam sent to random addresses at the new domain; this eliminates it without bogging your server down sending bounce messages to nowhere.
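Tips 2 and 4 both come down to a decision made when the sending server issues RCPT TO. A Python sketch, assuming a "-" delimiter as in the myname-anything scheme (Postfix exposes this as its `recipient_delimiter` setting); the user list, blocked tags, domain, and reply texts are hypothetical placeholders, not what Postfix itself returns.

```python
VALID_USERS = {"myname"}       # hypothetical real mailboxes at the domain
BLOCKED_TAGS = {"oldshop"}     # hypothetical tagged addresses that started receiving spam
DELIMITER = "-"                # separates user from tag, as in myname-metafilter@...
DOMAIN = "mydomain.example"    # placeholder domain


def smtp_rcpt_decision(rcpt: str) -> str:
    """Decide at SMTP time whether to accept a recipient, so nothing is bounced later."""
    local, _, rcpt_domain = rcpt.rpartition("@")
    if rcpt_domain.lower() != DOMAIN:
        return "554 relaying denied"
    # Split on the first delimiter only: "myname-metafilter" -> ("myname", "metafilter").
    user, _, tag = local.lower().partition(DELIMITER)
    if user not in VALID_USERS:
        return "550 no such user"  # guessed addresses like john@ are refused outright
    if tag in BLOCKED_TAGS:
        return "550 address disabled"  # one leaked tag can be shut off individually
    return "250 ok"
```

Because the rejection happens during the SMTP conversation, the sending server, not yours, is responsible for any bounce.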

Good luck!
posted by mmoncur at 12:41 PM on December 8, 2005

Oh yes, definitely enable pyzor, razor, and dcc with SpamAssassin. They are not enabled by default because they require that you install and configure the respective clients. But they greatly improve overall accuracy.

I use this combination (SpamAssassin with bayes, pyzor, razor, dcc, and bayes autolearning) and it works like a dream.

The only downside to using these external callouts is it can cause the scan time to march higher and higher. I don't know why SA doesn't launch the razor/pyzor/dcc checks in parallel at the beginning of the scan and then collect the results after it finishes its own processing. Maybe that's in the works for the future...
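The "launch early, collect late" idea can be sketched with Python's `concurrent.futures`. This is purely illustrative: `fake_network_check` and `local_rules` are hypothetical stand-ins for the Razor/Pyzor/DCC clients and for SpamAssassin's own rule processing, and the 0.2-second delays are made-up latencies.

```python
from __future__ import annotations

import time
from concurrent.futures import ThreadPoolExecutor


def fake_network_check(name: str, delay: float) -> tuple[str, bool]:
    """Stand-in for a razor/pyzor/dcc lookup; sleeps to simulate network latency."""
    time.sleep(delay)
    return name, False  # (check name, did it hit?)


def local_rules(message: str) -> float:
    """Stand-in for the filter's own (local, CPU-bound) rule processing."""
    return 2.5 if "viagra" in message.lower() else 0.0


def scan(message: str) -> tuple[float, dict[str, bool]]:
    checks = {"razor": 0.2, "pyzor": 0.2, "dcc": 0.2}
    with ThreadPoolExecutor(max_workers=len(checks)) as pool:
        # 1. Fire off all external lookups at the start of the scan...
        futures = [pool.submit(fake_network_check, name, d) for name, d in checks.items()]
        # 2. ...do the local work while they are in flight...
        score = local_rules(message)
        # 3. ...and only then wait for the network results.
        hits = dict(f.result() for f in futures)
    return score, hits
```

With three 0.2-second lookups, the scan takes roughly 0.2 seconds instead of 0.6, because the waits overlap each other and the local processing.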
posted by Rhomboid at 12:59 PM on December 8, 2005

Rhomboid, that is why a good system does the following (I haven't run ASK in a while, so I can't say whether it does all this; when I used it, these tactics weren't as popular):

a) The return address should be on the same network as the sending machine (there are exceptions, but in those cases the address should be added manually). This would require spoofed addresses to belong to the same company the spam was sent from. If a customer gets angry because their ISP likes to host spammers, I shed no tears.

b) It checks the sending domain's SPF records to see whether (a) is at all reasonably true.
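Check (b) relies on SPF, where a domain publishes in a DNS TXT record which hosts may send mail for it. The sketch below evaluates only the `ip4:` and `all` mechanisms of an already-fetched record. It is a deliberately partial illustration, not an SPF implementation: real SPF (RFC 4408 at the time, RFC 7208 today) also has `a`, `mx`, `include:`, `redirect=`, DNS lookup limits, and more, which this sketch simply skips. The function name is hypothetical.

```python
import ipaddress

RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_ip4_result(spf_record: str, client_ip: str) -> str:
    """Evaluate only the ip4: and all mechanisms of an SPF record (heavily simplified)."""
    ip = ipaddress.IPv4Address(client_ip)
    terms = spf_record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record at all
    for term in terms[1:]:
        qualifier = "+"  # default qualifier when none is written
        if term[0] in "+-~?":
            qualifier, term = term[0], term[1:]
        mech = term.lower()
        if mech.startswith("ip4:"):
            matched = ip in ipaddress.IPv4Network(term[4:], strict=False)
        elif mech == "all":
            matched = True
        else:
            continue  # a, mx, include:, redirect= etc. are NOT handled in this sketch
        if matched:
            return RESULTS[qualifier]  # first matching mechanism decides
    return "neutral"  # nothing matched
```

For a record like `v=spf1 ip4:192.0.2.0/24 -all`, mail from 192.0.2.17 passes and mail from anywhere else fails, which is the "sender plausibly on the same network" test that (a) asks for.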

The usual response I get to (a) is about people who won't tell me their address before they send me mail, and who absolutely *have* to email me using an SMTP client in an internet cafe (as if there were no way they could use their ISP's webmail), so the return and sender addresses won't match up at all. In that case my filter is working great: I'm not interested in talking with people too lazy to tell me THEIR email address if I've already told them mine.

And the usual response to (b) is either that they administer a server and aren't competent to set up SPF, which, again, means I don't care, or that they are with an ISP that doesn't care enough to bother with it. In that case, try using a less broken ISP!

I simply haven't run any challenge response system in a long time because, at the moment, I'm only getting about 10 spams a day, so I don't care too much. If it goes over that, then I'll run it again.

The best combo, which is something I'd like to implement (someday, ha!), is much simpler: the email either has to come from a known party (as in C/R systems), or it needs to contain a specific key text (which, when found, adds the address to the whitelist automatically).

If you want to publish your email address on the web, that's great; just remind people mailing you that they need to include the word "kwyjibo" in their email the first time.

Now, if a mail comes in from outside the whitelist and without the keyword, run it through SpamAssassin or some other spam checker, and, if it passes, send a challenge/response questionnaire to the sender.

Otherwise, dump the mail.
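The whitelist-plus-keyword scheme just described is essentially a four-way decision. A hypothetical Python sketch: `is_spam` is a stand-in callback for SpamAssassin or another checker, the whitelist is an in-memory set, and actually sending the challenge is left out.

```python
from __future__ import annotations

from typing import Callable

KEYWORD = "kwyjibo"  # the published key text from the example above


def route_incoming(sender: str, body: str, whitelist: set[str],
                   is_spam: Callable[[str], bool]) -> str:
    sender = sender.lower()
    if sender in whitelist:
        return "deliver"  # known party
    if KEYWORD in body.lower():
        whitelist.add(sender)  # key text found: whitelist the sender automatically
        return "deliver"
    if is_spam(body):
        return "discard"  # failed the spam check: dump it, no challenge sent
    return "send_challenge"  # unknown but clean-looking: challenge the sender
```

Note that challenges only go out for mail that has already passed the spam check, which is the point of ordering the steps this way.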

That method would stop the anti-C/R people from complaining about mailboxes full of C/R mail triggered by forged-address spam (which I doubt happens often: I run a small mail server [yes, too lazy for SPF, one day I will bother] and I've never even seen a valid C/R mail, never mind an invalid one). To trigger a challenge, the forged spam would have to originate from their own network (as required by (a)) and arrive from an SPF-validated machine (as checked in (b)). And if the spam software they always recommend is so great, they won't get the C/R mail even in the rare case they would have without the spam check. (IMHO, I've seen about 10% inaccuracy rates with it; THERE'5 T00 MANY IDI0T CIO'S THAT T A L K LIEK TIHS 4 TEH INTARNET GEEK SPEEK LOOKS.)

But, of course, the problem is that C/R systems are given such an automatic bad rap that even mentioning them for use in cases like this is punishable by internet death penalty. :-D

The fun thing is, I can't think of any incoming message to me that hasn't been asking me a question or making some point that would call for a reply from me. If you want to ask me questions, you'll need to do it on my terms (i.e., respond to the C/R email), or you are free not to bother (why would I care if I have more spare time?!). If you want a reply from me, then you can pay me the courtesy of replying to a C/R mail. If not, then I doubt I have the time to reply to your mail either. It's a bit like how I tell rude guests who won't leave their shoes at the door that they must leave.
posted by shepd at 1:30 PM on December 8, 2005

Challenge-Response Anti-Spam Systems Considered Harmful
I recommend SpamGourmet as a solution to spam
posted by Sharcho at 3:23 PM on December 8, 2005

If you're going to first score the email w/SpamAssassin then why bother sending out a C/R to ones that pass? On my system SpamAssassin already does a satisfactory job of weeding out all spam, to the point where what's left is virtually all ham. If I then sent C/R emails to those I would just be inconveniencing people that I know are not spammers.

That's kind of my whole objection to C/R in a nutshell: if you send challenges for everything, you're obviously going to be spewing a lot of worthless challenges to spoofed addresses. If you try to weed out the spam first, then you're just sending useless busywork to addresses that you already suspect belong to humans, which completely defeats the whole purpose.

Plus none of the above checks will do anything about those tards that sign themselves up to a mailing list and neglect to whitelist it. Nothing says inconsiderate bastard like posting to a mailing list and then receiving a handful of these stupid "you tried to email me but you're not on my list" replies. That is just broken on so many levels. And of course since they're sent to the header-from and not the envelope-from they go directly to the poster and not the mailing list admin, who will remain unaware of said tard. Plus a lot of these C/R systems will only tell you the name -- and not the email address -- of who it is you're supposed to be verifying so that the admin can't even remove the tard from the mailing list. That is doubly broken. C/R systems are the bane of any mailing list admin. Yeah yeah, sure, YOU don't do this, but I've had this happen to me countless times.
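The header-from versus envelope-from point can be made concrete. Below is a hypothetical Python sketch of the minimum a challenge sender would need to check before replying at all: skip anything that looks like mailing-list traffic (a `List-Id` header, as defined in RFC 2919, or `Precedence: bulk/list/junk`), never challenge a null envelope sender, and address the challenge to the envelope sender recorded in `Return-Path` rather than to the `From:` header. The function name and the exact header heuristics are assumptions, and none of this helps against forged envelope senders, which is the broader objection raised in this thread.

```python
from __future__ import annotations

from email import message_from_string
from email.utils import parseaddr


def challenge_target(raw_message: str) -> str | None:
    """Return the address a challenge should go to, or None if no challenge should be sent."""
    msg = message_from_string(raw_message)
    # Mailing-list traffic: never challenge it (the poster is not the one who mailed you).
    precedence = msg.get("Precedence", "").strip().lower()
    if msg.get("List-Id") or precedence in {"bulk", "list", "junk"}:
        return None
    # Return-Path records the envelope sender at final delivery; "<>" means a bounce.
    envelope = parseaddr(msg.get("Return-Path", ""))[1]
    if not envelope:
        return None  # null or missing envelope sender: never reply to bounces
    return envelope  # deliberately NOT the From: header
```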
posted by Rhomboid at 3:34 PM on December 8, 2005

Nelson: The mail on my server is sent through two spam filters. If they agree that a message is spam, I don't see it (this is so good I almost never check for it, and when I do, I sort for keywords to narrow down the potential false positives). If they disagree, it gets put in a separate folder that I skim regularly. If neither of them flags it as spam, it lands in my mailbox.

The two systems here are DSpam (trainable) and SpamAssassin (score-based threshold, set to 3 or something low like that).
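This two-filter arrangement amounts to a small routing table. A Python sketch, assuming the DSpam verdict arrives as a boolean and the SpamAssassin score as a number compared against the threshold of 3 mentioned above, with a score at or above the threshold counted as spam; the function and folder names are hypothetical.

```python
def route(dspam_says_spam: bool, sa_score: float, sa_threshold: float = 3.0) -> str:
    """Pick a destination folder from the verdicts of two independent spam filters."""
    sa_says_spam = sa_score >= sa_threshold
    if dspam_says_spam and sa_says_spam:
        return "spam"    # both agree: filed away, rarely checked
    if dspam_says_spam or sa_says_spam:
        return "review"  # the filters disagree: a folder skimmed regularly
    return "inbox"       # neither filter objects
```

The design choice is that a single filter's false positive only demotes a message to the review folder, never to the rarely checked spam folder.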

I, too, hate challenge-response, since you said that was helpful too.
posted by whatzit at 6:13 PM on December 8, 2005

I like DSpam as well. I use only it, and it works great: I see about one spam a month, and maybe one false positive every two months.

Read this awesome article on super-hard-core spam filtering by a guy who gets millions of spams per day. His solution seems to hinge mainly on shutting off non-SMTP-conformant delivery attempts, which is quite smart, given that he also wants to keep spammers from using too many of his CPU cycles. His last line of defense appears to be Bogofilter, QSF, and BMF.

Challenge-response I think would be more hassle than it's worth even for you. One gets a lot of automated mail from one's banks, online shopping, social sites. Hell, even Metafilter signups require getting an automated email. And if you join a mailing list, everyone will be pissed at you. If you join a mailing list where it forges the 'from' header to be the list's address, may God have mercy on your soul.
posted by breath at 11:59 PM on December 8, 2005

Well, Rhomboid, I'd send out the C/R email because, in my personal experience when I *had* a spammy account, SpamAssassin wasn't accurate. Just as about 10% of real emails were marked as spam, I'd estimate that about 10% of spam went through marked as real email.

It just wasn't good enough to make me happy about my mail.

Now, since I've never been one to care TOO MUCH about idiots who send real email so poorly that it gets marked as spam, having those trashed is OK. But if a clueful person manages to get my email address from a friend, but not the key (highly, highly likely), I'd like to give their email a chance. Because of incomplete information, I can't be sure, hence the rare (but, IMHO, necessary) hoop.

Yes, I could use a trainable system instead of spamassassin. But I refuse to let spam take up any more time than it takes me to install software (or make new software for fun). I simply refuse, in the same way anti-C/R people refuse, to sit there training a shit-filter.
posted by shepd at 6:53 AM on December 9, 2005

Ask MetaFilter
