Spam filtering
April 18, 2006 8:36 AM
Lately I've been getting more and more spams where the content is delivered as an image. There are no keywords to filter on; even the image filename and subject are random words. Are such messages unfilterable?
Currently I am using PocoMail, but its filters are not very complex and it's not giving me the ability to trash messages with zero text (which are always spam).
So basically I'm looking for my options for filtering this kind of crap (and, if possible, the spams with jumbled-up spellings of Ambien / Viagra, etc). These have all been problem areas.
Should I switch to Thunderbird or some other program? Or is server-side (I'm on Dreamhost) or blacklist-based filtering worthwhile?
So tell me all about how you've managed to automate trashing these slippery forms of spam. I am open to any other robust mail program with good anti-spam capability. NO OUTLOOK SOLUTIONS PLEASE; I won't use Outlook. I've filtered spam mostly by hand and PocoMail filters, and have removed most of my addresses from harvesting, which has worked great, but the spam is starting to get bad again.
SpamAssassin combines multiple layers of rules to make judgements about spam vs not-spam. A few of the rules deal with the ratio of images to HTML. Looking through my sorted spam I see a number of image-only or mostly-image spams that it's caught.
The key thing is that it uses other criteria (forged headers, reverse DNS checks, etc) before it reaches a final judgement, otherwise it would probably bin photos of my nephews as spam.
I don't generally trust blacklists on their own. I've tried using them before and found that most of my incoming mail got tagged, but SpamAssassin only uses blacklist matches in combination with lots of other criteria, and I haven't seen a false positive as a result.
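A minimal sketch of that "many weak signals, one threshold" idea. The rule names, weights, and cutoff below are made up for illustration and are not SpamAssassin's actual rules or scores:

```python
# Hypothetical rule names and weights, chosen for illustration only.
RULE_WEIGHTS = {
    "image_only_body": 2.0,
    "forged_headers": 2.5,
    "no_reverse_dns": 1.5,
    "on_blacklist": 1.0,
}
THRESHOLD = 5.0  # assumed cutoff: at or above this, the message is tagged as spam


def spam_score(hits):
    """Sum the weights of every rule that matched the message."""
    return sum(RULE_WEIGHTS[name] for name in hits)


def is_spam(hits):
    """No single rule reaches the threshold; several must agree."""
    return spam_score(hits) >= THRESHOLD


# A photo from a relative: image-only, but nothing else looks wrong.
family_photo = {"image_only_body"}
# Clean mail from a sender that happens to be on a blacklist.
listed_sender = {"on_blacklist"}
# Image spam: several independent signals line up.
image_spam = {"image_only_body", "forged_headers", "no_reverse_dns"}

for name, hits in [("family_photo", family_photo),
                   ("listed_sender", listed_sender),
                   ("image_spam", image_spam)]:
    print(name, spam_score(hits), is_spam(hits))
```

Because a blacklist hit or an image-only body contributes only part of the total, neither one can bin a message by itself.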
So, you might give SpamAssassin a try. The only issue is that it's written in Perl and looks like it can use a fair amount of resources.
posted by Good Brain at 9:00 AM on April 18, 2006
If SpamAssassin's resource utilization creates problems with Dreamhost's resource limits, it looks like you can run it on your desktop against an IMAP mailbox.
http://wiki.apache.org/spamassassin/RemoteImapFolder
posted by Good Brain at 9:04 AM on April 18, 2006
Hrmm... I quite liked POPFile when I was using POP-based email. The only thing is, it requires you to run it locally on your machine. If you don't have a problem with that, it seems to do quite well (I was at something like 99% effectiveness before I switched over to entirely-Gmail-based, and Gmail isn't anywhere NEAR as good at spam filtering).
posted by antifuse at 9:16 AM on April 18, 2006
Seconding SpamAssassin as catching these messages every time (with Bayesian filtering turned on).
posted by deadfather at 9:30 AM on April 18, 2006
Gmail's spam filter catches these image spams (even when they have a lot of random text included).
posted by rooftop secrets at 10:19 AM on April 18, 2006
SpamAssassin with the following:
* bayes
* custom rulesets
* realtime RBL tagging
I NEVER block mail with crap like SPEWS or SORBS. Letting SpamAssassin merely tag the mail and give it a score based on those lists is sufficient.
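A rough sketch of the "tag, don't block" approach from the receiving side. It assumes the server adds an X-Spam-Status header of the form "Yes, score=7.2 required=5.0 ..." (older versions wrote hits= instead of score=). The folder names and cutoffs are hypothetical:

```python
import email
import re

# Matches "score=7.2" or the older "hits=7.2" inside the header value.
SCORE_RE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")


def tagged_score(raw_message):
    """Return the score from X-Spam-Status, or None if the header is absent."""
    msg = email.message_from_string(raw_message)
    status = msg.get("X-Spam-Status")
    if status is None:
        return None
    match = SCORE_RE.search(status)
    return float(match.group(1)) if match else None


def folder_for(raw_message, junk_at=5.0, trash_at=15.0):
    """Pick a folder from the tagged score; untagged mail stays in the inbox."""
    score = tagged_score(raw_message)
    if score is None or score < junk_at:
        return "Inbox"
    return "Trash" if score >= trash_at else "Junk"


# Made-up message for illustration; EXAMPLE_RULE is not a real rule name.
SAMPLE = (
    "From: someone@example.com\n"
    "Subject: random words\n"
    "X-Spam-Status: Yes, score=7.2 required=5.0 tests=EXAMPLE_RULE\n"
    "\n"
    "body\n"
)

print(tagged_score(SAMPLE), folder_for(SAMPLE))
```

Nothing is rejected at delivery time; the score only decides where the message gets filed, so a false positive is still sitting in a folder you can check.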
You can install a user-level version of SpamAssassin to handle your mail. It's not too bad. If you get a reasonable amount of email and Dreamhost complains about it sucking up resources, then they just suck.
This is a common problem with shared hosting like that - pitiful antispam services.
(Disclaimer: I own a web hosting provider, and don't do the "everything on one box" model for that very reason.)
posted by drstein at 10:54 AM on April 18, 2006
I used to work at Cloudmark, which does in fact make an Outlook spam filtering plugin, the kind of thing you don't want. So I'll just answer your more general question, "Are these filterable?", without suggesting a specific piece of software as a solution.
First things first: There's lots of variety in the world of spam. Some (but not all) image spam mutates the image somewhat each time. Maybe there's a border of random noise pixels, or noise sprinkled around, or gross elements move each time, or whatever.
These are tricky, but they are currently filterable, just as most captchas are breakable. Since most filters don't do very sophisticated image processing, if any, most spammers don't do very sophisticated image mutations. So they're going to get a lot trickier as filters adapt and then spammers re-adapt.
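To illustrate why light mutation defeats exact matching but not coarser comparison, here is a sketch using a simple "average hash". This is a swapped-in textbook technique, not what any particular filter is known to use, and the image is a made-up grid of grayscale values:

```python
import hashlib
import random


def exact_digest(pixels):
    """Cryptographic hash of the raw pixel values; any changed pixel changes it."""
    data = bytes(value for row in pixels for value in row)
    return hashlib.sha256(data).hexdigest()


def average_hash(pixels, grid=4):
    """One bit per block: is the block's mean brightness above the overall mean?"""
    height, width = len(pixels), len(pixels[0])
    bh, bw = height // grid, width // grid
    means = []
    for by in range(grid):
        for bx in range(grid):
            block = [pixels[y][x]
                     for y in range(by * bh, (by + 1) * bh)
                     for x in range(bx * bw, (bx + 1) * bw)]
            means.append(sum(block) / len(block))
    overall = sum(means) / len(means)
    return [1 if m > overall else 0 for m in means]


def hamming(a, b):
    """Number of positions where two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def sprinkle_noise(pixels, count, rng):
    """Copy the image and set `count` randomly chosen pixels to black or white."""
    noisy = [row[:] for row in pixels]
    for _ in range(count):
        y = rng.randrange(len(noisy))
        x = rng.randrange(len(noisy[0]))
        noisy[y][x] = rng.choice((0, 255))
    return noisy


# A made-up 32x32 "image": dark left half, bright right half.
original = [[40] * 16 + [220] * 16 for _ in range(32)]
mutated = sprinkle_noise(original, count=20, rng=random.Random(1))

print(exact_digest(original) == exact_digest(mutated))
print(hamming(average_hash(original), average_hash(mutated)))
```

A few noise pixels change the exact digest but barely move the block averages. Moving large elements around would change the block averages too, which is the kind of escalation described above.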
posted by aubilenon at 11:04 AM on April 18, 2006
Dreamhost does run SpamAssassin, which you can activate through your account panel. However (as I've learned to my dismay) they do not support individualized user prefs. They do let you run it under your own account (if you want to customize it), though. See this.
posted by adamrice at 11:07 AM on April 18, 2006
Thunderbird doesn't seem to catch this type of spam. I mark every single piece of spam I get, hoping to get it trained properly, but it still only identifies less than half of the spam I receive.
posted by clarissajoy at 11:38 AM on April 18, 2006
For PCs, the program K9 is better than Popfile. K9 also runs locally, but it runs compiled code (Popfile is interpreted) and K9 has a much nicer user interface.
It stops the kinds of spams you're talking about because the <img tag gets a high "spam" score.
I've been using K9 for a couple of years and I'm sold on it. (Though since I moved it hasn't gotten much work, because Comcast seems to have a superb spam blocker of its own.)
posted by Steven C. Den Beste at 11:41 AM on April 18, 2006
Thunderbird doesn't seem to catch this type of spam. I mark every single piece of spam I get, hoping to get it trained properly, but it still only identifies less than half of the spam I receive.
I have the same problem with Thunderbird; it's really annoying. What's the point in training the junk filter if it doesn't realize, after the 10th time, that something with the subject "Viagra" is probably junk?
posted by RoseovSharon at 2:01 PM on April 18, 2006
I use spambayes for my e-mail filtering (in addition to greylisting and SPF record checking). I took a look in my spam folder, and it seems to have caught many image spams. It recognizes these based on clues that are distinctive to these kinds of messages: content-type:text/jpeg, the HTML fragment src="cid:, etc.
Other, legitimate messages have these characteristics too (though as a geek with geek friends, I don't get a lot of HTML e-mail) but by having only an image and no text, there's not a chance for clues that indicate legitimate e-mail.
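A sketch of how those clues could be checked by hand, using Python's standard email package. This is not spambayes code; the function names, the sample message, and the three-word cutoff are assumptions for illustration:

```python
import email
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text between HTML tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def image_clues(raw_message):
    """Report image parts, cid: image references, and the visible word count."""
    msg = email.message_from_string(raw_message)
    has_image = has_cid_ref = False
    words = 0
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype.startswith("image/"):
            has_image = True
        elif ctype in ("text/plain", "text/html"):
            payload = part.get_payload(decode=True) or b""
            try:
                text = payload.decode(part.get_content_charset() or "utf-8", "replace")
            except LookupError:  # unknown charset name in the header
                text = payload.decode("utf-8", "replace")
            if ctype == "text/html":
                has_cid_ref = has_cid_ref or 'src="cid:' in text.lower()
                extractor = _TextExtractor()
                extractor.feed(text)
                text = " ".join(extractor.chunks)
            words += len(text.split())
    return {"has_image": has_image, "has_cid_ref": has_cid_ref, "words": words}


def looks_image_only(raw_message, max_words=3):
    """True when there is an image (or cid: reference) and almost no text."""
    clues = image_clues(raw_message)
    return (clues["has_image"] or clues["has_cid_ref"]) and clues["words"] <= max_words


# Made-up message: an HTML part that only embeds an attached image.
SAMPLE = """\
From: someone@example.com
Subject: random words
MIME-Version: 1.0
Content-Type: multipart/related; boundary="XX"

--XX
Content-Type: text/html; charset="us-ascii"

<html><body><img src="cid:part1"></body></html>
--XX
Content-Type: image/gif
Content-ID: <part1>
Content-Transfer-Encoding: base64

R0lGODlhAQABAAAAACw=
--XX--
"""

print(image_clues(SAMPLE), looks_image_only(SAMPLE))
```

A hard rule like this will also catch a relative who sends a photo with no note, which is why treating the clues as evidence to be weighed tends to work better than trashing on them outright.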
I'm very impressed with spambayes, though greylisting also made a big improvement in my spam situation. Once in a blue moon, a spam gets through. Once in a blue moon, I have to go to my spam folder to find a legitimate message. Each day there are from one to a dozen messages that the software is "unsure" about, and they're 95% spam. Spambayes is effective against misspelling spam, spam with "word salad", image spam, and I've also trained it to mark viruses as spam.
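To show why a trained filter copes with misspellings, here is a plain naive Bayes sketch. It is not spambayes's own code or scoring scheme; the class name, training messages, and the 0.2 / 0.8 cutoffs are all made up for illustration:

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Toy token classifier with add-one smoothing."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        # Each distinct token counts once per message.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def train(self, text, label):
        self.messages[label] += 1
        self.counts[label].update(self.tokens(text))

    def spam_probability(self, text):
        n_spam, n_ham = self.messages["spam"], self.messages["ham"]
        log_odds = math.log((n_spam + 1) / (n_ham + 1))  # smoothed prior
        for tok in self.tokens(text):
            p_spam = (self.counts["spam"][tok] + 1) / (n_spam + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))

    def classify(self, text, ham_below=0.2, spam_above=0.8):
        """Three-way answer; the cutoffs are assumed, not spambayes defaults."""
        p = self.spam_probability(text)
        if p > spam_above:
            return "spam"
        return "ham" if p < ham_below else "unsure"


bayes = TinyBayes()
bayes.train("cheap v1agra and amb1en online", "spam")
bayes.train("v1agra amb1en best price online", "spam")
bayes.train("lunch on friday with the team", "ham")
bayes.train("photos of my nephews from friday", "ham")

for text in ["buy v1agra online", "lunch with my nephews", "friday online"]:
    print(text, "->", bayes.classify(text))
```

The misspelled "v1agra" is just another token. Once it has been seen in a couple of trained spams it counts as evidence, so the filter never needs to know what the word was "supposed" to be.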
I use spambayes directly on my Linux mail server, but on Windows it can integrate directly into Outlook (not Outlook Express), and there are also versions which serve as POP or IMAP proxies with a web interface for configuration.
posted by jepler at 3:00 PM on April 18, 2006
This thread is closed to new comments.
posted by keijo at 8:53 AM on April 18, 2006