Grinchy Spam Filters
December 6, 2008 6:38 AM

Christmas Email Conundrum: How exactly do Spam Filters work?

We have produced a short Christmas video for our Christmas Card this year. We're putting it on youtube and sending a link to all of our friends and family. Right now I have 97 email addresses that the link needs to go to. What I would like to do is put all 97 addresses in the BCC field and only my wife's address in the To: field.

My question: Is the email likely to be marked as Spam by several/most/any of our 97 friends? Are there standard BCC rules/limits to the main spam filters out there? Is there anything I can do to minimize the likelihood that our "card" will end up in Junk boxes?

posted by crapples to Computers & Internet (9 answers total)

How often do legitimate emails that you receive include the words "V1agr@", "enlarge", "lesbians"? How often does it happen with spam?

That's very roughly how a spam filter works. It looks at each word of your message and asks, "OK, what's the probability that this came from spam?" It knows that probability through training: from your past history of marking things as spam or not (or, more likely nowadays, from training on a very large set of emails across many users).
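
To make that concrete, here is a minimal sketch of a word-probability filter, assuming a tiny hand-labelled training set. The function names (`tokenize`, `train`, `spam_probability`) and the sample messages are made up for illustration; real filters train on far more mail and are considerably more elaborate.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9@']+", text.lower())

def train(messages):
    """Count word occurrences per class; `messages` is a list of (text, is_spam)."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in messages:
        counts[is_spam].update(tokenize(text))
        totals[is_spam] += 1
    return counts, totals

def spam_probability(text, counts, totals):
    """Naive Bayes estimate of P(spam | words), with add-one smoothing."""
    vocab = set(counts[True]) | set(counts[False])
    log_odds = math.log(totals[True] / totals[False])  # prior odds of spam
    for label, sign in ((True, 1), (False, -1)):
        n_words = sum(counts[label].values())
        for word in tokenize(text):
            p = (counts[label][word] + 1) / (n_words + len(vocab) + 1)
            log_odds += sign * math.log(p)
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical training data, two spam and two legitimate messages.
training = [
    ("enlarge now v1agr@ cheap", True),
    ("cheap v1agr@ enlarge today", True),
    ("merry christmas from our family", False),
    ("here is our christmas video link", False),
]
counts, totals = train(training)
card = spam_probability("merry christmas, here is our video", counts, totals)
junk = spam_probability("cheap v1agr@", counts, totals)
print(f"card: {card:.3f}  junk: {junk:.3f}")
```

With this toy data the Christmas message comes out well below 0.5 and the drug pitch well above it, which is the whole trick: words seen mostly in spam push the probability up, words seen mostly in good mail pull it down.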

So, to minimize your chances of getting caught as spam, speak normally. I wouldn't leave the body of the email as just a link; write a sentence or two and put the link in there. I'm reasonably certain that using BCC shouldn't matter, but I'm not positive.

A starting point for info.
posted by Lemurrhea at 8:12 AM on December 6, 2008

Best answer: There's not a lot you can do. It largely depends on the spam filtering capabilities on the receiving end and they can be as varied as the day is long.

Here are my quick recommendations:

97 is not a high number. Known spam hosts can send upwards of millions or even billions of messages per day. If the content of your message is largely text with a youtube link, you can be reasonably assured that it will go through fine. As this is a season's greetings message, I doubt you'll swear much or talk about viagra. ;)

If you want to know how anti-spam technologies work, keep reading...

Spam filters use a variety of techniques to ascertain the legitimacy of the sender. Each piece of mail is assigned a score, and that score, weighed against the sensitivity setting of the receiving host or mailbox, determines how it will be treated. This is how they determine the score:
  • Reputation filtering: In reputation filtering, anti-spam companies either form or are affiliated with organizations that analyze web and email traffic. This data is aggregated from various organizations around the world to determine existing or emerging threats. Basically, they tell the spam filter software who is a known spammer and to immediately reject mail from them. It is highly unlikely (depending on your mail host I suppose) that you would need to worry about this level. (Example)
  • Heuristics: This is where it gets complicated. Each anti-spam vendor relies heavily on algorithms they've developed within their solution, which run each message through a variety of tests to determine the spam score. This could involve email content from text to images, number of people in the CC/BCC lists, number of mail hops (number of mail servers involved in transferring the mail) etc. Some vendors are quite good while others are dismal, with a lot of in-betweens.
  • Allow/block/grey/white... lists: This is really combined with the heuristics scanning but it deserves special mention. Users can either create their own lists through manual entry or the anti-spam solution becomes aware of known mail senders (through repeat sends, marking something as legitimate if it was incorrectly marked as spam etc.).
There are other technologies involved as well, such as DomainKeys and Sender Policy Framework (tools that vendors can use to check that mail claiming to come from a domain really originates from that domain).
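
How the three layers in the list above might fit together can be sketched roughly as follows. This is not any vendor's actual algorithm: the `Message` and `FilterConfig` types, every weight, and the threshold are invented for illustration, and the addresses use reserved example ranges.

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    sender: str
    sender_ip: str
    subject: str
    body: str
    recipient_count: int = 1
    hops: int = 1

@dataclass
class FilterConfig:
    blocked_ips: set = field(default_factory=set)  # reputation data
    allow_list: set = field(default_factory=set)   # senders the user trusts
    threshold: float = 5.0                         # sensitivity setting

def spam_score(msg, cfg):
    """Return (score, verdict). All weights are made up for illustration."""
    if msg.sender_ip in cfg.blocked_ips:
        return float("inf"), "reject"   # reputation: known spam host
    if msg.sender in cfg.allow_list:
        return 0.0, "deliver"           # allow list short-circuits the heuristics
    score = 0.0
    if msg.subject.isupper():
        score += 2.0                    # shouting subject line
    score += 0.5 * msg.subject.count("!")
    if msg.recipient_count > 500:
        score += 2.0                    # very large recipient list
    if msg.hops > 10:
        score += 1.0                    # unusually long relay chain
    if "viagra" in msg.body.lower():
        score += 4.0
    verdict = "junk" if score >= cfg.threshold else "deliver"
    return score, verdict

cfg = FilterConfig(blocked_ips={"198.51.100.7"})
card = Message("crapples@example.com", "192.0.2.10",
               "Our Christmas video", "Merry Christmas, here is the link.",
               recipient_count=98)
promo = Message("promo@example.net", "203.0.113.5",
                "BUY NOW!!!", "Cheap viagra", recipient_count=1000)
print(spam_score(card, cfg))
print(spam_score(promo, cfg))
```

Under these made-up weights a plain 98-recipient greeting scores zero, while the shouting drug pitch lands well over the threshold. Raising or lowering `threshold` is the "sensitivity" knob mentioned above.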

This is controversial stuff. It's not unknown for mail hosts to use more than one vendor's anti-spam solution, as each one will have strengths and weaknesses that might complement one another. The downside is that stacking filters is often the most aggressive approach and results in the highest number of false positives (legitimate mail marked as spam).

Spam filtering can seem like an annoyance but, believe me, without it email would be completely useless. The mail that ends up in your inbox tagged as spam (usually stored in another folder) is only a tiny percentage (less than 1% on average) of the total amount of spam blocked at the receiving host.

A personal gripe: A lot of organizations consider email "free", in that it costs them much less to send out marketing material (or soft spam) through email than by snail mail. This might seem true upfront, but it is having a devastating effect on the cost benefits of keeping email even usable. The amount of time and money being spent worldwide to keep email useful is staggering. Always pressure vendors or marketing agents to add unsubscribe options to email. Oh, and those funny forwards that you get from your mother? Ugh.
posted by purephase at 8:44 AM on December 6, 2008

Mail merge?
posted by mandal at 9:20 AM on December 6, 2008

For something important, I would send one at a time, personally, rather than cc'ing or bcc'ing.

Not only will that look better to (some) spam filters, it will give you the chance to personalize the messages so that you're not ACTUALLY spamming people with the same generic message.

It'll make both robots AND people happier that way.
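
If you'd rather script the one-at-a-time approach (a poor man's mail merge) than do it by hand, a rough sketch with Python's standard `email` library could look like this. The names, addresses, subject text and video URL are placeholders, and the sketch only builds the messages; it does not send anything.

```python
from email.message import EmailMessage

def build_cards(recipients, sender, video_url):
    """Build one individually addressed message per (name, address) pair."""
    cards = []
    for name, address in recipients:
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = address  # each person appears in their own To: field
        msg["Subject"] = "Our Christmas video"
        msg.set_content(
            f"Hi {name},\n\n"
            "We made a short Christmas video this year instead of a paper card:\n"
            f"{video_url}\n\n"
            "Merry Christmas!\n"
        )
        cards.append(msg)
    return cards

# Placeholder data; a real list would come from your address book.
friends = [("Alice", "alice@example.com"), ("Bob", "bob@example.com")]
cards = build_cards(friends, "me@example.com",
                    "https://www.youtube.com/watch?v=EXAMPLE")
print(cards[0])
```

Each message could then be handed to `smtplib.SMTP.send_message`, with the server name and login depending entirely on your own mail provider.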
posted by rokusan at 10:07 AM on December 6, 2008

Best answer: I wouldn't worry about your number of recipients or using bcc unless, as rokusan says, you want to make it more personal.

Re. purephase's heuristics point - think about what spam looks like: highly promotional, attention-demanding, designed to trigger emotional response.

Avoid exclamation marks and all caps anywhere, but especially in your subject line. Avoid superlatives or even words like "Amazing" in your subject line, anything over the top. Avoid heavy HTML formatting, such as bolding or coloring every 10th word. Don't stuff the message with images.

Your approach sounds fine: a low-key matter-of-fact subject line and normally-formatted paragraph or two containing a single youtube link shouldn't trigger any but the most ham-fisted filters.
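
For what it's worth, those subject-line rules of thumb are easy to check mechanically before you hit send. A small sketch follows; the `HYPE_WORDS` list is an assumption for illustration (only "Amazing" comes from the advice above), and real filters weigh far more than this.

```python
import re

# Illustrative list only; extend it with whatever over-the-top words you like.
HYPE_WORDS = {"amazing", "incredible", "unbelievable"}

def subject_warnings(subject):
    """Return a list of human-readable warnings for a subject line."""
    warnings = []
    if "!" in subject:
        warnings.append("contains exclamation marks")
    words = re.findall(r"[A-Za-z']+", subject)
    shouting = [w for w in words if len(w) > 1 and w.isupper()]
    if shouting:
        warnings.append("contains all-caps words: " + ", ".join(shouting))
    hype = [w for w in words if w.lower() in HYPE_WORDS]
    if hype:
        warnings.append("contains over-the-top words: " + ", ".join(hype))
    return warnings

print(subject_warnings("Our Christmas video"))
print(subject_warnings("AMAZING Christmas video!!!"))
```

A low-key subject comes back with no warnings; the shouty one trips all three checks.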
posted by scheptech at 10:58 AM on December 6, 2008

I believe the number of people on the BCC is irrelevant, since the recipient doesn't see the BCC field. However, the recipient's spam filter may be more likely to flag email that doesn't have the recipient's email address in the to: or cc: field, precisely because it's something spammers can use to hide mass mails.

So what I mean is that bcc-ing to one person and bcc-ing to 100 people will look just as bad. I think.
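
To illustrate the kind of check I have in mind (whether any particular filter actually does this, or how much weight it gets, is an assumption on my part), here is a sketch using Python's standard `email` package. The addresses and the raw message are made up.

```python
from email import message_from_string
from email.utils import getaddresses

def recipient_is_visible(raw_message, recipient):
    """True if `recipient` appears in the To: or Cc: header of the message."""
    msg = message_from_string(raw_message)
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    visible = {addr.lower() for _, addr in getaddresses(headers)}
    return recipient.lower() in visible

# What a BCC'd friend receives: only the wife's address is in the headers.
raw = (
    "From: crapples@example.com\n"
    "To: wife@example.com\n"
    "Subject: Our Christmas video\n"
    "\n"
    "Merry Christmas!\n"
)
print(recipient_is_visible(raw, "wife@example.com"))    # address is in To:
print(recipient_is_visible(raw, "friend@example.com"))  # BCC'd, not in headers
```

The copy looks the same to the filter whether 1 or 100 people were on the BCC line, since none of them are listed in what gets delivered.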
posted by bsdfish at 11:30 AM on December 6, 2008

Best answer: The best spam filters in use today use Bayesian Filters. The reason those are so useful is that they learn-while-doing. If a spam email contains one word which already has a high spam score (e.g. v1agra) along with several other strange ones, then the email will be tagged as spam, and all the other strange ones will get high scores when they're encountered in future.

Some rather curious strings turn out to have very high spam scores in my filter at this point. The string "Mrs" has appeared in 13 spams and no good emails, so it has a score of 99. (It tends to appear in 419's: "I am Mrs Bura Thomas" for instance.) The word "widow" is 4 spams and 0 good, score 99. "sum" is 19 spams, 0 good. "hotmail" is 704 spams, 15 good, so it has a score of 96. (My Bayesian Filter also processes headers.)
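
For the curious, here is roughly how such a per-string score can be derived from the counts. This sketch follows the formula in Paul Graham's essay "A Plan for Spam" (good counts doubled, result clamped between 0.01 and 0.99), which may or may not match what my own filter does internally. The corpus sizes are not something I've given above, so equal totals of 1000 spams and 1000 good emails are purely an assumption.

```python
def token_score(spam_count, good_count, n_spam, n_good):
    """Graham-style spam probability for one token, clamped to [0.01, 0.99]."""
    if spam_count + good_count == 0:
        return 0.4  # Graham's essay uses 0.4 for tokens never seen before
    bad = min(1.0, spam_count / n_spam)
    good = min(1.0, 2 * good_count / n_good)  # doubled to bias against false positives
    p = bad / (good + bad)
    return max(0.01, min(0.99, p))

# Assumed corpus sizes; only the per-token counts come from the examples above.
N_SPAM = N_GOOD = 1000
for token, spam_count, good_count in [("Mrs", 13, 0), ("widow", 4, 0),
                                      ("sum", 19, 0), ("hotmail", 704, 15)]:
    score = round(100 * token_score(spam_count, good_count, N_SPAM, N_GOOD))
    print(token, score)
```

Under that equal-corpus assumption the strings with zero good hits clamp to 99 and "hotmail" comes out at 96. Since the counts come from whatever mail a particular filter has seen, the same string can score very differently somewhere else.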

Which means that it's not easy for you to predict what will or will not get tagged as spam, because each individual Bayesian filter out there will be different as a function of what's come through it -- and it will keep changing.

That's ultimately a good thing, and it's the reason why spammers really hate Bayesian filters.

But you don't really need to worry too much as long as you avoid acting like a spammer. As mentioned, tone down the enthusiasm, don't use the capslock, and don't overuse the exclamation point, and avoid words you know full well are spammy like names of popular drugs or the word "Nigeria".
posted by Class Goat at 12:03 PM on December 6, 2008

Are you sending this email to people you know? I send out a yearly Christmas email to >100 friends/family. Most of them have my email address, and some spam filters "whitelist" email from people in your address book. I don't think anyone has ever had it marked as spam (I don't tend to write about penises or Viagra, though. Although, one time I had a quote in my email signature that had the word sex in it, and it got in a few people's spam boxes).
posted by bluefly at 12:08 PM on December 6, 2008

By the way, a different consequence of Bayesian filters is that they give very low scores to words found routinely in good email and not in spams. For instance, email addresses of friends usually have scores of 1, for that reason. If these are all people you've sent emails to before (and it sounds like they are) then that will work in your favor. The parts of your real human name will also have extremely low scores if you're in the habit of including it in your email (unless you have a name that gets used a lot in 419 spams like "Bura Thomas").
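
A quick sketch of why one very low score helps so much, using the combining rule from Paul Graham's "A Plan for Spam" (an assumption about the method; individual filters differ). The example scores are made up, apart from the 96 for "hotmail" and the 1 for a friend's address mentioned earlier.

```python
import math

def combine(scores):
    """Combine per-token spam probabilities into one message-level probability."""
    spam = math.prod(scores)
    ham = math.prod(1 - p for p in scores)
    return spam / (spam + ham)

# A spammy-looking header token plus one mildly suspicious word...
without_friend = combine([0.96, 0.60])
# ...and the same message when it also contains a known friend's address.
with_friend = combine([0.96, 0.60, 0.01])
print(f"{without_friend:.3f} -> {with_friend:.3f}")
```

One token at 0.01 is enough to drag a message that would otherwise look very spammy back under the halfway mark.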
posted by Class Goat at 12:22 PM on December 6, 2008
