How to extract the addresses of a couple hundred bounced emails?
January 23, 2013 8:22 AM

I have been running a couple of email announcement lists for a while, and want to clean out all (or most) of the addresses that are bouncing. What is a good way to do this?

I have a couple of email lists I use to promote local events. Everyone on them has asked to be on them, and, of course, unsub requests are processed promptly.

What I haven't been processing, though, are bounces. Every time I send an email to one of these lists, I get a lot of emails bouncing back. In general, this has not been a problem for me, so I haven't been deleting the bounced addresses from the list. (I hope this isn't a terribly irresponsible practice. I have a feeling you're not supposed to do this, though I'm not totally sure why.)

Now I'd like to clean up the lists, largely because I'm in the process of changing how I send mail, probably to use Mandrill (previously), and I realize that having a lot of bad addresses in my list is not something bulk mail services like.

I wonder if there's an easy way for me to get a list of the addresses that should be removed from the list. I have this information to work with:

a) For one of my lists: several months' worth of old "bounce" emails, all in a single folder on my IMAP server (Fastmail) and in my email client (Thunderbird)

b) For my other list: a few months of old bounce messages, all in a single folder in my Gmail account
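
A minimal sketch of how both folders could be read by a script, assuming each one can be obtained as an mbox file. That is an assumption about this setup: Thunderbird normally keeps downloaded folders as mbox files in the profile directory, and Gmail mail can usually be exported as mbox. The function name `iter_messages` and the path in the usage note are hypothetical.

```python
import mailbox


def iter_messages(mbox_path):
    """Yield every message stored in an mbox file, one at a time."""
    # create=False makes a wrong path fail loudly instead of
    # silently creating an empty mailbox file.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            yield message
    finally:
        box.close()
```

Usage would look something like `for msg in iter_messages("/path/to/Bounces"): print(msg["Subject"])`, with the real path to the bounce folder filled in.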

Is there some piece of software or service that can parse these messages into some sort of easy-to-work-with form, and give me a list of which addresses produced which sorts of bounces, and how often? Each list has something like 3000 addresses, and I think about 10% of the addresses are bad. I'm hoping for something free or cheap. (Less than, say, $50 for a one-time sweep of the bounces...)

(Less critically: I also have a couple of smaller, older lists, where I don't have current bounce messages. That is, I just have a list of email addresses. Is there any way to check whether those email addresses are still valid other than sending messages to those addresses?)
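
For lists like these, a partial check is possible without sending anything: flag addresses that are malformed or whose domain no longer resolves in DNS. The sketch below is only that. It cannot tell whether a particular mailbox still exists, and it uses an ordinary host lookup rather than an MX lookup (the Python standard library has no MX resolver), so a "suspect" result means "look at this one", not "definitely bad". The names `triage` and `domain_resolves` are hypothetical.

```python
import re
import socket

# Very loose syntax check: something@domain.tld with no spaces.
SYNTAX = re.compile(r"^[^@\s]+@([^@\s]+\.[^@\s]+)$")


def domain_resolves(domain, lookup=socket.getaddrinfo):
    """Return True if the domain name resolves to any address."""
    try:
        lookup(domain, None)
        return True
    except socket.gaierror:
        return False


def triage(addresses, lookup=socket.getaddrinfo):
    """Split addresses into (plausible, suspect) lists."""
    plausible, suspect = [], []
    cache = {}  # look each domain up only once
    for address in addresses:
        match = SYNTAX.match(address.strip())
        if not match:
            suspect.append(address)
            continue
        domain = match.group(1).lower()
        if domain not in cache:
            cache[domain] = domain_resolves(domain, lookup)
        (plausible if cache[domain] else suspect).append(address)
    return plausible, suspect
```

The `lookup` parameter exists so the function can be tried with a stand-in resolver before pointing it at real DNS.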

Any information would be much appreciated. Thanks!!!
posted by ManInSuit to Computers & Internet (8 answers total)
I'd probably export them all to a big text file and run some search/replace on them in a text editor to winnow them down. I imagine there's a predictable bit of text that precedes the bad email address. One bounce in my inbox looks like this:

Delivery to the following recipient failed permanently:

Technical details of permanent failure:
DNS Error: DNS server returned answer with no data
I'd run a few s/r to get rid of all the headers, then gradually massage it into a form that I could bring into Excel, maybe.
posted by chazlarson at 8:44 AM on January 23, 2013
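
A minimal sketch of the search-and-replace idea above, assuming the bounces have been exported to one big text file and that the bad address appears within a line or two after the phrase quoted in the sample bounce. Other mail servers word their bounces differently, so the marker text would need adjusting for each one. The function name `addresses_after_marker` and the address pattern are illustrative, not a full address grammar.

```python
import re
from collections import Counter

MARKER = "Delivery to the following recipient failed permanently:"
# Loose pattern for something that looks like an email address.
ADDRESS = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def addresses_after_marker(text, marker=MARKER):
    """Count addresses found just after each occurrence of the marker."""
    counts = Counter()
    # Everything before the first marker is headers or other noise.
    for chunk in text.split(marker)[1:]:
        # Look only at the first two non-blank lines after the marker,
        # so an unrelated address further down is not picked up.
        nearby = [line for line in chunk.splitlines() if line.strip()][:2]
        match = ADDRESS.search("\n".join(nearby))
        if match:
            counts[match.group(0).lower()] += 1
    return counts
```

The resulting counts can be written out with the `csv` module and opened in Excel, which covers the "how often" part of the question.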

chazlarson - My sense is that because there's no one standard way that email bounce messages are written, it's not totally trivial to process bounce messages (especially if you want to distinguish "hard" bounces, where the address is definitely bad, from "soft" bounces, where the message indicates, say, a temporary problem with the server...) I expect that I *could* do it myself by playing with the text, but I'm hoping there's a pre-existing solution that could be less work and more accurate.
posted by ManInSuit at 9:13 AM on January 23, 2013

It has been my experience with newsletter services (granted they were not like Mandrill) that bad email addresses are not an issue. They typically have functionality to cull your lists for you: hard bounces are flagged and removed from the list immediately, and soft bounces are removed after three tries or whatever their business rules dictate. These businesses recognize that not everyone has the resources to provide clean lists, and this is part of the benefit of their software.

Also, it looks like Mandrill has this issue covered. Obviously I'd check with them to make sure this won't be a problem or what the implications of having a certain percentage of bad emails would be.
posted by Kimberly at 9:22 AM on January 23, 2013

Kimberly - Yes! It looks like Mandrill can help get rid of the bounces. My concern is that Mandrill's service uses a reputation system, which considers a low bounce rate one of the criteria of a good reputation. So my thinking had been that it's better to start off without a whole bunch of bad addresses. But maybe it's okay to start off with a lot of bounces, and let my reputation improve?

Asking them about this is a good idea!
posted by ManInSuit at 9:28 AM on January 23, 2013

If this is a one-time thing, you might think about hiring a high school student for $10 per hour to manually go through the folder of bad addresses and remove each address from the list. 10% of 3000 is 300 bad addresses; you could probably do 50-100 in an hour.
posted by CathyG at 10:12 AM on January 23, 2013
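
The arithmetic above, worked through for a single list using only the figures from this thread (3000 addresses, 10% bad, 50-100 removals per hour, $10 per hour). The function name is hypothetical.

```python
def manual_cleanup_cost(list_size, bad_percent, per_hour_low, per_hour_high, wage):
    """Return (bad_addresses, (min_hours, max_hours), (min_cost, max_cost))."""
    bad = list_size * bad_percent // 100
    hours = (bad / per_hour_high, bad / per_hour_low)
    cost = (hours[0] * wage, hours[1] * wage)
    return bad, hours, cost


print(manual_cleanup_cost(3000, 10, 50, 100, 10))
# prints (300, (3.0, 6.0), (30.0, 60.0))
```

So one list comes to 3-6 hours and $30-$60, and two lists of that size would double it.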

As noted upthread, it really depends on whether there is a commonality with the bounced messages in some form. If there is, it can be used as a tag to mark those entries so it can be filtered after extraction.

There are certainly extractors that will pull addresses out of any text string; the key is the syntax or format of the bounce notation. However, without seeing the actual data, I cannot give you a specific direction to follow. If you want, you can mail me a portion (say about 50 or so) and I can take a look.

This really should not be a huge issue to solve; it just depends on the particulars of the data.
posted by lampshade at 1:09 PM on January 23, 2013

I found a program called eMail Bounce Handler which seems okay. I downloaded the trial - it seems to work. I think I'll pay the $30 for the full version and give it a shot...
posted by ManInSuit at 11:01 AM on January 24, 2013

eMail Bounce Handler worked great. It was actually $25 for the full version, not $30 as I mentioned above. Downloading it, getting my mail into it, and getting out a list of bounces I could use took maybe a half hour at most. It does some pretty smart stuff: differentiating between different types of bounces, telling me how many times each address bounced, etc. I'm happy with the result.
posted by ManInSuit at 7:53 PM on February 1, 2013
