Help me clean an email list?
June 24, 2009 11:34 AM   Subscribe

What is the best way to clean this mailing list of bad email addresses? Difficulty: No address cleaners, but I do have a snapshot - the results of sending a newsletter to the list, and the resultant bounce reports. I have this information in several forms..

in the webmail inside a standard cpanel; in the sent files and incoming files in Thunderbird; and I also have the bounce reports in my gmail. I also have an excel file of the whole list I used.

I emailed the list in chunks of 50, from Thunderbird.

I am looking for a process that would be easiest, a full walkthrough would be best - I have full office suite and a technical husband that can use scripts and stuff.

Many thanks!
posted by By The Grace of God to Computers & Internet (4 answers total) 1 user marked this as a favorite
 
Personally I'd write something in Perl to pull the bounces from Gmail, extract the addresses and use an Excel module to delete them from the list. Going forward, I would use a dedicated From: address for the newsletters so that the script could be run periodically to pull bounces from the mailbox and re-update the Excel file.
posted by rhizome at 12:26 PM on June 24, 2009


Hmm, let's say you could get two pieces of information into Excel: The mailing list of all addresses in one column, and the list of all the bad addresses in another column. Then you could use VLOOKUP to pretty easily identify which of the mailing list addresses are present in the bad addresses column, and mark them as such. Then you could use AutoFilter to extract out just the good email addresses.

The tricky part is getting the column of bad email addresses. I don't really know Thunderbird or anything that well, but if you could somehow end up with a text file that had all of the bad addresses in it then I could help you write a script that will extract out just the email addresses from all the text.
posted by losvedir at 12:51 PM on June 24, 2009


I should add, this is exactly the sort of thing I like to do for fun, so feel free to me-mail me and I'd be happy to work through it with you.
posted by losvedir at 12:53 PM on June 24, 2009


Note: We solved this through MeFi Mail, and I thought I'd post the steps taken in case someone in the future searches for a similar problem. Here's what we did: (we were using OS X)

1) In Thunderbird create a Local Folder called "failedrecipients". You can do this by right-clicking the "Local Folders" icon in the left panel and clicking "New Folder...". Give it the name "failedrecipients" (without the quotes), and make sure you "create as a subfolder" of Local Folders.

2) Put all of the bounces in this folder.

3) Navigate to this folder in the terminal: ~/Library/Thunderbird/Profiles/xxxxxxxxx.default/Mail/Local\ Folders/

Type "ls" to see a listing of files in this folder. There should be one called "failedrecipients", which is just a text file with all of the bounced messages in it. We can extract out all the bounced email addresses using grep, and save it to a new file on the Desktop:

4) Type this into the terminal:

grep X-Failed-Recipients failedrecipients > ~/Desktop/failedrecipients.txt

What this will do is search the failedrecipients text file for lines that contain the X-Failed-Recipients flag, which gmail puts in bounced messages. It will then save all of those lines (which contain the bad email addresses) to a file on the desktop called "failedrecipients.txt".

5) Go to the desktop and make sure file is there. Open it with Text Edit, and you should see a bunch of lines that looks like:

X-Failed-Recipients: john@idoubtverymuchthisdomainexists.com
X-Failed-Recipients: robert@probablynogoodeither.net
etc....

Now it's time to get out just the email addresses from there. We could probably use the 'grep' command again to get them out, but rather than spending a few minutes figuring out how to do the command just right, let's open Excel since we'll have to eventually anyway.

6) Open the text file and copy all the text in it, click the cell A1 in Excel, and then paste. If done right, you should have a long column with each "X-Failed-Recipients: address@blah.com" in a different row.

7) We can split this into two columns using the "Text to Columns..." command. So highlight the column of text, then go to Data -> Text to Columns...

8) It will ask you how to break apart that text into multiple columns. Make sure "Delimited" is selected and click Next. I'm not entirely sure what your text file (and hence, your column of text in Excel) looks like, but I'm hoping you can break it apart at the colon or space. Try checking the "Space" box, checking "Other" and typing a colon, and then make sure "Treat consecutive delimiters as one". Click Finish.

9) With any luck, you have one column with just "X-Failed-Recipients" in it going down, and a second column with just email addresses. You can delete the first column, leaving behind the email addresses.

10) Now we'll make a new column alongside the email addresses, consisting of just the word "bounced" all the way down. So if your email addresses are in the A column. Type "bounced" in B1, and drag the corner down until the whole column is filled up, so that each email address in A has the word "bounced" to the right of it.

11) This will be the lookup table of bad email addresses. Highlight all of this data, from your first email address in A1 to the very bottom "bounced", so you have a highlighted rectangle 2 columns wide and however many rows long as there are bad email addresses.

12) We will name this data so it's easier to deal with. With it highlighted, go to Insert -> Name -> Define, and give it the name "BadAddresses" (without the quotes). Now Excel knows this data as "BadAddresses".

13) Now type "All Email Addresses" in D1 and "Bounced?" in D2. Make these two cells bold (Excel "understands" data and can work with it better if the headers are formatted differently from the text underneath it.)

9) Underneath the "All Email Addresses" cell paste in your list of ALL email addresses in the mailing list. You should now have a really long column D.

Note that both this email list and the BadAddresses list should have just the email address text and not be actual links that you can click. If Excel turned either of them into links, try copying all of them, and then going Edit -> Paste Special -> Values back into the same place. The colors may remain, but the link part should be gone.

10) Now the magic begins. We will write a formula in E2 which takes the email address in D2 and looks in BadAddresses to see if it's there. If it is, it will return the value in the 2nd column of BadAddresses which is "bounced", and if it isn't, it'll throw up its hands and say "#N/A". The formula in E2 is this:

=VLOOKUP(D2,BadAddresses,2,FALSE)

I don't know exactly what the FALSE is about, but it's needed in the formula.

11) Hopefully, E2 will now say either "#N/A" or "bounced" depending on whether D2 is a good address or not! Now, click E2 and drag the bottom right corner all the way down, so Excel has a chance to look up all the addresses.

12) Now the E column should have a bunch of "bounced" and a bunch of #N/As. Highlight all the data in columns D and E, and go to Data -> Filter -> AutoFilter. Little arrows should appear on the right side of cells D1 and E1.

13) Click the arrows in E1 and a little menu should pop up asking what you want to filter by. Select "#N/A". It should filter down the list to just the email addresses with "#N/A" next to them, i.e. the good email addresses!

14) Highlight all these addresses, copy them, and then paste them wherever you'd like!

And that's it. It worked for us, and hopefully will for anyone in the future who may stumble upon this thread.
posted by losvedir at 3:40 PM on July 1, 2009


« Older Ethical obligation to pay for kicked out...   |   How do I set up an e-commerce site that accepts... Newer »
This thread is closed to new comments.