extracting email from yahoo
March 28, 2007 10:57 PM

I need to extract a few thousand customer e-mail addresses that are mixed in with tons of spam in my Yahoo Mail account and store them in a program or database. I'm not sure where to start on this. Is there a program, script, or relatively painless method for doing this? Thanks!
posted by debu to Computers & Internet (5 answers total)
 
You can use YPOPs! to access your Yahoo account from any program that speaks POP. You could e.g. transfer all the mail over into Thunderbird and go from there. I can't tell if you just want a list of all email addresses or if you need to go through and sort them by hand, but it's a starting point, and it at least lets you use any POP-speaking software.
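
If you'd rather script the download than go through a mail client, something like this rough Python sketch could work. It uses the standard-library poplib module and assumes YPOPs! is running on the same machine and listening on 127.0.0.1 port 110. The host, the port, the login values and the helper names (connect, fetch_all) are all my own placeholders, so check them against your setup.

```python
import poplib
from email import message_from_bytes


def connect(user, password, host="127.0.0.1", port=110):
    """Log in to a POP3 server. Host and port are assumed defaults."""
    conn = poplib.POP3(host, port)
    conn.user(user)
    conn.pass_(password)
    return conn


def fetch_all(conn):
    """Return every message on an already-authenticated POP3 connection."""
    count, _size = conn.stat()          # (number of messages, mailbox size)
    messages = []
    for i in range(1, count + 1):       # POP3 message numbers start at 1
        _resp, lines, _octets = conn.retr(i)
        # retr() returns the message as a list of byte lines; rejoin and parse.
        messages.append(message_from_bytes(b"\r\n".join(lines)))
    return messages
```

You would call it along the lines of `conn = connect("your_yahoo_id", "your_password")`, then loop over `fetch_all(conn)` and finish with `conn.quit()`.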
posted by Rhomboid at 11:56 PM on March 28, 2007


Hacky spam-filtering method, put together off the top of my head:

1) Set up Thunderbird on your computer.

2) Set up a filter that will forward all email to a Gmail account as it is received.

3) Use POP to download all your Yahoo mail, which will then be forwarded to your Gmail account. Gmail has the best spam filters in the business, as I understand it.

4) Set up another account in Thunderbird that will pull all of the now-spam-filtered mail back down from Gmail.

5) From there, you can use grep or some other little script to extract all of the addresses (see this previous thread for one example).
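
As a rough idea of what the "little script" in step 5 might look like, here is a Python sketch that reads an mbox file (the format Thunderbird uses for its local folders) and collects the sender addresses. The function name and the choice to look only at From and Reply-To headers are my assumptions, not something from the linked thread, and you'd need to find where your own Thunderbird profile keeps the file.

```python
import mailbox
from email.utils import getaddresses


def sender_addresses(mbox_path):
    """Collect the unique sender addresses found in an mbox file."""
    found = set()
    for msg in mailbox.mbox(mbox_path):
        # A message can carry several From/Reply-To headers; take them all.
        headers = msg.get_all("From", []) + msg.get_all("Reply-To", [])
        # str() guards against header objects for oddly encoded values.
        for _name, addr in getaddresses([str(h) for h in headers]):
            if "@" in addr:
                found.add(addr.lower())   # lower-case so duplicates collapse
    return sorted(found)
```

Calling `sender_addresses("path/to/Inbox")` gives a sorted, de-duplicated list that you could write to a CSV file or load into a database.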

No spam filter is perfect, so at some point, you're still going to have to scan all of these manually, weeding out false positives and false negatives.

Alternatively, you could install SpamAssassin or another filtering program yourself. Setting that up may take a bit more technical know-how, though.

4) set up another thunderbird account to download all of the emails that don't get filtered out from the gmail.
posted by chrisamiller at 12:05 AM on March 29, 2007


aarrgh - ignore that last #4 - it's late and I'm tired...
posted by chrisamiller at 12:06 AM on March 29, 2007


Response by poster: Looks like YPOPs! with Thunderbird is the best bet. Thanks!
If anyone else has a suggestion I'd love to hear it.
posted by debu at 12:36 AM on March 29, 2007


You've probably figured this part out already, but I'll comment in case it hadn't occurred to you.

I do a lot of this sort of stuff at work, lucky me, and the first thing I do is sort the emails alphabetically by title. A huge number of spam titles are identical, and when sorted by title they'll stand out immediately. You can then delete all the duplicate titles about weight loss and hot chicks and whatever else, which will dramatically reduce the number of emails left to sort by hand.
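
If the mail is already on disk, the same trick can be roughed out in Python. This sketch counts how often each subject line appears and keeps only the messages whose subject shows up at most max_copies times. The function name and the threshold are my own guesses, and real customers who happen to use the same subject would be dropped too, so treat it as a first pass before sorting by hand.

```python
from collections import Counter


def drop_repeated_subjects(messages, max_copies=1):
    """Keep only messages whose subject appears at most max_copies times."""
    messages = list(messages)

    def key(msg):
        # Normalise case and whitespace so near-identical titles match.
        return " ".join(str(msg.get("Subject", "")).lower().split())

    counts = Counter(key(m) for m in messages)
    return [m for m in messages if counts[key(m)] <= max_copies]
```

It works on anything with a `.get("Subject")` method, such as the message objects Python's email and mailbox modules produce.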
posted by different at 7:47 AM on March 29, 2007

