extracting email addresses from Yahoo Mail
March 28, 2007 10:57 PM
I need to extract a few thousand customer email addresses that are mixed in with tons of spam in my Yahoo Mail account and store them in a program or database. I'm not sure where to start. Is there a program, script, or relatively painless method for doing this? Thanks!
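For the "store them in a program/database" part, here is a minimal Python sketch using the standard-library `sqlite3` module. It assumes the addresses have already been pulled out by whatever method you end up using. The table name `customers` and the function `store_addresses` are hypothetical, and `COLLATE NOCASE` treats addresses that differ only in case as the same customer, which is an assumption (strictly speaking, the part before the @ can be case-sensitive).

```python
import sqlite3


def store_addresses(addresses, db_path=":memory:"):
    """Insert email addresses into a SQLite table, skipping duplicates.

    Returns the open connection so the caller can query or close it.
    """
    conn = sqlite3.connect(db_path)
    # Hypothetical schema: one row per address. COLLATE NOCASE makes the
    # primary-key check ignore case, so Alice@Example.com == alice@example.com.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(email TEXT PRIMARY KEY COLLATE NOCASE)"
    )
    # INSERT OR IGNORE silently drops rows that would violate the primary key.
    conn.executemany(
        "INSERT OR IGNORE INTO customers (email) VALUES (?)",
        ((addr.strip(),) for addr in addresses),
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = store_addresses(
        ["alice@example.com", "Alice@Example.com", "bob@example.org"]
    )
    for (email,) in conn.execute("SELECT email FROM customers ORDER BY email"):
        print(email)
    conn.close()
```

Passing a file name such as `customers.db` instead of `:memory:` keeps the data between runs, and the resulting file can be opened by most database tools.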
Hacky spam-filtering method, put together off the top of my head:
1) Set up Thunderbird on your computer.
2) Set up a filter that will forward all email to a Gmail account as it is received.
3) Use POP to download all your Yahoo mail, which will then be forwarded to your Gmail account. Gmail has the best spam filters in the business, as I understand it.
4) Set up another account in Thunderbird that will pull all of the now-spam-filtered mail back down from Gmail.
5) From there, you can use grep or some other little script to extract all of the addresses (see this previous thread for one example).
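A rough Python version of step 5, assuming the filtered mail ends up in a local Thunderbird folder (Thunderbird typically stores each local folder as a single mbox-format file). Instead of grepping the raw text, this swaps in the standard-library `mailbox` module and `email.utils.getaddresses` to read only the From: headers, so addresses that merely appear inside message bodies are not picked up. The function name `sender_addresses` and the sample mailbox are hypothetical.

```python
import mailbox
import os
import tempfile
from email.utils import getaddresses


def sender_addresses(mbox_path):
    """Return the unique sender addresses in an mbox file, lower-cased and sorted."""
    found = set()
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            # getaddresses() splits "Name <addr>" forms and comma-separated lists
            # into (name, address) pairs; only the address part is kept.
            for _name, addr in getaddresses(msg.get_all("From", [])):
                if "@" in addr:
                    found.add(addr.lower())
    finally:
        box.close()
    return sorted(found)


if __name__ == "__main__":
    # Tiny hypothetical mailbox standing in for a real Thunderbird folder file.
    sample = (
        "From alice@example.com Thu Mar 29 00:05:00 2007\n"
        "From: Alice <Alice@Example.com>\n"
        "Subject: my order\n"
        "\n"
        "Hi there\n"
        "\n"
        "From bob@example.org Thu Mar 29 00:06:00 2007\n"
        "From: bob@example.org\n"
        "Subject: question\n"
        "\n"
        "Hello\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "Inbox")
        with open(path, "w") as fh:
            fh.write(sample)
        for address in sender_addresses(path):
            print(address)
```

The output list can be fed straight into a database or spreadsheet; you would still want to eyeball it for spam senders that slipped through the filter.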
No spam filter is perfect, so at some point you're still going to have to scan all of these manually, weeding out false positives and false negatives.
Alternatively, you could install SpamAssassin or another filtering program yourself. Setting that up may take a bit more technical know-how, though.
4) Set up another Thunderbird account to download all of the emails that don't get filtered out by Gmail.
posted by chrisamiller at 12:05 AM on March 29, 2007
aarrgh, ignore that last #4; it's late and I'm tired...
posted by chrisamiller at 12:06 AM on March 29, 2007
Response by poster: looks like YPOPs! with Thunderbird is the best bet. Thanks!
If anyone else has a suggestion I'd love to hear it.
posted by debu at 12:36 AM on March 29, 2007
You've probably figured this part out already, but I'll comment in case it hasn't occurred to you.
I do a lot of this sort of stuff at work, lucky me, and the first thing I do is sort the emails alphabetically by subject. A huge number of spam subject lines are identical, so once the mail is sorted by subject they stand out immediately. You can then delete all the duplicates about weight loss and hot chicks and whatever else, which dramatically reduces the number of emails left to sort by hand.
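The same idea can be sketched in Python: count how often each normalized subject appears, and the mass-mailed spam subjects float to the top. This is a counting variant of the sort-by-subject approach rather than the commenter's exact procedure; `repeated_subjects` and its `min_count` threshold are hypothetical names, and the function accepts any iterable of `email.message.Message` objects.

```python
from collections import Counter
from email import message_from_string


def repeated_subjects(messages, min_count=2):
    """Return (subject, count) pairs seen at least min_count times, most common first."""
    counts = Counter(
        # Collapse runs of whitespace and ignore case so trivial variants group together.
        " ".join(str(msg.get("Subject", "")).split()).lower()
        for msg in messages
    )
    return [(subject, n) for subject, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    # Hypothetical sample messages; a mailbox.mbox object would work here too.
    samples = ["Lose weight NOW", "lose  weight now", "Order #12", "Lose weight now"]
    msgs = [message_from_string(f"Subject: {s}\n\nbody\n") for s in samples]
    for subject, n in repeated_subjects(msgs):
        print(n, subject)
```

A `mailbox.mbox` object can be passed directly as `messages`, since iterating over it yields message objects. Subjects with high counts are good candidates for bulk deletion, while one-off subjects are more likely to be real customers.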
posted by different at 7:47 AM on March 29, 2007
posted by Rhomboid at 11:56 PM on March 28, 2007