How can I export a list of names from my spam?
January 18, 2010 8:06 PM

Help me make the most of spam! No, I don't want to send it. I want to receive it - and have some fun with it. And I need your help, especially if you know how to export bits of email.

A long time ago, I noticed a great looking name in the sender section of a piece of spam email. This, of course, led me to start saving spam names for use when writing fiction. Eventually, I created a writing prompt generator [link in profile] that spits out a random name, a noun and a verb.

Here's where I could really use some know-how: There has to be a better way to gather spam names than what I'm doing.

Currently, I'm cutting the names from my Gmail spam and pasting them into a text file (OS X). It works, but it's time-consuming: I wait until I have 50 or so good names... and then I cut/paste/cut/paste, rinse, lather, repeat.

Is there a way for me to export only the SENDER portion of a list of emails so I'd end up with a list of the names?

And if there is... is there a strategy for creating an email account specifically to have it hammered by spam? Is there a strategy for attracting a better caliber of spam (if there is such a thing)? In other words, I want more spam from Lisa West, Eddie Rankin and Rae Finch... and less email from ViagraSuperstore and BigWeeWee4U!
posted by 2oh1 to Computers & Internet (15 answers total)
 
I don't think there's a way to be selective about it; if you're gonna get spam, you're gonna get all kinds of spam. If you want to increase the volume of incoming spam, post the address as mailto: links (e.g. mailto:email@address.com) all over the internet and see how many bots scrape them. Just watch out that you don't become a spammer yourself by posting the address in places where no one wants it.
posted by The Winsome Parker Lewis at 8:13 PM on January 18, 2010


Is there a way for me to export only the SENDER portion of a list of emails so I'd end up with a list of the names?

I haven't tried this, but it should work. If you install Thunderbird and configure it to access your Gmail account via IMAP, then open the spam folder from inside Thunderbird, it will download all the spam mail headers. Thunderbird keeps all its incoming mail in mbox files, and mbox is a plain text format, so once you've found the mbox corresponding to the spam folder, it should be easy to extract the sender info with grep.
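If grep patterns feel fiddly, the same idea can be sketched in Python 3 using only the standard library. This is a minimal, untested-against-real-Thunderbird sketch: it assumes you've already located the mbox file that corresponds to the spam folder (you pass its path in yourself), and it keeps only the display name from each From: header, decoding the `=?utf-8?...?=` style encoded names that spam often uses.

```python
import mailbox
import sys
from email.errors import HeaderParseError
from email.header import decode_header, make_header
from email.utils import parseaddr


def decode_name(name):
    """Turn an RFC 2047 encoded word like '=?utf-8?q?Rae_Finch?=' into plain text."""
    try:
        return str(make_header(decode_header(name)))
    except (HeaderParseError, LookupError, UnicodeDecodeError):
        return name  # leave anything undecodable as-is


def sender_names(mbox_path):
    """Return the unique sender display names found in an mbox file."""
    names = set()
    for message in mailbox.mbox(mbox_path):
        # parseaddr('Lisa West <lisa@example.com>') -> ('Lisa West', 'lisa@example.com')
        name, _address = parseaddr(str(message.get("From", "")))
        if name:  # bare addresses have no display name, so skip them
            names.add(decode_name(name))
    return sorted(names)


if __name__ == "__main__":
    # Usage: python3 spam_names.py /path/to/the/spam/mbox
    for name in sender_names(sys.argv[1]):
        print(name)
```

Using a set means a sender who spams you ten times only shows up once in the list.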

Post back if there are parts of that you don't know how to do.
posted by flabdablet at 8:32 PM on January 18, 2010


Response by poster: Well, my real priority is figuring out how to export just the sender names from a list of spam. The rest was just silly wishful thinking, I admit.
posted by 2oh1 at 8:32 PM on January 18, 2010


Response by poster: "it should be easy to extract the sender info with grep"

Um.... huh? You lost me with that last bit. Otherwise, your solution sounds great. Also, is there a way to do this using Apple's Mail app?
posted by 2oh1 at 8:34 PM on January 18, 2010


Apple Mail also uses plain text mail files, but if I recall correctly, it keeps one file per message instead of smooshing an entire folder's worth into an mbox. I don't know it well enough to be sure of how it deals with IMAP and mail headers. If we're lucky, it will download one set of headers per available message as soon as you connect to an IMAP folder, and grep should work OK with that arrangement as well.

As for what grep is: it's a venerable Unix text processing tool, and it's already installed on your Mac.

How familiar are you with entering commands in Terminal?
posted by flabdablet at 9:19 PM on January 18, 2010


Response by poster: "How familiar are you with entering commands in Terminal?"

Vaguely. If I have enough clues to go by, I can usually figure it out, but I haven't used terminal much at all.
posted by 2oh1 at 10:23 PM on January 18, 2010


This is the most interesting spam question I've ever heard. Thank you for improving the quality of my thoughts on spam.

Previous suggestions of grep are absolutely correct. Mail.app apparently uses a proprietary emlx format. Forgive the guess, but it sounds like you're not a programmer. If you could post a link to an example of an emlx file (or upload a few of your own from /Users/yourusername/Library/Mail), I'd be happy to write you a script.

As for getting high-quality spam, the best strategy would probably be to invite lots of spam and filter out what you don't want. There are lots of prepackaged Bayesian filters for this, but they might be optimized to filter on things like message content rather than the sender's name. You'd probably be better off with some dumb rules that eliminate, say, names with mixed letters and numbers, more than three capitals in a row, etc. That said, in my experience, the various bounced-check, wire-transfer, etc. scams use more human-sounding names than the drug and pornography offers.
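A minimal sketch of what such "dumb rules" might look like in Python. The first two rules are the examples given above; the third (require at least two words, so one-word brand names drop out) is an extra assumption added for illustration, and `looks_human`/`keep_human_names` are hypothetical names, not part of any existing tool.

```python
import re
import sys

# Rule 2 from the answer above: "more than three capitals in a row".
RUN_OF_CAPITALS = re.compile(r"[A-Z]{4,}")


def looks_human(name):
    """Return True if a sender name passes a few simple 'dumb rules'."""
    has_letters = any(ch.isalpha() for ch in name)
    has_digits = any(ch.isdigit() for ch in name)
    if has_letters and has_digits:
        return False  # rule 1: mixed letters and numbers, e.g. "BigWeeWee4U"
    if RUN_OF_CAPITALS.search(name):
        return False  # rule 2: e.g. "CHEAP Meds"
    if len(name.split()) < 2:
        return False  # rule 3 (assumed): one-word names like "ViagraSuperstore"
    return True


def keep_human_names(names):
    """Filter a list of sender names down to the human-sounding ones."""
    return [name for name in names if looks_human(name)]


if __name__ == "__main__":
    # Usage: one name per line on stdin, survivors printed to stdout.
    for name in keep_human_names(line.strip() for line in sys.stdin):
        print(name)
```

For example, `keep_human_names(["Lisa West", "BigWeeWee4U", "Eddie Rankin", "ViagraSuperstore", "CHEAP Meds"])` keeps only `["Lisa West", "Eddie Rankin"]`. Rules like these will throw out some good names too, so they're a starting point to tweak, not a finished filter.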

By the way, if you are not familiar with a command-line interface, please be careful in Terminal. Find out what man pages are, and then make a point of reading the relevant man pages before executing unfamiliar commands.
posted by d. z. wang at 12:31 AM on January 19, 2010


Interestingly, Mail.app keeps an SQLite database of all of its messages, in a file called Envelope Index. It's not too hard to come up with an SQL query that'll give you the names (technically, the comment part of the sender's email address) of all the messages in a given mailbox.

Open Terminal, and type the command cd Library/Mail to go to the directory Mail keeps its stuff in, and sqlite3 'Envelope Index' to open the envelope index file using the command-line sqlite tool. You'll get a sqlite> prompt from here on instead of the usual shell prompt.

Type the command select * from mailboxes; to list all the mailboxes Mail has indexed. (Note that SQL statements typed at the sqlite> prompt have to end with a semicolon.) The numbers on the left are the numbers Mail has assigned to each mailbox for its internal purposes. Find the number of the mailbox you keep all your interesting spam in. Let's say that number is 42. Then type select distinct a.comment from addresses a, messages m where m.mailbox = 42 and a.rowid = m.sender; to get a list of all the names. Copy and paste them wherever you like.
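The same two queries can also be run from a short Python 3 script using the built-in sqlite3 module, which saves retyping them each time. This sketch assumes the file location and the table/column names described above (`mailboxes`, `addresses.comment`, `messages.mailbox`, `messages.sender`); newer versions of Mail may store the index elsewhere or use a different schema. It opens the file read-only so it can't accidentally change Mail's database, though quitting Mail first (or working on a copy) is probably still the safer habit.

```python
import sqlite3
import sys
from pathlib import Path

# Assumed location, per the answer above (Mail.app circa 2010).
DEFAULT_INDEX = Path.home() / "Library" / "Mail" / "Envelope Index"


def open_read_only(path=DEFAULT_INDEX):
    """Open the index read-only via an SQLite URI (the space in the name is %-encoded)."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def list_mailboxes(conn):
    """Equivalent of typing: select * from mailboxes;"""
    return conn.execute("SELECT * FROM mailboxes").fetchall()


def names_in_mailbox(conn, mailbox_id):
    """The query from the answer above, with the mailbox number as a parameter."""
    rows = conn.execute(
        "SELECT DISTINCT a.comment FROM addresses a, messages m "
        "WHERE m.mailbox = ? AND a.rowid = m.sender",
        (mailbox_id,),
    )
    # Skip senders that had no name (NULL or empty comment).
    return [comment for (comment,) in rows if comment]


if __name__ == "__main__":
    # Usage: python3 mail_names.py       -> list mailboxes
    #        python3 mail_names.py 42    -> names in mailbox 42
    conn = open_read_only()
    if len(sys.argv) > 1:
        for name in names_in_mailbox(conn, int(sys.argv[1])):
            print(name)
    else:
        for row in list_mailboxes(conn):
            print(row)
    conn.close()
```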

When you're done, hit Ctrl-D a few times and/or just close the window.
posted by hattifattener at 12:54 AM on January 19, 2010


dzw, emlx is basically just a one-message mbox with a bunch of XML cruft stuck on the end, so grepping a bunch of them for ^From: should work just fine. The only question in my mind is whether Mail.app actually makes .emlx files when it's downloading IMAP headers; I'm a Linux guy, not a Mac guy, so I don't have the equipment to find out (and I've just found, to my dismay, that Thunderbird doesn't do it either).
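If Mail.app does leave .emlx files lying around, a Python 3 sketch can walk them instead of using grep. It assumes the commonly described emlx layout, which is slightly more than "mbox plus XML": a first line holding the message length in bytes, then the raw message, then an XML property list. That layout and the folder you point it at are assumptions here, not something confirmed in this thread.

```python
import sys
from email.parser import BytesHeaderParser
from email.utils import parseaddr
from pathlib import Path


def emlx_from_header(path):
    """Return the raw From: header of a single .emlx file."""
    data = Path(path).read_bytes()
    # Assumed layout: first line = message length in bytes (may be space-padded),
    # followed by the raw RFC 822 message, followed by a trailing XML plist.
    length_line, _, rest = data.partition(b"\n")
    message_bytes = rest[: int(length_line.strip())]
    headers = BytesHeaderParser().parsebytes(message_bytes)
    return str(headers.get("From", ""))


def emlx_sender_names(mail_dir):
    """Collect unique sender display names from every .emlx file under mail_dir."""
    names = set()
    for path in Path(mail_dir).rglob("*.emlx"):
        name, _address = parseaddr(emlx_from_header(path))
        if name:
            names.add(name)
    return sorted(names)


if __name__ == "__main__":
    # Usage: python3 emlx_names.py <folder under ~/Library/Mail holding the spam>
    for name in emlx_sender_names(sys.argv[1]):
        print(name)
```

Reading the byte count and slicing there means the XML at the end never reaches the header parser.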

If it turns out not to, an alternative approach would be to whip up a little script that talks directly to Gmail's IMAP server using openssl s_client -quiet -connect imap.gmail.com:993 and grabs the appropriate headers. I believe openssl comes pre-installed on Macs as well; could you check?
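Rather than driving openssl s_client by hand, Python 3's imaplib can open the same TLS connection to imap.gmail.com:993 and speak IMAP for you. This is a hedged sketch: it assumes IMAP is enabled on the account, that the spam folder is called "[Gmail]/Spam" (the name may differ with the account's language), and that Gmail accepts a plain username/password login (it may require an app password instead). It uses BODY.PEEK so only the From: header is downloaded and nothing gets marked as read.

```python
import getpass
import imaplib
from email.parser import BytesHeaderParser
from email.utils import parseaddr

GMAIL_HOST = "imap.gmail.com"
SPAM_FOLDER = '"[Gmail]/Spam"'  # assumed name; quoted because it contains brackets


def fetch_from_headers(user, password, folder=SPAM_FOLDER):
    """Download only the From: header of every message in an IMAP folder."""
    imap = imaplib.IMAP4_SSL(GMAIL_HOST, 993)
    try:
        imap.login(user, password)
        status, data = imap.select(folder, readonly=True)
        if status != "OK":
            raise RuntimeError(f"could not open {folder}: {data}")
        if int(data[0]) == 0:  # select reports the message count
            return []
        # BODY.PEEK fetches the header without setting the \Seen flag.
        _status, parts = imap.fetch("1:*", "(BODY.PEEK[HEADER.FIELDS (FROM)])")
        return [part[1] for part in parts if isinstance(part, tuple)]
    finally:
        imap.logout()


def names_from_header_blocks(blocks):
    """Turn raw 'From: ...' header blocks into sorted, unique display names."""
    parser = BytesHeaderParser()
    names = set()
    for block in blocks:
        from_value = str(parser.parsebytes(block).get("From", ""))
        name, _address = parseaddr(from_value)
        if name:
            names.add(name)
    return sorted(names)


if __name__ == "__main__":
    user = input("Gmail address: ")
    blocks = fetch_from_headers(user, getpass.getpass())
    for name in names_from_header_blocks(blocks):
        print(name)
```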
posted by flabdablet at 12:56 AM on January 19, 2010


Ooh, I think hattifattener just won the thread.
posted by flabdablet at 12:57 AM on January 19, 2010


Best answer: Install Table2Clipboard, go to your spam folder, select part of the table, Ctrl-click to get the context menu and "Copy whole table". Paste into your favourite spreadsheet. Zhi Wong, Marni Dow, Joseph Fletcher and all their friends will be waiting for you in the third column.
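If a spreadsheet won't split the pasted table into columns, a tiny Python 3 script can pull the names out directly. This sketch assumes the copied table arrives on the clipboard as tab-separated plain text with the sender names in the third column, as described above; the exact clipboard format Table2Clipboard produces is an assumption here, so adjust `column` if the names land elsewhere.

```python
import csv
import io
import sys


def column_values(pasted_text, column=3):
    """Return the non-empty values in one column (1-based) of tab-separated text."""
    # QUOTE_NONE: treat quote characters in names or subjects as ordinary text.
    reader = csv.reader(io.StringIO(pasted_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    values = []
    for row in reader:
        if len(row) >= column and row[column - 1].strip():
            values.append(row[column - 1].strip())
    return values


if __name__ == "__main__":
    # Usage on OS X: pbpaste | python3 third_column.py >> spam_names.txt
    for name in column_values(sys.stdin.read()):
        print(name)
```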
posted by scruss at 4:03 AM on January 19, 2010


Response by poster: Scruss! That's EXACTLY the sort of thing I'm looking for, but I can't quite get it to work.

I installed Table2Clipboard. I copied the table.
When I paste it into Numbers (iWork spreadsheet), it drops everything into one column rather than into separate columns across the table. What am I doing wrong?
posted by 2oh1 at 12:33 PM on January 19, 2010


Response by poster: Weird. I can paste the table into Pages, but not Numbers. (I hate that iWork uses such generic names for its apps, since it makes Googling for info so much harder.)

Aaaaaaanyway... by following the suggestion scruss posted (but using a word processor instead of a spreadsheet), I got exactly what I want, and it's really easy now. Thanks!!!
posted by 2oh1 at 2:42 PM on January 19, 2010


And the moral of the story is to consider your audience. Wow, we fail. Scruss, thank you for demonstrating why vim will never become a mainstream text editor.
posted by d. z. wang at 9:51 AM on January 20, 2010


Response by poster: No, no, no. You didn't fail at all. You got me thinking in completely different directions for a solution to my problem, and I appreciate that.
posted by 2oh1 at 10:35 AM on January 20, 2010

