Retrieve multiple email contents
April 11, 2011 6:33 AM

How might I retrieve the text contents of c. 6,000 emails, and put each one into a separate field or document?

I have almost 6,000 emails on a webmail account, and I need to present the contents of each one in a long (very long) list arranged by sender. None are more than 50 words or so, and they are all text-only. Is there a way to automate this process, so I can avoid opening each one and copying and pasting into (for example) a spreadsheet?
posted by tawny to Computers & Internet (6 answers total)
First step is getting the data. What mail service are you using? Can you get the messages via POP or IMAP?

I'm not sure what the best way to deal with it from there is. It's relatively simple to parse text like this with any scripting language, but I don't know of any software with a GUI that can do this sort of thing.
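As a rough illustration of the scripting route, here is a minimal Python sketch using only the standard library. It assumes you already have each message as raw text (headers, blank line, body) and that the mail really is text-only; the function names `messages_to_rows` and `rows_to_csv` are made up for this example.

```python
import csv
import io
from email import message_from_string, policy
from email.utils import parseaddr


def messages_to_rows(raw_messages):
    """Turn raw message strings into (sender, subject, body) rows sorted by sender."""
    rows = []
    for raw in raw_messages:
        msg = message_from_string(raw, policy=policy.default)
        # parseaddr splits "Name <addr>" into (name, addr); keep only the address.
        _, sender = parseaddr(str(msg.get("From", "")))
        # Assumption: mail is text-only, so look for a plain-text body.
        part = msg.get_body(preferencelist=("plain",))
        body = part.get_content().strip() if part is not None else ""
        rows.append((sender.lower(), str(msg.get("Subject", "")), body))
    # Arrange the list by sender address.
    rows.sort(key=lambda row: row[0])
    return rows


def rows_to_csv(rows):
    """Render the rows as CSV text that a spreadsheet can open."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["sender", "subject", "body"])
    writer.writerows(rows)
    return out.getvalue()
```

Writing the result of `rows_to_csv` to a `.csv` file should give one row per email, grouped by sender, which a spreadsheet can open directly.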
posted by pjaust at 6:43 AM on April 11, 2011

Someone familiar with HTML parsing could probably do this fairly easily for any given webmail service.

What webmail service is this? Gmail can be set up for POP or IMAP access, so you could download the messages to a local mail client and extract them from there.
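If IMAP is available, a script can also pull the messages directly rather than going through a mail client. The following is a sketch only, using Python's standard `imaplib`; the helper names `fetch_all_raw` and `download_inbox` are invented here, and the host, username and password are placeholders you would have to supply (for Gmail the host would be `imap.gmail.com`, with IMAP enabled in the account settings).

```python
import imaplib


def fetch_all_raw(conn, mailbox="INBOX"):
    """Return the raw bytes of every message in `mailbox` from an open IMAP connection."""
    # readonly=True so that fetching does not mark anything as read.
    status, _ = conn.select(mailbox, readonly=True)
    if status != "OK":
        raise RuntimeError(f"could not open {mailbox!r}")
    status, data = conn.search(None, "ALL")
    if status != "OK":
        raise RuntimeError("search failed")
    raw_messages = []
    for num in data[0].split():
        status, parts = conn.fetch(num, "(RFC822)")
        if status != "OK":
            continue  # skip messages the server refuses to return
        # A fetched message arrives as a (header, payload) tuple; other items are bare bytes.
        for part in parts:
            if isinstance(part, tuple):
                raw_messages.append(part[1])
    return raw_messages


def download_inbox(host, user, password):
    """Log in over SSL and download the inbox. All three arguments are placeholders."""
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        return fetch_all_raw(conn)
    finally:
        conn.logout()
```

Each item returned is one complete message as bytes, which can then be parsed with the standard `email` module (`email.message_from_bytes`).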
posted by atrazine at 6:44 AM on April 11, 2011

If you can download all the mails, you can use this to convert them into a text file, then parse that to get the messages.
posted by atrazine at 6:49 AM on April 11, 2011

If you have POP or IMAP access, some mail clients use a DB behind the scenes to store your mail. MailForge uses a SQLite DB that you can access locally.
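I don't know the actual table layout any particular client uses, so the first step with a SQLite file would be to see what is in it. A small sketch with Python's built-in `sqlite3` module; `list_tables` is a made-up helper name and the path to the database file is something you would have to locate yourself.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the table names in a SQLite file, as a first step toward finding the mail data."""
    # Open read-only through a file: URI so the client's database cannot be changed by accident.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        cur = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        return [row[0] for row in cur.fetchall()]
    finally:
        conn.close()
```

Once you know which table holds the messages, an ordinary `SELECT` ordered by the sender column should get you the list you want. It is safest to work on a copy of the file while the mail client is closed.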

If you can get your provider to give you the mail directly from the server, it's likely to be in either mbox or maildir format. mbox keeps all the messages in one file, so it would need to be parsed, but maildir is already a series of files, one per message. Ask the mail provider if they can give you the mail in maildir format.
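Either format can be read with Python's standard `mailbox` module, so the parsing need not be done by hand. A minimal sketch, assuming text-only messages; `open_mail_store` and `senders_and_bodies` are names invented for this example.

```python
import mailbox
import os


def open_mail_store(path):
    """Open `path` as a maildir if it is a directory, otherwise as an mbox file."""
    if os.path.isdir(path):
        return mailbox.Maildir(path, create=False)
    # create=False so a mistyped path raises an error instead of creating an empty mbox.
    return mailbox.mbox(path, create=False)


def senders_and_bodies(path):
    """Return (From header, body text) pairs for every message in the store."""
    store = open_mail_store(path)
    try:
        pairs = []
        for msg in store:
            payload = msg.get_payload(decode=True)  # None for multipart messages
            body = payload.decode("utf-8", errors="replace").strip() if payload else ""
            pairs.append((msg.get("From", ""), body))
        return pairs
    finally:
        store.close()
```

Sorting the returned pairs gives the list arranged by sender; multipart messages come back with an empty body here, which is fine only if the mail really is all plain text.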
posted by Mad_Carew at 8:10 AM on April 11, 2011

Export to dbx format, and figure out a way to parse and re-import into your database of choice. The format is dead simple; I believe messages are delimited by a blank line, then a period, then another blank line.
posted by gjc at 6:40 PM on April 11, 2011

I've used MailSteward to go from IMAP to CSV to Excel or SQL.
posted by Brian Puccio at 8:00 PM on April 12, 2011
