A good way to archive and search a large store of email on my computer?
September 17, 2007 6:32 PM

How to store and search lots of old email.

Hi. Maybe the green can help with this.

I'm re-arranging a bunch of old email archives. I'm generally of the "Keep it all" philosophy w. email, and I've got several years' worth of mail now.

What I want to do is keep my current mail (say the past year or so) in my email client (outlook) and online via web (fastmail). But I want to keep all the older stuff separate, in a searchable store. (Because both outlook and Fastmail get unhappy when dealing w. multiple gigs of email, and because I don’t really need emails from 4 years ago popping up every time I do a search)

So…

What I imagine I want is an email client that meets the following criteria:

- Works well w. standard Unix Mbox-type files
- Has indexed search capability
- Works happily with very large store of email (say several gigs)
- Has reasonably powerful capacity for doing advanced searches.
- Bonus points for searching inside attachments
- Bonus points for being free.

I’ve looked at a bunch of possible solutions, none seem right.

Eudora: Doesn’t play well w. standard Mbox files. (Notably – it doesn’t read attachments. I think there are other little problems, too).
The Bat: Recommended for large mailstores. But doesn’t have indexed search
Outlook: Too many problems to list
Thunderbird: No indexed search
Opera M2: Little if any advanced search

I’ve also looked into a couple of programs that aren’t email clients that seemed possibly helpful: ImapSize, and Mailbag Assistant. But they didn’t quite do the job either.

I’m sort of surprised that a program that matches these criteria is so hard to find. Surely I can’t be the only person who wants to be able to store and search a large collection of email?

Anyone have any suggestions?

(Gmail, I guess, might be a possibility, though I’m not sure I can see how that would work. Does anyone use Gmail in this way? Is it possible to upload lots of old email folders to it? )

Alternately – does anyone have other smart ideas/strategies for keeping all your old mail in a way that lets you get at it easily and quickly, without it getting in the way of your day-to-day work?
posted by ManInSuit to Computers & Internet (14 answers total) 5 users marked this as a favorite
I've got three solutions to this problem, none perfect.

I've forwarded all my mail to gmail for the past two years and use it for searches. It's great at that. Your initial import is a bit trickier but there must be some solution.

I use grepmail on plain mbox files with some regularity. It's not indexed, but it's fast enough for the occasional "find that old email" problem.
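
For a rough idea of what that kind of unindexed scan involves, here is a minimal sketch in Python using the standard library `mailbox` module. It is not grepmail and makes no claim about how grepmail works internally; the function names (`scan_mbox`, `message_text`) and the choice to search only the sender, subject, and text/plain parts are assumptions made for illustration.

```python
import mailbox
import re


def message_text(msg):
    """Return the decoded text/plain parts of a message as one string."""
    chunks = []
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset label: fall back to UTF-8 (an assumption).
            chunks.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(chunks)


def scan_mbox(path, pattern):
    """Yield (date, sender, subject) for every message matching the regex.

    This reads every message on every search, so it is a linear scan,
    not an indexed search.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            sender = str(msg.get("From", ""))
            subject = str(msg.get("Subject", ""))
            haystack = "\n".join([sender, subject, message_text(msg)])
            if regex.search(haystack):
                yield str(msg.get("Date", "")), sender, subject
    finally:
        box.close()
```

Something like `for hit in scan_mbox("old-mail.mbox", r"invoice"): print(hit)` would list the matches, where `old-mail.mbox` is a placeholder file name.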

I've built my own quick and dirty mail search program three times: once using Lucene, once using MySQL full text, once using mg. They all worked, but not very well; email is a specific enough data schema that you want a really good custom tool.
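
To show what "indexed" buys you over a plain scan, here is a minimal sketch of an inverted index over an mbox file, again in Python with only the standard library. It is not Lucene, MySQL full text, or mg, and it is nowhere near a real mail search tool. The names `build_index` and `search` are hypothetical, the index lives only in memory, and bodies are assumed to decode as UTF-8.

```python
import mailbox
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")


def plain_text(msg):
    """Collect text/plain parts, decoding as UTF-8 (a simplifying assumption)."""
    parts = []
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            if payload:
                parts.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(parts)


def build_index(path):
    """Map each lowercase word to the set of mbox keys whose message contains it."""
    index = defaultdict(set)
    box = mailbox.mbox(path, create=False)
    try:
        for key, msg in box.iteritems():
            text = " ".join([
                str(msg.get("From", "")),
                str(msg.get("Subject", "")),
                plain_text(msg),
            ])
            for word in WORD.findall(text.lower()):
                index[word].add(key)
    finally:
        box.close()
    return index


def search(index, query):
    """Return sorted keys of messages containing every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], ()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)
```

The slow part (reading every message) happens once in `build_index`; each `search` afterwards is only a few set lookups. A real tool would also persist the index to disk and handle attachments, phrases, and dates, which is where the "really good custom tool" comes in.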
posted by Nelson at 6:36 PM on September 17, 2007


Mairix alone as a command line tool, or mairix configured as a tool within Sylpheed-Claws, which is also available for Windows.

Evolution, which is sort-of available for Windows, does full text indexing natively.
posted by gmarceau at 7:20 PM on September 17, 2007


Start fresh with gmail. Archive all old folders somewhere that you can search, but over time you will move away from it. Once this happens, gmail fits your requirements perfectly.
posted by ets960 at 7:29 PM on September 17, 2007


I sat down one day and took all my old email from my piles of email throughout the year, imported them into Thunderbird, used Thunderbird to move them to an imap server that was also accessible through pop3, then used gmail to pop stuff off (making sure to mark everything unread so gmail pulled it).

Really the only problem with this is that it doesn't know about the metric crapton of different addresses that were "me" at various times, so the threading is suboptimal. But the searching is nice. If you are careful with your import, you can even do some simple tagging during the pop3 pulling.
posted by cmm at 7:30 PM on September 17, 2007


I use grepmail for my MBOX archives (personal mail).

For work email, I save it as PST files, then use LibPST to convert those to MBOX format. I then use either grepmail on the files directly or run them through MHonArc to generate HTML archives of the mail viewable with a web browser.
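
As a very rough illustration of the last step (mbox in, browsable HTML out), here is a minimal Python sketch that builds a single index page from an mbox file. MHonArc does far more than this (per-message pages, threading, attachments); the function name `mbox_to_html` and the three-column table layout are assumptions for the example, not MHonArc's output format.

```python
import html
import mailbox


def mbox_to_html(path, title="Mail archive"):
    """Return one HTML page listing Date, From and Subject for each message."""
    rows = []
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            # Escape header values so addresses like <a@b> display correctly.
            cells = "".join(
                "<td>" + html.escape(str(msg.get(name, ""))) + "</td>"
                for name in ("Date", "From", "Subject")
            )
            rows.append("<tr>" + cells + "</tr>")
    finally:
        box.close()
    return (
        "<html><head><title>" + html.escape(title) + "</title></head><body>\n"
        "<table>\n<tr><th>Date</th><th>From</th><th>Subject</th></tr>\n"
        + "\n".join(rows)
        + "\n</table>\n</body></html>\n"
    )
```

Writing the returned string to a file such as `index.html` (a placeholder name) gives a page you can open in any browser.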
posted by mrbill at 7:46 PM on September 17, 2007


Nice ideas.

I should have specified, I guess, that I'm on MS Windows (XP).

I'd heard about evolution, but didn't know it was available for Windows now.

The Gmail option sounds cool (if a bit complicated wrt getting older stuff up there). But is gmail limited to 2.5 gig or so? I actually have *more* email than that....
posted by ManInSuit at 7:48 PM on September 17, 2007


There are paid levels of GMail that allow for more storage:

6 GB ($20.00 USD per year)
25 GB ($75.00 USD per year)
100 GB ($250.00 USD per year)
250 GB ($500.00 USD per year)
posted by heydanno at 8:04 PM on September 17, 2007


For those using gmail as a backup-email-archive-searcher: How do you get your outgoing email in there? Just BCC yourself on everything? Or something more clever?
posted by ManInSuit at 8:23 PM on September 17, 2007


ManInSuit:

When you use Gmail's SMTP server for your outgoing mail, it awesomely saves that outgoing mail to the Sent Mail folder of your Gmail account.

I love that about Gmail, even though I NEVER use the web interface for my account.
posted by melorama at 8:36 PM on September 17, 2007


I just BCC everything to gmail now.
posted by Nelson at 8:45 PM on September 17, 2007


I'm probably the last person on the planet using Outlook Express, but I have about 200k+ emails and I love the speed. It opens instantly, and searching the entire archive takes about a second. I don't know how it does that. Sorting (from/subject) is instant too.
posted by lpctstr; at 10:13 PM on September 17, 2007


You have not said how your e-mail is stored (locally, I assume) or in what form, but dtSearch can index and retrieve just about anything.
posted by yclipse at 5:30 AM on September 18, 2007


yclipse: My email is stored all over the place right now: Some is on an IMAP server, some is archived in Unix-style Mbox files, some is in old copies of Eudora (whose format is a lot like Unix MBOX, but not quite). But I can bring it all into one format (I'm hoping), which is local Unix MBOX files.
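
For the consolidation step, a minimal sketch of merging several mbox files into one, using Python's standard `mailbox` module. It assumes the sources are already standard mbox files; getting mail off the IMAP server or out of Eudora's not-quite-mbox format is not covered. The function name `merge_mboxes` and the decision to skip messages whose Message-ID has already been seen are assumptions for the example.

```python
import mailbox


def merge_mboxes(sources, dest):
    """Append every message from the source mbox files to dest.

    Messages whose Message-ID was already seen are skipped; messages
    with no Message-ID are always copied. Returns the number copied.
    """
    out = mailbox.mbox(dest)  # created if it does not exist yet
    copied = 0
    out.lock()
    try:
        # Start from the IDs already in dest so re-running is harmless.
        seen = {m.get("Message-ID") for m in out if m.get("Message-ID")}
        for path in sources:
            src = mailbox.mbox(path, create=False)
            try:
                for msg in src:
                    msg_id = msg.get("Message-ID")
                    if msg_id and msg_id in seen:
                        continue
                    if msg_id:
                        seen.add(msg_id)
                    out.add(msg)
                    copied += 1
            finally:
                src.close()
    finally:
        out.close()  # flushes pending writes and releases the lock
    return copied
```

A call like `merge_mboxes(["2003.mbox", "2004.mbox"], "archive.mbox")` (placeholder file names) would leave one combined file to point a search tool at.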

I'll look into dtSearch.
posted by ManInSuit at 9:34 AM on September 18, 2007


Thunderbird won't create an indexed search, but Google Desktop will create one for mail in Thunderbird, if you tell it to. I like Thunderbird because of all of the other features. Haven't needed an indexed search for it yet but Google Desktop is pretty fast at finding what I want in the Thunderbird local mail archives.
posted by caution live frogs at 10:54 AM on September 18, 2007

