How to extract contact info from Gmail
November 23, 2020 10:51 AM Subscribe
I run a small nonprofit. I am looking for a utility we can run that will go through the last 7 years of my (and maybe others on my team's) work email (in Gmail) and extract name, email, physical address, and phone number for as many people as possible.
I would do some quality control after the fact, so it doesn't have to be perfect. I know I can manually add folks to my google contacts, then export the entire contact list, but 7 years is a lot of email. Anybody know of an app or plugin or other way to make this happen?
Also open to QA best practices on such a task, any way to add physical addresses for people who are missing that info, or any way to verify that John Doe's physical address from six years ago is the same today. We're trying to do a mailing, hence the preference for physical mail.
I would start by doing a Google Takeout of your email in the JSON format. That at least gets you searchable text.
Using Visual Studio Code (or similar) you can do regular expression searches for addresses. I think doing this more than semi-automatically would be very hard.
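As a rough illustration of the kind of regular expression search described above, here is a minimal Python sketch. The patterns are assumptions, not anything from this thread: they only cover North American style phone numbers and the "City, ST 12345" line that ends many US addresses, and they will both miss real data and produce false positives. The function name `find_candidates` is hypothetical.

```python
import re

# Assumed pattern: North American phone numbers such as
# 555-123-4567 or (555) 123-4567.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\(\d{3}\)\s*|\d{3}[-.\s])\d{3}[-.\s]\d{4}(?!\d)"
)

# Assumed pattern: "City, ST 12345" or "City, ST 12345-6789",
# the last line of many US postal addresses.
CITY_STATE_ZIP_RE = re.compile(
    r"\b[A-Z][A-Za-z .'-]+,\s*[A-Z]{2}\s+\d{5}(?:-\d{4})?\b"
)


def find_candidates(text):
    """Return (phones, address_lines) found in a block of text."""
    phones = [m.group(0) for m in PHONE_RE.finditer(text)]
    address_lines = [
        line.strip()
        for line in text.splitlines()
        if CITY_STATE_ZIP_RE.search(line)
    ]
    return phones, address_lines
```

A line that matches the city/state/ZIP pattern is only a pointer: the street address usually sits on the line or two above it, so the hits still need a human look.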
BTW, this recent similar question didn't get any complete answers.
posted by gregr at 1:03 PM on November 23, 2020 [1 favorite]
Go to the Google Takeout link above. Deselect everything. Select the Contacts checkbox. Select the filetype you want (CSV, vCard). Go to the Next Step at the bottom. You can choose .zip or .tgz, to have Google mail you a link, or put it in various cloud places. You can set up recurring downloads or just this once.
When you get the download, poke through it. There's your Contacts information.
It looks like the Email takeout will give you an 'mbox' file. There are dozens of utilities out there to grab email addresses from this type of file. They could look in the To, From, and Cc lines, or some might do that and also look through everything else for anything that looks like an email address.
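If you would rather not hunt for a utility, the To/From/Cc approach can be sketched with Python's standard library alone. This is a minimal sketch, assuming the Takeout download really is a standard mbox file; the function name `collect_addresses` and the choice to keep only the first display name seen per address are assumptions made for illustration.

```python
import mailbox
from email.utils import getaddresses


def collect_addresses(mbox_path):
    """Map each lowercased email address to the first display name seen."""
    contacts = {}
    for message in mailbox.mbox(mbox_path):
        for header in ("From", "To", "Cc"):
            # A header can appear more than once; get_all returns every copy.
            values = [str(v) for v in message.get_all(header, [])]
            for name, addr in getaddresses(values):
                addr = addr.strip().lower()
                if "@" in addr:
                    contacts.setdefault(addr, name.strip())
    return contacts
```

This only reads the headers, so it gives you names and email addresses but nothing from signature blocks; phone numbers and physical addresses would still have to come from the message bodies.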
posted by zengargoyle at 2:35 PM on November 23, 2020
Best answer: It sounds like you want to find contact information in the body of your emails, as opposed to contact information that's stored *as contacts* in the Gmail account, is that correct?
What's your budget? Nuix has a feature called "Named Entities" that will locate and extract names, addresses, phone numbers (and other things like account and credit card numbers) from unstructured text using Regular Expressions, but the app costs thousands of dollars and is far from simple to use.
You're also going to get a LOT of false positives for everything other than email addresses, which will require cleaning up by hand, especially if you want physical/mailing addresses: they have the most variation in style, so the patterns needed to find them are the broadest and most error-prone.
I would suggest a fully automated approach is probably going to be more work to implement than it would save over just doing this (somewhat) by hand.
There are lots of tools (e.g. mailextractor) to extract a full list of email addresses from Gmail. I would start by extracting the list of email addresses, then group them by domain (as many people at the same company may share a mailing address). Then, using that list as a checklist, search the email for mailing addresses and phone numbers the old-fashioned way: by using keywords, and by browsing the email oldest to newest when keywords aren't enough.
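The group-by-domain step is simple enough to sketch in a few lines of Python. This assumes you already have a flat list of email addresses from whatever extraction tool you used; the function name `group_by_domain` is hypothetical.

```python
from collections import defaultdict


def group_by_domain(addresses):
    """Return {domain: sorted list of addresses} for a list of emails."""
    groups = defaultdict(set)
    for addr in addresses:
        addr = addr.strip().lower()
        local, sep, domain = addr.rpartition("@")
        # Skip anything that is not of the form local@domain.
        if sep and local and domain:
            groups[domain].add(addr)
    return {domain: sorted(members) for domain, members in sorted(groups.items())}
```

For example, `group_by_domain(["Bob@Example.com", "amy@example.com", "z@foo.org"])` puts the first two addresses together under `example.com`. Domains like gmail.com will of course lump unrelated people together, so the shared-address shortcut only helps for organizational domains.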
I would also make sure to search the email for any vCard attachments, as sometimes people like to put their contact information in their signature block this way, and those would be the easiest/fastest to extract in bulk.
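One way to pull vCard attachments out in bulk, sketched in Python with only the standard library. It assumes the mail has been exported as an mbox file, and it assumes attachments can be recognized by a `text/vcard` or `text/x-vcard` content type or a `.vcf` filename; some mail clients may label them differently. The function name `extract_vcards` is hypothetical.

```python
import mailbox

# Assumed content types for vCard attachments.
VCARD_TYPES = {"text/vcard", "text/x-vcard"}


def extract_vcards(mbox_path):
    """Return the decoded text of each vCard attachment in an mbox file."""
    cards = []
    for message in mailbox.mbox(mbox_path):
        for part in message.walk():
            filename = (part.get_filename() or "").lower()
            is_vcard = (
                part.get_content_type() in VCARD_TYPES
                or filename.endswith(".vcf")
            )
            if not is_vcard:
                continue
            payload = part.get_payload(decode=True)  # bytes, or None
            if not payload:
                continue
            charset = part.get_content_charset() or "utf-8"
            try:
                text = payload.decode(charset, errors="replace")
            except LookupError:  # unknown charset label in the message
                text = payload.decode("utf-8", errors="replace")
            cards.append(text)
    return cards
```

The returned strings are raw vCard text, so they can be written out as `.vcf` files and imported into Google Contacts, or parsed further for the ADR and TEL fields.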
posted by tiamat at 11:18 AM on November 24, 2020 [1 favorite]
Best answer: I used Evercontact for a while and it worked well. For $5 per month, per account, it will strip contact data from incoming email and keep your address book up to date in Gmail. I've found that Apple's iOS does a good enough job of this that Evercontact wasn't worth $60 per year to me. I just got a Black Friday email from them, and they are advertising Contact Rescue, which looks like it would do what you want.
posted by ajr at 12:29 PM on November 27, 2020 [1 favorite]
Response by poster: Contact Rescue did what I wanted it to, or as close as possible given current tech. Also flagging tiamat's answer as best because of the advice on how to sort and clean the data.
posted by postel's law at 8:36 PM on December 23, 2020
This thread is closed to new comments.
posted by jon1270 at 12:39 PM on November 23, 2020