Quick, Dirty & Cheap method to extract email addresses from a website?
July 1, 2013 7:37 AM Subscribe
I would like to create a list of emails that are publicly available on websites.
Not "all websites ever". I plan to send an email, inviting people who have a particular kind of job, to take an online survey. Their work emails are publicly available on their employers' websites. (I will have an ethics review and get approval from an institutional review board before sending out the invitation.)
However, I have more than 200 websites to visit. I do not want to cut and paste more than 3000 email addresses into an Excel spreadsheet or ACCESS database.
I don't think I'll even need names matched with emails -- although it'd be nice to have it from the get-go, in case it turns out that I do need them later.
A quick Google search tells me there are extraction services/software that do this -- but I don't really know what I'm looking at or what I'd be buying. It sounds like there are a lot of options for extracting all kinds of data on a much grander scale than what I hope to do. (And I don't want a spammy company to start extracting non-publicly-available information from my computer.)
Does anyone have any advice or experience with this?
In crude terms, you're looking for the bit in a web page's source that comes after a mailto: and before the closing quotation mark ("). It's a very small matter of scripting.
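A minimal sketch of that idea in Python, using only the standard library. It assumes you already have each page's HTML source as a string. The function name extract_mailto_addresses and the example.org addresses are made up for illustration. It only finds addresses inside mailto: links, so addresses written as plain text or obfuscated (for example with HTML entities or JavaScript) will be missed.

```python
import re
from urllib.parse import unquote

# Matches the text after "mailto:" up to the closing quote, a "?" (the start
# of optional parameters such as ?subject=...), a ">" or whitespace.
MAILTO_RE = re.compile(r"""mailto:([^"'?>\s]+)""", re.IGNORECASE)


def extract_mailto_addresses(html):
    """Return the unique addresses found in mailto: links, in page order.

    Hypothetical helper for illustration, not part of any library.
    """
    seen = set()
    addresses = []
    for raw in MAILTO_RE.findall(html):
        # A single mailto: link may list several comma-separated recipients,
        # and characters such as "@" may be percent-encoded.
        for part in unquote(raw).split(","):
            address = part.strip()
            key = address.lower()  # compare case-insensitively when deduplicating
            if address and key not in seen:
                seen.add(key)
                addresses.append(address)
    return addresses


if __name__ == "__main__":
    # Made-up sample page source.
    sample = '''
    <p><a href="mailto:jane.doe@example.org">Jane Doe</a></p>
    <p><a href="mailto:j.smith@example.org?subject=Hello">J. Smith</a></p>
    <p><a href="MAILTO:jane.doe@example.org">Jane again</a></p>
    '''
    for address in extract_mailto_addresses(sample):
        print(address)
```

To run it over a list of sites, you would fetch each page first (for instance with urllib.request.urlopen) and pass the decoded text to the function. The results still need checking by hand, since a scraped address might be a generic contact or a honeypot.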
Even with an ethics review, this can be a minefield. Hit just one honeypot e-mail address, and your institution could be blacklisted for spamming. Ask your institution's sysadmin for advice.
posted by scruss at 8:21 AM on July 1, 2013 [3 favorites]
One thing to take into consideration is to partner with an organization to whom most of these people belong, and work out a way with them to use their opt in list collaboratively as a one-off, sharing the data back with them, benefiting them and you.
Also, include in the survey a way for people to opt in for more information from your organization and/or include a "forward to a friend" ability.
I get these quite a lot. "Company XYZ, in coordination with Organization ABC, would like to invite you to take a short survey and enter in a chance to win a gift card for [nominal amount]. You can help further the direction of [industry] by [completing survey] making your voice heard."
posted by tilde at 1:44 PM on July 1, 2013
Response by poster: Thanks, scruss, for the heads up on honeypot email addresses. I checked with my IT guy and he wasn't too concerned. He offered to help -- but in the end, it was easier to just check each website and make note of the emails I wanted.
tilde -- your advice is also good -- but it would wreak havoc with my denominator and sampling strategy.
posted by vitabellosi at 12:33 PM on August 25, 2013
This thread is closed to new comments.
I've used this one before. I think that a lot, if not all, web scraping tools require you to have some programming experience.
posted by dfriedman at 7:41 AM on July 1, 2013