How do I find all references to a certain domain on my company's website?
September 26, 2007 9:07 AM

My company is changing email addresses in a couple of weeks from user@mycompany.org to user@mycompany.com. We have a really large website with references to the old address on many pages scattered all over the site. I need a way to find all of these references and then generate a report that I can give to our site editors so they can update those pages before the change. I don't need to automate the change itself: we have many site editors, each responsible only for their own department, and each department's pages probably contain only a couple dozen references to the old domain.

I tried several of the free link-checking packages out there, but none of them seem to allow searching for a specific word or domain, only checking for broken links (which, of course, is what they're meant to do). I'd be willing to pay a few hundred bucks for the right software, but would obviously prefer freeware. I'd like to be able to hand this information to my site editors, since there's already major grumbling about the email address change.

PS: I realize I could just make the new email address primary and keep the old address working, but we chose to stop accepting mail for the old domain for various reasons.
posted by bda1972 to Technology (14 answers total)
 
You could use Dreameaver's find and replace (now with regular expressions) to do this easily.
posted by ReiToei at 9:10 AM on September 26, 2007


Dreamweaver, rather.
posted by ReiToei at 9:11 AM on September 26, 2007


If you have access to a Unix, Linux, or Mac OS X machine, you could do some bash scripting to make it happen.

This will replace user@mycompany.com with user@mycompany.org in all HTML files under the current directory, keeping a .bak copy of each original:
find . -iname "*html" -exec sed -i.bak 's/user@mycompany\.com/user@mycompany.org/g' {} \;
It should also run under Cygwin on Windows, but installing Cygwin for one search-and-replace is a bit silly.
posted by Skorgu at 9:20 AM on September 26, 2007
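
A minimal sketch, assuming a Unix-like shell and the direction the question describes (old domain mycompany.org, new domain mycompany.com): before running any in-place edit, you can print the lines a replacement would change without touching the files. The function name preview_domain_change is only an illustration, and this sed pattern is case-sensitive.

```shell
# Hypothetical helper: preview which lines the old -> new domain replacement
# would change, without modifying any file.
# Assumes old domain mycompany.org and new domain mycompany.com (as in the question).
preview_domain_change() {
  find "${1:-.}" -type f -iname '*htm*' | while IFS= read -r f; do
    # -n suppresses normal output; the p flag prints only lines that changed.
    sed -n 's/@mycompany\.org/@mycompany.com/gp' "$f" | while IFS= read -r line; do
      printf '%s: %s\n' "$f" "$line"
    done
  done
}
```

Run it as preview_domain_change /path/to/html/dir; each output line shows the file name and the line as it would read after the change.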


Are these pages all static files, or are they generated somehow (perhaps from a database)?

If they are static and you're on a Unix server, you could use grep. If the content is in a database, you could use an SQL SELECT statement.
posted by and hosted from Uranus at 9:21 AM on September 26, 2007
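
For static files, here is a sketch of the grep approach that counts matching lines per file and drops files with no matches. It assumes GNU or BSD grep and the old domain mycompany.org from the question; count_old_refs is a hypothetical name.

```shell
# Hypothetical helper: print "file:count" for every HTML-ish file under $1
# that mentions the old domain (assumed to be mycompany.org), case-insensitively.
count_old_refs() {
  # -r recurse, -c count matching lines, -i ignore case, -I skip binary files.
  # grep -c also reports files with zero matches, so filter those out.
  grep -rciI --include='*.htm*' '@mycompany\.org' "${1:-.}" | grep -v ':0$'
}
```

count_old_refs /path/to/html/dir > report.txt gives one file:count line per affected page. Note that -c counts matching lines, not individual addresses.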


TextPad can search for expressions in a directory. It can return either a list of files or the specific lines in which a phrase appears.
posted by clarkie666 at 9:23 AM on September 26, 2007


Here is a Windows version of grep. Install it, and from the top of the site's directory run: grep -rin "@mycompany.org" * (the old domain). The flags mean "recursive" (everything under the current directory), "case insensitive", and "print the line number".
posted by cschneid at 9:31 AM on September 26, 2007
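
To get the per-department report the question asks for, one option on a Unix-like system (or Cygwin) is to group grep's output by top-level directory. This is a sketch that rests on a hypothetical assumption: that each department's pages live in their own top-level directory under the site root. The function name dept_report is also just for illustration, and the old domain is assumed to be mycompany.org.

```shell
# Hypothetical helper: list matches for the old domain, grouped by the
# top-level directory under the site root (ASSUMED to be one per department).
dept_report() {
  root=${1:-.}
  root=${root%/}                             # avoid a doubled slash in paths
  grep -rniI --include='*.htm*' '@mycompany\.org' "$root" |
    awk -v prefix="$root/" '{
      rel = substr($0, length(prefix) + 1)   # drop the site root
      split(rel, f, ":")                     # f[1] = path, f[2] = line number
      slash = index(f[1], "/")
      dept = slash ? substr(f[1], 1, slash - 1) : "(top level)"
      printf "%s\t%s\t%s\n", dept, f[1], f[2]
    }' |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2 -k3,3n |
    awk -F '\t' '{
      if ($1 != last) { print "== " $1 " =="; last = $1 }
      print "  " $2 ", line " $3
    }'
}
```

dept_report /path/to/site > report.txt prints a heading per directory followed by file names and line numbers, which could then be split up and sent to each department's editor.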


It's not the "best" approach, but it could be the easiest and cheapest: have you tried Google or one of the other search engines yet? :)

In Google, a search like site:yourdomain.com email@address.com should work, though you'll then need to click the link to show all similar results.
posted by wackybrit at 9:35 AM on September 26, 2007


perl -i.bak -p -e 's/foo/bar/g' *.html

Searches all files in the current directory whose names end in '.html', replaces every 'foo' with 'bar' (without the g flag, only the first 'foo' on each line would change), and keeps each original as a .bak file.

On preview: the grep mentioned above will do this nicely if all you want are the locations of the strings.
posted by jquinby at 9:37 AM on September 26, 2007


BK ReplaceEm is free and really easy to use.
posted by likedoomsday at 9:37 AM on September 26, 2007


Seconding wackybrit here. Replace "mycompany" with whatever your company's domain actually is.
posted by DJWeezy at 9:42 AM on September 26, 2007


Skorgu has the best suggestion if you have a Unix-like system (except that the search and replace patterns are transposed).

Here’s what I would do to get a simple listing of the files:

find /path/to/html/dir -type f -exec grep -li '@mycompany.com' '{}' \;

If you do want to automate the change, this would work:

find /path/to/html/dir -type f -exec perl -i -pe 's/\@mycompany\.com/\@mycompany.org/ig' '{}' \;

Many times, email addresses are encoded as HTML character entities in order to obfuscate them and prevent spammers from harvesting them. If this is the case, the simple pattern match given above will not work.
posted by breaks the guidelines? at 9:54 AM on September 26, 2007


dammit, and I went and transposed the search/replace pattern, too!
posted by breaks the guidelines? at 9:55 AM on September 26, 2007


Don't write a report -- just tell the editors to replace all instances of x with y across the domain. If they don't know how to do that, they shouldn't be in charge of your website.

If they nevertheless remain in charge of your website, tell them to install Arachnophilia, a free HTML editor that runs on Windows, Linux, or Mac (it's written in Java). They can use its simple but powerful find-and-replace tool to fix your website in less than thirty seconds.
posted by gum at 9:59 AM on September 26, 2007


In addition, make sure you set up a 301 ("Moved Permanently") redirect.
posted by Good Brain at 10:45 AM on September 26, 2007

