How to save all cached pages for a particular domain?
June 28, 2007 10:53 PM

A web site with lots of useful information recently went offline, but the cached pages are still available from Google. Is there a program (for either OS X or Windows) that will automatically save all of the cached pages on Google associated with a particular domain?
posted by Ø to computers & internet (8 answers total) 6 users marked this as a favorite
 
Have you tried wget or curl with the Wayback Machine?
posted by Blazecock Pileon at 11:06 PM on June 28, 2007
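
For anyone trying the wget route, here is a minimal sketch of how the Wayback Machine addresses could be built for a list of pages. The domain, the page list, and the timestamp are all made-up placeholders, and the only thing assumed about the archive is its usual address form of web.archive.org/web/ followed by a YYYYMMDDhhmmss timestamp and the original URL.

```python
# Sketch only: builds Wayback Machine addresses for a list of pages.
# The domain, pages, and timestamp below are hypothetical placeholders.

WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(page_url: str, timestamp: str = "20070628000000") -> str:
    """Return the archive address for page_url at the given timestamp.

    The timestamp is YYYYMMDDhhmmss; it only needs to be close to the
    date you want, not an exact capture time.
    """
    return f"{WAYBACK_PREFIX}/{timestamp}/{page_url}"


def wayback_urls(page_urls, timestamp: str = "20070628000000"):
    """Build one archive address per page, keeping the input order."""
    return [wayback_url(url, timestamp) for url in page_urls]


if __name__ == "__main__":
    # Hypothetical pages from the site that went offline.
    pages = ["http://example.com/", "http://example.com/about.html"]
    for line in wayback_urls(pages):
        print(line)
```

Saving the printed lines to a text file and passing that file to wget with its -i option should fetch each address in turn, assuming the archive actually holds copies of those pages.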


wget is the way to go.

I love your nick.
posted by phrontist at 11:11 PM on June 28, 2007


Note: the Wayback Machine doesn't show new pages until 6 to 12 months later, so you may want to check it again several months from now.
posted by zippy at 1:21 AM on June 29, 2007


Warrick is a command-line utility for reconstructing or recovering a website when a back-up is not available. Warrick will search the Internet Archive, Google, MSN, and Yahoo for stored pages and images and will save them to your filesystem.
posted by blag at 1:41 AM on June 29, 2007 [3 favorites]
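
To illustrate the general idea of checking several caches for each page, here is a rough sketch. It is not Warrick's actual code or algorithm, just one plausible way to try a list of sources in order and keep the first copy found. The source names and the fetcher functions are hypothetical stand-ins; real ones would have to query each archive or search engine.

```python
# Sketch only: try several page stores in order and keep the first hit.
# This is NOT how Warrick is implemented; names here are hypothetical.
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# A fetcher takes a URL and returns the stored page, or None if it has no copy.
Fetcher = Callable[[str], Optional[str]]


def recover_page(url: str, sources: Dict[str, Fetcher]) -> Optional[Tuple[str, str]]:
    """Ask each source in turn; return (source_name, content) for the first hit."""
    for name, fetch in sources.items():
        content = fetch(url)
        if content is not None:
            return name, content
    return None


def recover_site(
    urls: Iterable[str], sources: Dict[str, Fetcher]
) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Recover every URL that any source holds and list the ones nobody has."""
    recovered: Dict[str, Tuple[str, str]] = {}
    missing: List[str] = []
    for url in urls:
        hit = recover_page(url, sources)
        if hit is None:
            missing.append(url)
        else:
            recovered[url] = hit
    return recovered, missing


if __name__ == "__main__":
    # In-memory dictionaries standing in for two real caches.
    cache_a = {"example.com/": "<html>home</html>"}
    cache_b = {"example.com/about": "<html>about</html>"}
    sources = {"cache_a": cache_a.get, "cache_b": cache_b.get}
    found, lost = recover_site(
        ["example.com/", "example.com/about", "example.com/gone"], sources
    )
    print(found)
    print(lost)
```

The order of the sources decides which copy wins when more than one cache has the same page, so a real tool would probably want a smarter rule, such as preferring the most recent copy.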


Unfortunately the Wayback Machine was blocked by the site's robots.txt. Google has it all cached, though.

I've been reading the documentation for wget (which looks really cool -- thanks for the recommendation!), but can't figure out how to get it to archive just the cached page links returned by Google. I'm probably overlooking something very obvious... any pointers would be appreciated...
posted by Ø at 1:45 AM on June 29, 2007


Wow, blag! Warrick looks perfect!!!! Thank you.
posted by Ø at 1:47 AM on June 29, 2007


I was wondering if there's another program out there like Warrick, or if it's the only one. (Sorry to derail, but I'm really curious about this one.)
posted by Merdryn at 8:01 AM on June 29, 2007


Happy to help.

Merdryn: as far as I know, it's the only one.
posted by blag at 6:09 PM on July 1, 2007

