How do I download an entire website from Google's cache?
September 23, 2010 3:48 PM

There used to be an amazing website that featured video game reviews, debates, tributes, and experiences. This was everything a video game website should be, but it was the only one of its kind.

It was run by one guy, and it was updated from the late 90s to 2005. The site was abandoned, but it stayed online until a few weeks ago, when the webmaster's ISP finally wiped it off their server.

The site now exists only in Google's cache. I loved this place and it means so much to me, so I'd love to have it permanently archived on my own computer. The entire thing shouldn't be too big.

Can anybody help me? Thanks for reading this!
posted by EatingCereal to Computers & Internet (12 answers total) 6 users marked this as a favorite
 
Search Google for "website downloader", point whichever programme you choose at the URL of the cached site, and Bob's your father's brother.
posted by Biru at 3:52 PM on September 23, 2010


ScrapBook is a nice Firefox extension for archiving sites.
posted by Triton at 3:55 PM on September 23, 2010


wget.
posted by Civil_Disobedient at 4:54 PM on September 23, 2010


Oh, and to get the URL, restrict the search to the site's own domain with the site: operator ("site:[web site]", no space after the colon). For example, here's the result for this particular web page.
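
If you'd rather build that search URL in a script, here is a minimal Python sketch. The function name and the example.com domain are placeholders made up for illustration, and the start offset for paging through results is an assumption about how you'd want to walk the listing.

```python
from urllib.parse import urlencode


def site_search_url(domain, start=0):
    """Build a Google search URL restricted to one domain via the site: operator.

    `domain` is a placeholder such as "example.com" (hypothetical).
    `start` is the result offset for paging (0, 10, 20, ...); it is left
    out of the URL when it is 0.
    """
    params = {"q": "site:" + domain}
    if start:
        params["start"] = start
    return "https://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    # Print the URLs for the first three pages of results.
    for offset in (0, 10, 20):
        print(site_search_url("example.com", offset))
```

This only builds the URLs; nothing is fetched, so it is safe to run as often as you like.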
posted by Civil_Disobedient at 4:58 PM on September 23, 2010


Archive.org is also a handy way to recover lost sites. That plus one of the downloader options listed above might be what you need.
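
To check whether archive.org holds a copy before pointing a downloader at it, something like the Python sketch below might help. It assumes the Wayback Machine's availability endpoint at archive.org/wayback/available; the function names, the example.com address, and the 2005 timestamp are placeholders chosen for illustration.

```python
import json
import urllib.request
from urllib.parse import urlencode


def availability_url(site, timestamp="20050101"):
    """Build a query asking for the capture of `site` closest to `timestamp`.

    `timestamp` is YYYYMMDD (or longer); the default is just an example date.
    """
    query = urlencode({"url": site, "timestamp": timestamp})
    return "https://archive.org/wayback/available?" + query


def closest_snapshot(response_text):
    """Return the URL of the closest archived capture, or None if there is none."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def lookup(site, timestamp="20050101"):
    """Query the endpoint over the network and return the capture URL, if any."""
    with urllib.request.urlopen(availability_url(site, timestamp), timeout=30) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

Calling lookup("example.com") makes one network request; if it returns a URL, that address is what you would hand to the downloader.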
posted by richyoung at 5:11 PM on September 23, 2010


wget and site downloaders won't work, because Google doesn't change the link URLs in cached pages to point to other cached pages. You'd have to wget the Google results page itself, getting the "Cached" links. Also, automating Google requests may get you temporarily banned; be sure to use the --wait=seconds option.
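
For the "getting the Cached links" step, a rough Python sketch follows. It assumes the results page marks each cached copy with a plain link whose text is exactly "Cached"; the real markup may differ, and the class and function names here are made up for illustration.

```python
from html.parser import HTMLParser


class CachedLinkParser(HTMLParser):
    """Collect the href of every <a> element whose visible text is "Cached"."""

    def __init__(self):
        super().__init__()
        self.links = []      # hrefs of the matching links, in page order
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a":
            label = "".join(self._text).strip()
            if self._href is not None and label == "Cached":
                self.links.append(self._href)
            self._href = None


def cached_links(results_html):
    """Return the list of "Cached" link targets found in a results page."""
    parser = CachedLinkParser()
    parser.feed(results_html)
    parser.close()
    return parser.links
```

You would feed this the saved HTML of each results page and then fetch the returned links one at a time, with a wait between requests.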
posted by nicwolff at 5:36 PM on September 23, 2010


There's a program called Warrick which can resurrect websites from the Google cache along with the Internet Archive and Bing. It's not especially easy to use, but it will get your site back eventually.
posted by zsazsa at 6:36 PM on September 23, 2010


HTTrack.
posted by yoyo_nyc at 5:47 AM on September 24, 2010


Emphasising what nicwolff says, be very very careful when trying to spider the Google cache. Warrick (as suggested by zsazsa) is a nice tool, but you will be banned very quickly indeed by Google unless you are extremely conservative in your choice of options.
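
If you end up scripting the fetches yourself rather than using Warrick, a conservative loop might look like the Python sketch below. The 30 to 90 second pause is a guess on my part, not a limit Google publishes, and the function names are made up for illustration.

```python
import random
import time
import urllib.request


def _default_fetch(url):
    """Download one URL and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def polite_fetch_all(urls, min_wait=30.0, max_wait=90.0,
                     fetch=None, sleep=time.sleep):
    """Fetch each URL in turn, pausing a random interval between requests.

    The wait bounds are assumptions; raise them if you see errors or CAPTCHAs.
    `fetch` and `sleep` can be swapped out, e.g. for a dry run without network.
    """
    if fetch is None:
        fetch = _default_fetch
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            # No pause before the first request, one before every later one.
            sleep(random.uniform(min_wait, max_wait))
        pages[url] = fetch(url)
    return pages
```

At these settings a hundred pages takes an hour or two, which is slow but much less likely to trip a ban than back-to-back requests.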
posted by Busy Old Fool at 6:25 AM on September 24, 2010


Seconding archive.org's Wayback Machine as a good alternative to Google's cache. While sites eventually drop out of Google's cache, archive.org has copies of sites dating back to the mid-1990s.
posted by samsara at 6:54 AM on September 24, 2010


I second zsazsa's suggestion of Warrick, though I should admit I've contributed a little to it.

The sooner you use it, the better. As Google, Yahoo, & Bing update their crawls of the site, their copies will likely fade away.
posted by Pronoiac at 4:50 PM on September 24, 2010


Can you contact the owner of the site and ask them to send you a copy?
posted by PueExMachina at 6:32 PM on September 24, 2010

