How do I download an entire website from Google's cache?
September 23, 2010 3:48 PM
There used to be an amazing website that featured video game reviews, debates, tributes, and experiences. This was everything a video game website should be, but it was the only one of its kind.
It was run by one guy, and it was updated from the late 90s to 2005. The site was abandoned, but it was able to stay online until a few weeks ago. The webmaster's ISP finally wiped it off their server.
The site now only exists as a part of Google's cache. I loved this place and it means so much to me, so I'd love to have it permanently archived on my own computer. The entire thing shouldn't be too big.
Can anybody help me? Thanks for reading this!
ScrapBook is a nice Firefox extension for archiving sites.
posted by Triton at 3:55 PM on September 23, 2010
Oh, and to get the URL, change the search criteria to filter on the site's URL ("site:[web site]"). For example, here's the result for this particular web page.
posted by Civil_Disobedient at 4:58 PM on September 23, 2010
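For anyone who wants to build that search link rather than type it, here is a minimal sketch. The function name and the domain `example.com` are placeholders, not anything from the thread, and it assumes Google's usual `/search?q=` URL form. Opening the resulting URL in a browser lists the indexed pages for that domain, each of which may have its own cached copy.

```python
from urllib.parse import urlencode


def site_search_url(domain: str) -> str:
    """Return a Google search URL restricted to pages on `domain`.

    `domain` is a placeholder for the lost site's address.
    """
    # The site: operator takes the domain directly after the colon.
    query = f"site:{domain}"
    # urlencode percent-encodes the colon so the query is safe in a URL.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(site_search_url("example.com"))
```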
Archive.org is also a handy way to recover lost sites. That plus one of the downloader options listed above might be what you need.
posted by richyoung at 5:11 PM on September 23, 2010
wget and site downloaders won't work, because Google doesn't change the link URLs in cached pages to point to other cached pages. You'd have to wget the Google results page itself, getting the "Cached" links. Also, automating Google requests may get you temporarily banned; be sure to use the --wait=seconds option.
posted by nicwolff at 5:36 PM on September 23, 2010
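A rough sketch of the throttling advice above, assuming the "Cached" links have already been collected by hand into a text file, one URL per line. The file name `cached_urls.txt` and the 10-second delay are made up for illustration. The `--random-wait` and `--input-file` flags are real wget options but were not part of the comment; they are added here as one conservative way to fetch a fixed list slowly.

```python
from typing import List


def build_wget_command(url_list_file: str, wait_seconds: int = 10) -> List[str]:
    """Build a wget argument list that fetches each URL in a file,
    pausing between requests.

    `url_list_file` is a hypothetical file of cached-page URLs.
    """
    if wait_seconds < 1:
        raise ValueError("wait_seconds should be at least 1 to avoid rapid-fire requests")
    return [
        "wget",
        f"--wait={wait_seconds}",         # pause this many seconds between fetches
        "--random-wait",                  # vary the pause so requests look less mechanical
        f"--input-file={url_list_file}",  # read the URLs to fetch from this file
    ]


if __name__ == "__main__":
    command = build_wget_command("cached_urls.txt", wait_seconds=10)
    print(" ".join(command))
```

The list can be handed to `subprocess.run(command)` once the URL file exists. A longer wait is the safer choice if bans are a concern.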
There's a program called Warrick which can resurrect websites from the Google cache along with the Internet Archive and Bing. It's not especially easy to use, but it will get your site back eventually.
posted by zsazsa at 6:36 PM on September 23, 2010
Emphasising what nicwolff says, be very very careful when trying to spider the Google cache. Warrick (as suggested by zsazsa) is a nice tool, but you will be banned very quickly indeed by Google unless you are extremely conservative in your choice of options.
posted by Busy Old Fool at 6:25 AM on September 24, 2010
Seconding archive.org's Wayback Machine as a good alternative to Google's cache. While sites seem to drop out of Google's cache fairly quickly, archive.org has copies of sites dating back to the mid-1990s.
posted by samsara at 6:54 AM on September 24, 2010
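To check whether the Wayback Machine has the site, a small sketch like the following builds the lookup URLs. It assumes the common `web.archive.org/web/<timestamp>/<url>` address form, where a partial timestamp such as a year is expected to redirect to the nearest capture. The function names, the address `http://example.com/`, and the default year are placeholders.

```python
def wayback_url(original_url: str, timestamp: str = "2005") -> str:
    """Return a Wayback Machine URL for the capture nearest `timestamp`.

    `timestamp` may be a full YYYYMMDDhhmmss value or just a prefix
    such as a year.
    """
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


def wayback_listing_url(domain: str) -> str:
    """Return the URL of the Wayback Machine page listing captured URLs under `domain`."""
    return f"https://web.archive.org/web/*/{domain}/*"


if __name__ == "__main__":
    print(wayback_url("http://example.com/"))
    print(wayback_listing_url("example.com"))
```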
I second zsazsa's suggestion of Warrick, though I should admit I've contributed a little to it.
The sooner you use it, the better. As Google, Yahoo, & Bing update their crawls of the site, their copies will likely fade away.
posted by Pronoiac at 4:50 PM on September 24, 2010
Can you contact the owner of the site and ask them to send you a copy?
posted by PueExMachina at 6:32 PM on September 24, 2010
This thread is closed to new comments.
posted by Biru at 3:52 PM on September 23, 2010