How can I efficiently harvest Google's entire cache of my website?
March 18, 2013 2:35 PM

I stupidly let a web hosting account expire without backing up and transferring my site to my new server. I'm talking to the host to try to get access to my data (fingers crossed), but in case that does not work out: Is there a way to efficiently harvest Google's entire cache of a specific domain? Something else I should consider? I would like to act fast on this, and I don't have any programming/tech-fu skills.
posted by croutonsupafreak to Computers & Internet (10 answers total)
Have you considered using the Internet Archive's Wayback Machine?
posted by musofire at 2:36 PM on March 18, 2013

My site is so small/inconsequential that I assumed it would not be archived there, but thanks for the suggestion -- it is! This may save me a ton of effort/worry.
posted by croutonsupafreak at 2:45 PM on March 18, 2013

Update: Hmm, well it's archived on the Wayback Machine through 2009, which is better than nothing, but the last four years are still missing, so I partially retract my enthusiasm and hope others can offer up suggestions.
posted by croutonsupafreak at 2:46 PM on March 18, 2013
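
For anyone checking what the Wayback Machine holds for a page, here is a minimal Python sketch. It assumes the Internet Archive's public availability endpoint at https://archive.org/wayback/available, which returns the snapshot closest to a requested date. The function names are made up for illustration, and example.com stands in for the real domain.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Internet Archive's public availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL for one page (hypothetical helper name)."""
    params = {"url": page_url}
    if timestamp:
        # YYYYMMDD; the API answers with the snapshot closest to this date.
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Pull the closest snapshot URL out of the JSON reply, or return None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(page_url, timestamp=None):
    """Network call: ask the API which snapshot is nearest to `timestamp`."""
    with urlopen(availability_url(page_url, timestamp), timeout=30) as reply:
        return closest_snapshot(reply.read().decode("utf-8"))
```

Calling lookup("example.com", "20130318") would return the address of the snapshot nearest to that date, or None if nothing is archived. It only tells you where the copy lives; saving the page is a separate step.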

Here is a similar question.
posted by enn at 2:53 PM on March 18, 2013

How big is your site?
posted by ryanrs at 4:04 PM on March 18, 2013

Not too big. It's just a personal blog. But I've been maintaining it since October/November of 2000 -- first with hand-coded HTML updates, which I then transferred to Blogger, then Greymatter, then Movable Type, then WordPress, so it's got a long history. It looks as though my kindly previous host backed up all the WordPress content, at least, which means there are only a small handful of hand-built sub-sites that I need to retrieve from the Google cache gods. So I guess I can relax now. And, you know, never be this stupid again -- at least not in this particular way. I'm sure I'll find some new way to be this stupid in the future.
posted by croutonsupafreak at 4:31 PM on March 18, 2013

Mefi-mail me the details and I can probably help you grab stuff. I'm pretty good at this kind of thing. I'm going to sleep soon, but will check when I wake up.
posted by ryanrs at 4:43 PM on March 18, 2013

In case you didn't know, searching Google with the site: operator followed by your domain should return pretty much everything Google knows of on that site. You can even include subdirectories.
posted by advicepig at 5:59 PM on March 18, 2013
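
As a rough illustration of a site-restricted search, this Python sketch builds the search URL for a domain and an optional subdirectory. The helper name is hypothetical, example.com and "projects" are placeholders, and it only constructs the address to open in a browser; it does not fetch anything.

```python
from urllib.parse import urlencode


def site_search_url(domain, subdirectory=""):
    """Build a Google search URL limited to one site (hypothetical helper)."""
    target = domain.strip("/")
    if subdirectory:
        # Narrow the search to one part of the site, e.g. example.com/projects
        target += "/" + subdirectory.strip("/")
    # urlencode percent-escapes the colon and slash in the query string.
    return "https://www.google.com/search?" + urlencode({"q": "site:" + target})


print(site_search_url("example.com"))
print(site_search_url("example.com", "projects"))
```

Pasting either printed address into a browser shows the pages Google has indexed for that site or subdirectory.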

Note that trying to batch download things from Google's cache will get your IP address temporarily banned by Google.
posted by vasi at 10:19 PM on March 18, 2013
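
If you do script any downloads, whatever the source, one common precaution is to space the requests out and stop at the first failure instead of retrying in a tight loop. The Python sketch below shows that pattern only. The function name, the 10-second default, and the stop-on-error policy are assumptions for illustration, and pacing is no guarantee against being blocked.

```python
import time


def fetch_slowly(urls, fetch, delay_seconds=10.0, sleep=time.sleep):
    """Fetch each URL in turn with a pause between requests.

    `fetch` is any callable that takes a URL and returns its content.
    Stops at the first failure and returns whatever was collected so far.
    """
    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # pause before every request after the first
        try:
            results[url] = fetch(url)
        except Exception as error:
            # A failure may mean throttling or a block, so stop here
            # rather than keep sending requests.
            print(f"stopping at {url}: {error}")
            break
    return results
```

Passing `fetch` and `sleep` in as arguments keeps the pacing logic separate from the actual download code, so it can be tried out with stand-in functions before pointing it at a real site.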

Thanks everyone, especially ryanrs for the offer to help grab stuff. Fortunately I was able to get a full backup of the site and now it's up and running again. Now I feel sheepish about "wasting" a question, but damn I was scared for a minute there. BACK YOUR STUFF UP, EVERYONE.
posted by croutonsupafreak at 12:21 PM on March 19, 2013
