How can I efficiently harvest Google's entire cache of my website?
March 18, 2013 2:35 PM

I stupidly let a web hosting account expire without backing up and transferring my site to my new server. I'm talking to the host to try to get access to my data (fingers crossed), but in case that does not work out: Is there a way to efficiently harvest Google's entire cache of a specific domain? Something else I should consider? I would like to act fast on this, and I don't have any programming/tech-fu skills.
posted by croutonsupafreak to Computers & Internet (10 answers total)
 
Have you considered using the Internet Archive's Wayback Machine?
posted by musofire at 2:36 PM on March 18, 2013
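
For anyone checking the Wayback Machine suggestion above for more than a page or two, here is a minimal sketch that builds the lookup URLs. It assumes the Wayback Machine's public URL patterns (a snapshot address under web.archive.org/web/ and the availability lookup under archive.org/wayback/available). The domain example.com, the default date, and the function names are placeholders made up for illustration, not anything from the original post.

```python
from urllib.parse import urlencode


def wayback_snapshot_url(page_url, timestamp="20130318"):
    """Address of an archived copy; the Wayback Machine serves the capture
    closest to the YYYYMMDD timestamp (the default date is an assumption)."""
    return f"https://web.archive.org/web/{timestamp}/{page_url}"


def wayback_availability_url(page_url, timestamp="20130318"):
    """Address of the availability lookup, which answers in JSON with the
    closest archived capture, if one exists."""
    query = urlencode({"url": page_url, "timestamp": timestamp})
    return "https://archive.org/wayback/available?" + query


if __name__ == "__main__":
    # example.com is a stand-in for the real domain.
    print(wayback_snapshot_url("example.com/about.html"))
    print(wayback_availability_url("example.com/about.html"))
```

Opening the first printed address in a browser should land on the nearest archived copy, if there is one; the second returns a small JSON answer that a script could check before trying to download anything.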


My site is so small/inconsequential that I assumed it would not be archived there, but thanks for the suggestion -- it is! This may save me a ton of effort/worry.
posted by croutonsupafreak at 2:45 PM on March 18, 2013


Update: Hmm, well it's archived on the Wayback Machine through 2009, which is better than nothing, but the last four years are still missing, so I partially retract my enthusiasm and hope others can offer up suggestions.
posted by croutonsupafreak at 2:46 PM on March 18, 2013


Here is a similar question.
posted by enn at 2:53 PM on March 18, 2013


How big is your site?
posted by ryanrs at 4:04 PM on March 18, 2013


Not too big. It's just a personal blog. But I've been maintaining it since October/November of 2000 -- first with hand-coded HTML updates, which I then transferred to Blogger, then Graymatter, then Movable Type, then WordPress, so it's got a long history. It looks as though my kindly previous host backed up all the WordPress content, at least, which means there are only a small handful of hand-built sub-sites that I need to retrieve from the Google cache gods. So I guess I can relax now. And, you know, never be this stupid again -- at least not in this particular way. I'm sure I'll find some new way to be this stupid in the future.
posted by croutonsupafreak at 4:31 PM on March 18, 2013


Mefi-mail me the details and I can probably help you grab stuff. I'm pretty good at this kind of thing. I'm going to sleep soon, but will check when I wake up.
posted by ryanrs at 4:43 PM on March 18, 2013


In case you didn't know, searching for site:myawesomesite.com should return pretty much everything Google has indexed on that site. You can even narrow it to a subdirectory, e.g. site:myawesomesite.com/blog.
posted by advicepig at 5:59 PM on March 18, 2013
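
The site: search above can also be turned into a link to bookmark or share. This is only a sketch: myawesomesite.com is the placeholder from the comment, the "projects" subdirectory and the function name are made up for illustration, and it assumes the usual google.com/search?q= address format.

```python
from urllib.parse import urlencode


def site_search_url(domain, subdirectory=""):
    """Build a Google search link restricted to one site.

    `subdirectory` is optional and narrows the search to one folder.
    """
    target = domain.rstrip("/")
    if subdirectory:
        target += "/" + subdirectory.strip("/")
    return "https://www.google.com/search?" + urlencode({"q": f"site:{target}"})


if __name__ == "__main__":
    print(site_search_url("myawesomesite.com"))
    print(site_search_url("myawesomesite.com", "projects"))
```

This only builds the link for opening in a browser; it does not fetch results automatically.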


Note that trying to batch download things from Google's cache will get your IP address temporarily banned by Google.
posted by vasi at 10:19 PM on March 18, 2013
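
Given the warning above, anyone scripting downloads would want to space the requests out. Below is a minimal sketch of a throttled downloader using only the Python standard library. The 30-second pause is an arbitrary guess, not a limit published by Google, and nothing here guarantees an IP address won't be blocked anyway. The function names are made up for illustration.

```python
import time
import urllib.request


def fetch(url):
    """Download one page and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def download_slowly(urls, delay_seconds=30.0, fetch_page=fetch, sleep=time.sleep):
    """Fetch each URL in turn, pausing between requests.

    The pause is skipped before the first request. `fetch_page` and `sleep`
    are parameters so the loop can be tried out without touching the network.
    """
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # assumed delay; tune it conservatively
        pages[url] = fetch_page(url)
    return pages
```

Calling download_slowly with a list of page addresses returns a dictionary mapping each address to the bytes that came back. A longer delay is slower but gentler on whatever server is being asked.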


Thanks everyone, especially ryanrs for the offer to help grab stuff. Fortunately I was able to get a full backup of the site and now it's up and running again. Now I feel sheepish about "wasting" a question, but damn I was scared for a minute there. BACK YOUR STUFF UP, EVERYONE.
posted by croutonsupafreak at 12:21 PM on March 19, 2013

