<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel> 

	<title>Comments on: How to save all cached pages for a particular domain?</title>
	<link>http://ask.metafilter.com/65872/How-to-save-all-cached-pages-for-a-particular-domain/</link>
	<description>Comments on Ask MetaFilter post How to save all cached pages for a particular domain?</description>
	<pubDate>Thu, 28 Jun 2007 23:06:00 -0800</pubDate>
	<lastBuildDate>Thu, 28 Jun 2007 23:06:00 -0800</lastBuildDate>
	<language>en-us</language>
	<docs>http://blogs.law.harvard.edu/tech/rss</docs>
	<ttl>60</ttl>

	<item>
		<title>Question: How to save all cached pages for a particular domain?</title>
		<link>http://ask.metafilter.com/65872/How-to-save-all-cached-pages-for-a-particular-domain</link>	
		<description>A web site with lots of useful information recently went offline, but the cached pages are still available from Google. Is there a program (for either OS X or Windows) that will automatically save all of the cached pages on Google associated with a particular domain?</description>
		<guid isPermaLink="false">post:ask.metafilter.com,2007:site.65872</guid>
		<pubDate>Thu, 28 Jun 2007 22:53:10 -0800</pubDate>
		<dc:creator>&#xd8;</dc:creator>
		
			<category>archive</category>
		
			<category>download</category>
		
			<category>cached</category>
		
			<category>pages</category>
		
			<category>Google</category>
		
	</item> <item>
		<title>By: Blazecock Pileon</title>
		<link>http://ask.metafilter.com/65872/How-to-save-all-cached-pages-for-a-particular-domain#989696</link>	
		<description>Have you tried &lt;a href=&quot;http://www.gnu.org/software/wget/&quot;&gt;wget&lt;/a&gt; or &lt;a href=&quot;http://curl.haxx.se/&quot;&gt;curl&lt;/a&gt; with the &lt;a href=&quot;http://www.archive.org/web/web.php&quot;&gt;Wayback Machine&lt;/a&gt;?</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.65872-989696</guid>
		<pubDate>Thu, 28 Jun 2007 23:06:00 -0800</pubDate>
		<dc:creator>Blazecock Pileon</dc:creator>
	</item><item>
		<title>By: phrontist</title>
		<link>http://ask.metafilter.com/65872/How-to-save-all-cached-pages-for-a-particular-domain#989702</link>	
		<description>wget is the way to go.&lt;br&gt;
&lt;br&gt;
&lt;small&gt;I &lt;em&gt;love&lt;/em&gt; your nick&lt;/small&gt;</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.65872-989702</guid>
		<pubDate>Thu, 28 Jun 2007 23:11:55 -0800</pubDate>
		<dc:creator>phrontist</dc:creator>
	</item><item>
		<title>By: zippy</title>
		<link>http://ask.metafilter.com/65872/How-to-save-all-cached-pages-for-a-particular-domain#989754</link>	
		<description>Note: the Wayback Machine doesn&apos;t show new pages until many months (6 to 12) later, so you may want to check it again several months from now.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.65872-989754</guid>
		<pubDate>Fri, 29 Jun 2007 01:21:30 -0800</pubDate>
		<dc:creator>zippy</dc:creator>
	</item><item>
		<title>By: blag</title>
		<link>http://ask.metafilter.com/65872/How-to-save-all-cached-pages-for-a-particular-domain#989764</link>	
		<description>&lt;blockquote&gt;&lt;a href=&quot;http://www.cs.odu.edu/~fmccown/research/lazy/warrick.html&quot;&gt;Warrick&lt;/a&gt; is a command-line utility for reconstructing or recovering a website when a back-up is not available. Warrick will search the Internet Archive, Google, MSN, and Yahoo for stored pages and images and will save them to your filesystem. &lt;/blockquote&gt;</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.65872-989764</guid>
		<pubDate>Fri, 29 Jun 2007 01:41:34 -0800</pubDate>
		<dc:creator>blag</dc:creator>
	</item><item>
		<title>By: &#xd8;</title>
		<link>http://ask.metafilter.com/65872/How-to-save-all-cached-pages-for-a-particular-domain#989765</link>	
		<description>Unfortunately the Wayback Machine was blocked by the site&apos;s robots.txt. Google has it all cached, though.&lt;br&gt;
&lt;br&gt;
I&apos;ve been reading the documentation for wget (which looks really cool -- thanks for the recommendation!), but can&apos;t figure out how to get it to archive just the cached page links returned by Google. I&apos;m probably overlooking something very obvious... any pointers would be appreciated...</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.65872-989765</guid>
		<pubDate>Fri, 29 Jun 2007 01:45:24 -0800</pubDate>
		<dc:creator>&#xd8;</dc:creator>
	</item><item>
		<title>By: &#xd8;</title>
		<link>http://ask.metafilter.com/65872/How-to-save-all-cached-pages-for-a-particular-domain#989767</link>	
		<description>Wow, blag! Warrick looks perfect!!!! Thank you.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.65872-989767</guid>
		<pubDate>Fri, 29 Jun 2007 01:47:53 -0800</pubDate>
		<dc:creator>&#xd8;</dc:creator>
	</item><item>
		<title>By: Merdryn</title>
		<link>http://ask.metafilter.com/65872/How-to-save-all-cached-pages-for-a-particular-domain#989944</link>	
		<description>I was wondering if there&apos;s another program out there like Warrick, or if it&apos;s the only one. (Sorry to derail, but I&apos;m really curious about this one.)</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.65872-989944</guid>
		<pubDate>Fri, 29 Jun 2007 08:01:51 -0800</pubDate>
		<dc:creator>Merdryn</dc:creator>
	</item><item>
		<title>By: blag</title>
		<link>http://ask.metafilter.com/65872/How-to-save-all-cached-pages-for-a-particular-domain#991884</link>	
		<description>Happy to help.&lt;br&gt;
&lt;br&gt;
Merdryn: as far as I know, it&apos;s the only one.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2007:site.65872-991884</guid>
		<pubDate>Sun, 01 Jul 2007 18:09:18 -0800</pubDate>
		<dc:creator>blag</dc:creator>
	</item>
	</channel>
</rss>
