<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel> 

	<title>Comments on: I need a script to compare files in a directory to files referenced in web pages.</title>
	<link>http://ask.metafilter.com/10150/I-need-a-script-to-compare-files-in-a-directory-to-files-referenced-in-web-pages/</link>
	<description>Comments on Ask MetaFilter post I need a script to compare files in a directory to files referenced in web pages.</description>
	<pubDate>Mon, 13 Sep 2004 15:59:16 -0800</pubDate>
	<lastBuildDate>Mon, 13 Sep 2004 15:59:16 -0800</lastBuildDate>
	<language>en-us</language>
	<docs>http://blogs.law.harvard.edu/tech/rss</docs>
	<ttl>60</ttl>

	<item>
		<title>Question: I need a script to compare files in a directory to files referenced in web pages.</title>
		<link>http://ask.metafilter.com/10150/I-need-a-script-to-compare-files-in-a-directory-to-files-referenced-in-web-pages</link>	
		<description>Can anyone recommend a program (in, say, Perl or Java or something) that will run on my Linux web server and a) slurp up the file references in the pages (HTML, PHP) and b) compare them to the actual files in my web root tree, giving me a list of all the unreferenced files (those not referenced on public web pages)? I want to clean up this junky file system, but I inherited the mess and don&apos;t want to break any links. Thanks.</description>
		<guid isPermaLink="false">post:ask.metafilter.com,2004:site.10150</guid>
		<pubDate>Mon, 13 Sep 2004 15:37:38 -0800</pubDate>
		<dc:creator>pissfactory</dc:creator>
		
			<category>Software</category>
		
			<category>Perl</category>
		
			<category>Java</category>
		
			<category>Linux</category>
		
			<category>Web</category>
		
			<category>Internet</category>
		
			<category>FileSystem</category>
		
			<category>Hyperlinks</category>
		
	</item> <item>
		<title>By: sfenders</title>
		<link>http://ask.metafilter.com/10150/I-need-a-script-to-compare-files-in-a-directory-to-files-referenced-in-web-pages#185314</link>	
		<description>&lt;code&gt;for i in `find files` ; do if ( ! grep -q $i *.html ) then echo $i ; fi ; done&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
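The wget, find and diff route could be sketched roughly as follows, assuming bash and wget on the server. crawl_site and list_orphans are made-up names. The crawl is saved with -nH so its paths line up with the web root, and comm -23 stands in for diff because, given two sorted lists, it prints only the lines unique to the first one. Expect false positives for anything a crawl cannot see directly (PHP includes, a directory URL that wget saves as index.html even when the server answered with index.php, pages reached only through query strings), so treat the output as a list to review, not a list to delete.&lt;br&gt;

```shell
# Sketch of the "wget, then find and diff" idea (bash).
# Assumption: the mirror is made with -nH, so paths in it match the web root.

# crawl_site URL DIR: mirror every page and file reachable from URL into DIR.
crawl_site() {
  wget -q -r -l 0 -nH -P "$2" "$1"
}

# list_orphans WEBROOT MIRROR: print paths (relative to WEBROOT) that exist
# in the web root but were never fetched by the crawl.
list_orphans() {
  local tmp
  tmp=$(mktemp -d) || return 1
  # Sorted, relative file lists for both trees; comm needs sorted input.
  (cd "$1" || exit 1; find . -type f | LC_ALL=C sort) > "$tmp/webroot" || return 1
  (cd "$2" || exit 1; find . -type f | LC_ALL=C sort) > "$tmp/crawled" || return 1
  # comm -23 suppresses lines unique to the crawl and lines in both,
  # leaving only files that are in the web root but not in the mirror.
  LC_ALL=C comm -23 "$tmp/webroot" "$tmp/crawled"
  rm -rf "$tmp"
}
```

Something like &lt;code&gt;crawl_site http://www.example.com/ /tmp/mirror&lt;/code&gt; followed by &lt;code&gt;list_orphans /var/www/html /tmp/mirror&lt;/code&gt;, where the URL and both paths are placeholders.&lt;br&gt;
&lt;br&gt;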
...or you could use wget, then find and diff, which would be a little more reliable.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2004:site.10150-185314</guid>
		<pubDate>Mon, 13 Sep 2004 15:59:16 -0800</pubDate>
		<dc:creator>sfenders</dc:creator>
	</item><item>
		<title>By: weston</title>
		<link>http://ask.metafilter.com/10150/I-need-a-script-to-compare-files-in-a-directory-to-files-referenced-in-web-pages#185328</link>	
		<description>pissfactory: I wrote just such a thing years ago. &lt;a href=&quot;http://mmedia.csoft.net/weston/oldweston/webtools/FDW.tar.gz&quot;&gt;Code here&lt;/a&gt;, and a brief (almost nonexistent) &lt;a href=&quot;http://mmedia.csoft.net/weston/oldweston/webtools/FDWManual.html&quot;&gt;manual here&lt;/a&gt;. No guarantees that it runs, works, does anything productive, or doesn&apos;t attract space aliens to your house.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2004:site.10150-185328</guid>
		<pubDate>Mon, 13 Sep 2004 16:50:35 -0800</pubDate>
		<dc:creator>weston</dc:creator>
	</item><item>
		<title>By: weston</title>
		<link>http://ask.metafilter.com/10150/I-need-a-script-to-compare-files-in-a-directory-to-files-referenced-in-web-pages#185352</link>	
		<description>And is sfenders some kind of shell ninja or what?</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2004:site.10150-185352</guid>
		<pubDate>Mon, 13 Sep 2004 17:29:39 -0800</pubDate>
		<dc:creator>weston</dc:creator>
	</item><item>
		<title>By: nicwolff</title>
		<link>http://ask.metafilter.com/10150/I-need-a-script-to-compare-files-in-a-directory-to-files-referenced-in-web-pages#185369</link>	
		<description>&lt;a href=&quot;http://www.linklint.org/&quot;&gt;linklint&lt;/a&gt; -orphan</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2004:site.10150-185369</guid>
		<pubDate>Mon, 13 Sep 2004 18:17:21 -0800</pubDate>
		<dc:creator>nicwolff</dc:creator>
	</item><item>
		<title>By: fvw</title>
		<link>http://ask.metafilter.com/10150/I-need-a-script-to-compare-files-in-a-directory-to-files-referenced-in-web-pages#185521</link>	
		<description>You could also just &lt;code&gt;wget -r -l 0&lt;/code&gt; your website. After that, all files with an access time from before you started &lt;code&gt;wget&lt;/code&gt;ting are orphaned and can be moved or deleted, provided the file system records access times (i.e. isn&apos;t mounted &lt;code&gt;noatime&lt;/code&gt;). If you started the crawl less than ten minutes ago, &lt;code&gt;find . -type f -amin +10&lt;/code&gt; lists them, as does &lt;code&gt;ls **/*(.am+10)&lt;/code&gt; in the ever wonderful &lt;a href=&quot;http://www.zsh.org&quot;&gt;zsh&lt;/a&gt;.&lt;br&gt;
&lt;br&gt;
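A sketch of the access-time route, assuming bash, wget, and a file system that updates access times on every read (not mounted noatime or relatime). Instead of a ten-minute window, it touches a marker file before the crawl and uses the -anewer test of find, so the crawl can take as long as it needs. crawl_and_mark and list_unread are hypothetical names. Keep the marker outside the web root, otherwise it lists itself.&lt;br&gt;

```shell
# Sketch of the access-time approach (bash). Assumes the web root's file
# system updates atime on every read (not mounted noatime or relatime).

# crawl_and_mark URL MARKER: stamp MARKER with the start time, then request
# every page reachable from URL, deleting each download once it is parsed.
crawl_and_mark() {
  touch "$2" || return 1
  sleep 1  # keep crawl reads out of the marker's second on coarse timestamps
  # -P points at a scratch directory so nothing is written into the web root.
  wget -q -r -l 0 -nd --delete-after -P "$(mktemp -d)" "$1"
}

# list_unread WEBROOT MARKER: print regular files under WEBROOT whose last
# access is not more recent than MARKER's modification time, i.e. files
# the crawl never caused the server to read.
list_unread() {
  find "$1" -type f ! -anewer "$2"
}
```

Usage would be &lt;code&gt;crawl_and_mark http://www.example.com/ /tmp/crawl-start&lt;/code&gt;, then &lt;code&gt;list_unread /var/www/html /tmp/crawl-start&lt;/code&gt;; the URL and paths are placeholders.&lt;br&gt;
&lt;br&gt;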
sfenders&apos; snippet will break if you have filenames with spaces in them. You can fairly easily convert it to something&lt;sup&gt;&lt;a href=&quot;#fvw.something&quot;&gt;*&lt;/a&gt;&lt;/sup&gt; that only breaks on filenames containing newlines, but even those can happen. All in all, the only safe file-name separator is &lt;code&gt;\0&lt;/code&gt; (the NUL byte, the one character a Unix path cannot contain), which is sadly harder to work with in shells.&lt;br&gt;
&lt;br&gt;
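A sketch of the NUL-separated version in bash, which survives spaces and newlines in file names by pairing find -print0 with read -d &apos;&apos;. The name list_unreferenced and the choice of which files count as pages (.htm, .html, .php, .css) are assumptions for this example, and --include assumes GNU grep. It searches for each file&apos;s base name rather than its full path, which errs on the side of keeping files; entry pages such as index.html may show up simply because nothing links to them by name.&lt;br&gt;

```shell
# list_unreferenced DIR: print every regular file under DIR whose base name
# does not appear in any .htm, .html, .php or .css file under DIR.
# Paths travel NUL-separated, so spaces and newlines in names are safe.
list_unreferenced() {
  (
    cd "$1" || exit 1
    find . -type f -print0 | while IFS= read -r -d '' f; do
      name=${f##*/}   # base name, e.g. "old pic.png"
      # -F: literal match, -q: stop at the first hit, --include: pages only
      if ! grep -rqF --include='*.htm' --include='*.html' \
           --include='*.php' --include='*.css' -- "$name" .; then
        # One name per line for reading; use '%s\0' to pipe into xargs -0.
        printf '%s\n' "$f"
      fi
    done
  )
}
```

&lt;br&gt;
&lt;br&gt;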
&lt;a name=&quot;fvw.something&quot;&gt;&lt;/a&gt;&lt;small&gt;&lt;code&gt;find . -type f | while IFS= read -r i; do if ! grep -rqF -- &quot;${i#./}&quot; * ; then echo &quot;$i&quot; ; fi ; done&lt;/code&gt; (this also works if your HTML files aren&apos;t all in the current directory)&lt;/small&gt;</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2004:site.10150-185521</guid>
		<pubDate>Tue, 14 Sep 2004 08:34:42 -0800</pubDate>
		<dc:creator>fvw</dc:creator>
	</item>
	</channel>
</rss>
