<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel> 

      <title>Comments on: Is there a programme to allow me to find redundant, non-linked files on my server?</title>
      <link>http://ask.metafilter.com/33669/Is-there-a-programme-to-allow-me-to-find-redundant-nonlinked-files-on-my-server/</link>
      <description>Comments on Ask MetaFilter post Is there a programme to allow me to find redundant, non-linked files on my server?</description>
	  	  <pubDate>Fri, 03 Mar 2006 02:29:18 -0800</pubDate>
      <lastBuildDate>Fri, 03 Mar 2006 02:29:18 -0800</lastBuildDate>
      <language>en-us</language>
	  <docs>http://blogs.law.harvard.edu/tech/rss</docs>
	  <ttl>60</ttl>

<item>
  	<title>Question: Is there a programme to allow me to find redundant, non-linked files on my server?</title>
  	<link>http://ask.metafilter.com/33669/Is-there-a-programme-to-allow-me-to-find-redundant-nonlinked-files-on-my-server</link>	
  	<description>Is there a programme, or way, to allow me to find &apos;redundant&apos; files on my server: i.e. files that are not linked to from any other page? &lt;br /&gt;&lt;br /&gt; Several of our larger websites have been running for 10+ years now. I&apos;m sure there are hundreds of images and PDFs and video files and pages and entire directories that are hanging around on the server that are no longer used. Is there a programme that can spider a website, then spider the site directory on the server and then provide a list of files that are in the latter but not the former so that I can do a bit of housekeeping?</description>
  	<guid isPermaLink="false">post:ask.metafilter.com,2008:site.33669</guid>
  	<pubDate>Fri, 03 Mar 2006 01:57:24 -0800</pubDate>
  	<dc:creator>Hartster</dc:creator>
	
	<category>internet</category>
	
	<category>websites</category>
	
	<category>search</category>
	
</item>
<item>
  	<title>By: AmbroseChapel</title>
  	<link>http://ask.metafilter.com/33669/Is-there-a-programme-to-allow-me-to-find-redundant-nonlinked-files-on-my-server#524729</link>	
  	<description>You can do it with Dreamweaver, it appears. But you need to have your website defined as a Site in Dreamweaver. You&apos;d then use &amp;quot;Check Links&amp;quot; under the Sites menu.&lt;br&gt;
&lt;br&gt;
Otherwise there are a ton of site management programs out there which should have this as a feature, and I think the missing piece of the puzzle is that you&apos;re not googling for the phrase &amp;quot;orphaned files&amp;quot;.</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.33669-524729</guid>
  	<pubDate>Fri, 03 Mar 2006 02:29:18 -0800</pubDate>
  	<dc:creator>AmbroseChapel</dc:creator>
</item>
<item>
  	<title>By: twine42</title>
  	<link>http://ask.metafilter.com/33669/Is-there-a-programme-to-allow-me-to-find-redundant-nonlinked-files-on-my-server#524747</link>	
  	<description>Can I suggest being &lt;i&gt;very&lt;/i&gt; wary of anything that offers the ability to do this?&lt;br&gt;
&lt;br&gt;
You need something that can not only trawl through the HTML, but which can also parse any javascript, cgi and php/asp/etc that may be on the servers as you can rarely be sure that all links are solely HTML.&lt;br&gt;
&lt;br&gt;
Dreamweaver&apos;s abilities are severely limited in this respect.&lt;br&gt;
&lt;br&gt;
You also need to be sure that no-one is crosslinking information from another site, internally or externally. Your tech guys may be able to give you a list of files that haven&apos;t been &apos;touched&apos; (pretty sure that&apos;s the Unix term). This will give you a list of files that haven&apos;t been accessed. This doesn&apos;t tell you that a file isn&apos;t linked to, but if one of the files you want to delete isn&apos;t on this list, you need to find out why before you delete it.</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.33669-524747</guid>
  	<pubDate>Fri, 03 Mar 2006 03:50:57 -0800</pubDate>
  	<dc:creator>twine42</dc:creator>
</item>
<item>
  	<title>By: twine42</title>
  	<link>http://ask.metafilter.com/33669/Is-there-a-programme-to-allow-me-to-find-redundant-nonlinked-files-on-my-server#524748</link>	
  	<description>Damn...&lt;br&gt;
&lt;br&gt;
&lt;i&gt;[...] files that haven&apos;t been &apos;touched&apos; &lt;b&gt;in the last x months&lt;/b&gt; (pretty sure [...]&lt;/i&gt;</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.33669-524748</guid>
  	<pubDate>Fri, 03 Mar 2006 03:52:27 -0800</pubDate>
  	<dc:creator>twine42</dc:creator>
</item>
<item>
  	<title>By: fvw</title>
  	<link>http://ask.metafilter.com/33669/Is-there-a-programme-to-allow-me-to-find-redundant-nonlinked-files-on-my-server#524752</link>	
  	<description>Actually, touched isn&apos;t what you&apos;re looking for. Touching sets the modification time, what you want is access time. If your filesystem stores access times (chances are it can), and it hasn&apos;t been disabled as a mount option (this is often done to enhance performance), and your web directories aren&apos;t mounted read-only, try &lt;code&gt;find /path/to/web/directory -atime +61&lt;/code&gt;, which should list all files that haven&apos;t been read from in the last 61 days. You might want to spider your site first to make sure you don&apos;t hit files that nobody happens to have requested in the last two months, and of course you should move them out of the directory tree into a backup instead of just plain deleting. This all assumes you&apos;re using a unix, but if this is about the site mentioned in your profile, you are.</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.33669-524752</guid>
  	<pubDate>Fri, 03 Mar 2006 04:05:09 -0800</pubDate>
  	<dc:creator>fvw</dc:creator>
</item>
<item>
  	<title>By: sfenders</title>
  	<link>http://ask.metafilter.com/33669/Is-there-a-programme-to-allow-me-to-find-redundant-nonlinked-files-on-my-server#524813</link>	
  	<description>wget -m http://mirror.com/&lt;br&gt;
cd html_root&lt;br&gt;
find . | sort &amp;gt; ../root.files&lt;br&gt;
cd ../mirror.com&lt;br&gt;
find . | sort &amp;gt; ../mirror.files&lt;br&gt;
cd ..&lt;br&gt;
diff root.files mirror.files&lt;br&gt;
&lt;br&gt;
But using atime is better, yeah.</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.33669-524813</guid>
  	<pubDate>Fri, 03 Mar 2006 06:15:15 -0800</pubDate>
  	<dc:creator>sfenders</dc:creator>
</item>
<item>
  	<title>By: yeahyeahyeahwhoo</title>
  	<link>http://ask.metafilter.com/33669/Is-there-a-programme-to-allow-me-to-find-redundant-nonlinked-files-on-my-server#524816</link>	
  	<description>nice fvw!</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.33669-524816</guid>
  	<pubDate>Fri, 03 Mar 2006 06:17:51 -0800</pubDate>
  	<dc:creator>yeahyeahyeahwhoo</dc:creator>
</item>
<item>
  	<title>By: Mitheral</title>
  	<link>http://ask.metafilter.com/33669/Is-there-a-programme-to-allow-me-to-find-redundant-nonlinked-files-on-my-server#524850</link>	
  	<description>61 days is way too short; lots of activity is annual in nature.  You don&apos;t want to blow away tips on surviving Burning Man or Halloween costume suggestions etc.</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.33669-524850</guid>
  	<pubDate>Fri, 03 Mar 2006 07:02:05 -0800</pubDate>
  	<dc:creator>Mitheral</dc:creator>
</item>
<item>
  	<title>By: Hartster</title>
  	<link>http://ask.metafilter.com/33669/Is-there-a-programme-to-allow-me-to-find-redundant-nonlinked-files-on-my-server#524895</link>	
  	<description>Perfect, thanks fvw. Works like a treat. The good thing about this method is (re: Mitheral&apos;s worries) that the searchbots&apos; ever more aggressive spidering will have accessed even the most unpopular pages, so I can be pretty sure what&apos;s produced by fvw&apos;s method is genuinely orphaned. &lt;br&gt;
&lt;br&gt;
Thanks again AskMe!</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.33669-524895</guid>
  	<pubDate>Fri, 03 Mar 2006 08:15:45 -0800</pubDate>
  	<dc:creator>Hartster</dc:creator>
</item>
<item>
  	<title>By: yerfatma</title>
  	<link>http://ask.metafilter.com/33669/Is-there-a-programme-to-allow-me-to-find-redundant-nonlinked-files-on-my-server#525271</link>	
  	<description>Linkbot claimed to do this, but in practice it never seemed to be able to tie together spidering a site and FTPing into the root folder. That was years ago, so they may have improved it since then.</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.33669-525271</guid>
  	<pubDate>Fri, 03 Mar 2006 13:43:01 -0800</pubDate>
  	<dc:creator>yerfatma</dc:creator>
</item>
<item>
  	<title>By: AmbroseChapel</title>
  	<link>http://ask.metafilter.com/33669/Is-there-a-programme-to-allow-me-to-find-redundant-nonlinked-files-on-my-server#526732</link>	
  	<description>I feel like saying this exercise is not worth the trouble. If the pages which are &amp;quot;orphaned&amp;quot; aren&apos;t being accessed, they&apos;re costing you nothing in bandwidth and effectively nothing in storage. &lt;br&gt;
&lt;br&gt;
And if somebody&apos;s got a bookmark or email link to them, or if some search engine has a record of them which might turn up, despite the fact that they&apos;re orphaned on your server, they&apos;re not really orphaned on the internet as a whole, and you risk a 404 for no good reason.</description>
  	<guid isPermaLink="false">comment:ask.metafilter.com,2008:site.33669-526732</guid>
  	<pubDate>Sun, 05 Mar 2006 17:00:37 -0800</pubDate>
  	<dc:creator>AmbroseChapel</dc:creator>
</item>

    </channel>
</rss>
