September 13, 2004
3:37 PM

Can anyone recommend a program (in, say, Perl or Java or something) that will run on my Linux web server and a) slurp up the file references in the pages (HTML, PHP) and b) compare them against the actual files in my web root tree, giving me a list of all the unreferenced files (files not linked from any public web page)? I inherited a junky file system and want to clean it up, but I don't want to break any links. Thanks.
posted by pissfactory (5 comments total)
for i in `find files -type f` ; do if ! grep -qF -- "$i" *.html ; then echo "$i" ; fi ; done

...or you could mirror the site with wget, then compare the mirror against your web root with find and diff, which would be a little more reliable.
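A sketch of the wget-then-find-and-diff idea, assuming the site's URL paths map directly onto files under the document root. The function names mirror_site and compare_trees, the URL and the paths are hypothetical. Only compare_trees does the comparison, so it can be pointed at any two directory trees.

```shell
# mirror_site URL DIR: crawl the public site into DIR (hypothetical helper).
#   -r -l inf      follow links with no depth limit
#   -nH            don't add a www.example.com/ directory level
#   -e robots=off  also crawl areas that robots.txt tells spiders to skip
mirror_site() {
  wget -q -r -l inf -nH -e robots=off -P "$2" "$1"
}

# compare_trees MIRROR DOCROOT: print files present under DOCROOT but
# never fetched into MIRROR (paths are relative to each root).
compare_trees() {
  mirror_list=$(mktemp)
  docroot_list=$(mktemp)
  (cd "$1" && find . -type f | sort) > "$mirror_list"
  (cd "$2" && find . -type f | sort) > "$docroot_list"
  # In diff's normal output, lines found only in the second file start with "> ".
  diff "$mirror_list" "$docroot_list" | sed -n 's/^> //p'
  rm -f "$mirror_list" "$docroot_list"
}
```

Hypothetical usage: mirror_site http://www.example.com/ /tmp/mirror, then compare_trees /tmp/mirror /var/www/html. wget saves a PHP page fetched with a query string under a name like page.php?id=1, and a directory URL as index.html, so a page that really lives in page.php or index.php can show up as orphaned; treat the output as a list of candidates to inspect rather than a delete list.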
posted by sfenders at 3:59 PM on September 13, 2004


pissfactory: I wrote just such a thing years ago. Code here, and a brief (almost nonexistent) manual here. No guarantees that it runs, works, does anything productive, or doesn't attract space aliens to your house.
posted by weston at 4:50 PM on September 13, 2004


And is sfenders some kind of shell ninja or what?
posted by weston at 5:29 PM on September 13, 2004


linklint -orphan
posted by nicwolff at 6:17 PM on September 13, 2004


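You could also just run wget -r -l 0 against your website. After that, every file whose access time is earlier than when you started wget was never requested, so it's orphaned and can be moved or deleted. (Use find . -type f -amin +10, or ls **/*(.am+10) in the ever-wonderful zsh, replacing 10 with at least the number of minutes since you started wget.)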
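A sketch of the access-time approach that swaps the minute count for a marker file, so you don't have to guess how long the crawl took. Assumptions: the helper names are hypothetical, find supports -anewer (GNU and BSD find do), and the web root sits on a filesystem that actually updates access times. A noatime mount never updates them, and relatime (the Linux default since 2.6.30) skips the update when a file's atime is already less than 24 hours old and newer than its last modification, so on such mounts recently used files can show up as orphans.

```shell
# start_marker: create a temp file whose mtime marks "crawl starts now".
start_marker() {
  marker=$(mktemp)
  # Wait a second so every access made by the crawl is strictly newer,
  # even on filesystems that store timestamps to the whole second.
  sleep 1
  printf '%s\n' "$marker"
}

# list_unread MARKER DOCROOT: regular files under DOCROOT whose last access
# is not more recent than MARKER's mtime, i.e. nothing has read them since.
list_unread() {
  find "$2" -type f ! -anewer "$1"
}

# Hypothetical usage, run from a directory outside the web root:
#   m=$(start_marker)
#   wget -q -r -l inf -e robots=off --delete-after http://www.example.com/
#   list_unread "$m" /var/www/html
#   rm -f "$m"
```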

sfenders' snippet will break if you have filenames with spaces in them. You can fairly easily convert it to something* that only breaks on filenames containing newlines, but even those can happen. All in all, the only safe filename separator is \0, which is sadly harder to work with in shells.

find . -type f | while IFS= read -r i; do if ! grep -rqF -- "${i##*/}" * ; then echo "$i" ; fi ; done (this also works if your html files aren't all in the current directory)
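The \0 problem can also be sidestepped rather than solved: instead of parsing a list of names (newline- or NUL-separated), let find hand them to a shell as arguments with -exec ... {} +, so no separator is ever involved. This is a swapped-in technique, not fvw's code. The function name find_unreferenced is hypothetical, --include assumes GNU grep, and a file counts as referenced if its bare filename appears anywhere in an .html or .php file, which errs toward keeping files.

```shell
# find_unreferenced DOCROOT: print each regular file under DOCROOT whose
# filename appears in no .html or .php file under DOCROOT.
find_unreferenced() {
  # find passes the names to sh as arguments, so spaces, quotes and
  # newlines in names never go through word splitting or read.
  find "$1" -type f -exec sh -c '
    root=$1
    shift
    for f do
      name=${f##*/}   # bare filename, so relative links still match
      # -F: literal match (a "." is not a regex wildcard); -q: quiet.
      # An embedded newline makes grep -F search for two patterns, which
      # errs toward "referenced", i.e. the file is kept off the list.
      if ! grep -rqF --include="*.html" --include="*.php" -- "$name" "$root"; then
        printf "%s\n" "$f"
      fi
    done
  ' sh "$1" {} +
}
```

Note that a page nobody links to by name, such as the site's own index.html, will also be listed, since the web server serves it implicitly.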
posted by fvw at 8:34 AM on September 14, 2004

