I need a script to compare files in a directory to files referenced in web pages.
September 13, 2004 3:37 PM
Can anyone recommend a program (in say perl or java or something) that will execute on my linux web server and a) slurp the file references in the pages (html, php) and b) compare them to the actual files in my web root tree giving me a list of all of the unreferenced files (not referenced on public web pages)? I want to clean up this junky file system, but I don't want to break any links and I inherited the mess. Thanks.
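For anyone reading along, here is a minimal sketch of the two steps the question describes: (a) collect the file references in the pages, and (b) compare them with the files actually on disk. It is only an illustration under stated assumptions, not a finished tool. It only looks at href and src attributes, so anything pulled in through PHP includes, CSS url(), or JavaScript is missed. All function names are made up for this example.

```python
import os
import re
from urllib.parse import unquote, urlsplit

# Matches href="..." and src="..." values, stopping at any '#' or '?'.
# This is a deliberate simplification, not a real HTML parser.
REF_PATTERN = re.compile(r"""(?:href|src)\s*=\s*["']([^"'#?]+)""", re.IGNORECASE)
PAGE_EXTENSIONS = (".html", ".htm", ".php")


def all_files(web_root):
    """Return every file under web_root as a path relative to web_root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(web_root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, web_root))
    return found


def referenced_files(web_root):
    """Return relative paths referenced by href/src attributes in the pages."""
    referenced = set()
    for rel_page in all_files(web_root):
        if not rel_page.lower().endswith(PAGE_EXTENSIONS):
            continue
        page_path = os.path.join(web_root, rel_page)
        with open(page_path, encoding="utf-8", errors="replace") as fh:
            text = fh.read()
        for raw in REF_PATTERN.findall(text):
            parts = urlsplit(raw)
            if parts.scheme or parts.netloc:
                continue  # external link (http:, mailto:, //host/...), skip it
            path = unquote(parts.path)
            if path.startswith("/"):
                # Site-absolute reference: resolve against the web root.
                target = os.path.join(web_root, path.lstrip("/"))
            else:
                # Relative reference: resolve against the page's directory.
                target = os.path.join(web_root, os.path.dirname(rel_page), path)
            referenced.add(os.path.relpath(os.path.normpath(target), web_root))
    return referenced


def unreferenced_files(web_root):
    """Files that exist on disk but that no scanned page refers to."""
    return sorted(all_files(web_root) - referenced_files(web_root))
```

Note that an entry page such as index.html will show up in the output unless some other page links to it, so the list is a set of candidates to review, not a list of files that are safe to delete.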
pissfactory: I wrote just such a thing years ago. Code here, and brief (almost non) manual here. No guarantees that it runs, works, does anything productive, or doesn't attract space aliens to your house.
posted by weston at 4:50 PM on September 13, 2004
You could also just wget -r -l 0 your website. After that, all files with an access time before you started wgetting are orphaned and can be moved or deleted. (Use find -amin -10, or ls **/*(.am-10) in the ever wonderful zsh.)
sfenders' snippet will break should you have filenames with spaces in them. You can reasonably easily convert this to something* that only breaks with filenames containing newlines, but even that can happen. All in all, the only safe file-name separator is a \0, which is harder to work with in shells, sadly.
* find . | while read -r i; do if ( ! grep -r "$i" * ) then echo "$i" ; fi ; done (this also works if your html files aren't all in the current directory)
posted by fvw at 8:34 AM on September 14, 2004
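The access-time idea can also be sketched outside the shell, which sidesteps the filename-separator problem because paths never pass through word splitting. This is a rough illustration only. The function name is hypothetical, and it assumes the filesystem really does update access times on every read. Mounts using noatime or relatime may not, so treat the output as candidates to inspect. Note also that find -amin -10 matches files accessed within the last ten minutes, so the orphans are the files it does not list.

```python
import os


def not_accessed_since(web_root, crawl_start):
    """Return files under web_root whose last access predates crawl_start.

    crawl_start is a Unix timestamp in seconds, recorded just before the
    crawl began. Paths are returned relative to web_root, sorted.
    """
    stale = []
    for dirpath, _dirnames, filenames in os.walk(web_root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # st_atime is the last access time reported by the filesystem.
            if os.stat(full).st_atime < crawl_start:
                stale.append(os.path.relpath(full, web_root))
    return sorted(stale)
```

In use, you would record time.time() immediately before starting the crawl and pass that value as crawl_start once the crawl has finished.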
This thread is closed to new comments.
...or you could use wget, then find and diff, which would be a little more reliable.
posted by sfenders at 3:59 PM on September 13, 2004
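The "wget, then find and diff" route comes down to comparing two file listings: the web root and the directory that the recursive download produced. A minimal sketch of that comparison follows. The function names are invented for illustration, and mirror_root is assumed to be whatever directory the crawl wrote into. The crawler may store some URLs under different names than the files on disk (directory URLs saved as index.html, for instance), so the result needs a manual look before deleting anything.

```python
import os


def relative_files(root):
    """Return every file under root as a path relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def files_missing_from_mirror(web_root, mirror_root):
    """Files present in web_root that the crawl never downloaded."""
    return sorted(relative_files(web_root) - relative_files(mirror_root))
```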