Find a file that isn't there?
July 16, 2006 7:40 AM   Subscribe

I need a (Linux) shell command. Ready?

Let's say you have 100 directories, each one containing 3 (sub)directories... for a total of 300 directories. Simple enough. Each directory is supposed to contain, among other things, an index.html file. But three of those directories DO NOT contain said file. The question is, which directories are missing this file?

I've slapped together so many combinations of 'find' and 'grep' and 'awk' and 'sort'... well, it's making my head spin.
posted by Witty to Computers & Internet (17 answers total) 3 users marked this as a favorite
 

for i in $(find . -type d); do if [ ! -f $i/index.html ]; then echo "$i is missing an index file"; fi; done

posted by hob at 7:47 AM on July 16, 2006


perl -e 'foreach (split(/\n/,`find * -type d`)) { if (!-e "$_\/index.html") { print $_ . "\n"; }}'
posted by petethered at 7:50 AM on July 16, 2006


Funny thing is, I just did this, except instead of just listing the directories, I had the one-liner copy index.cgi files into the directories for me.
posted by petethered at 7:52 AM on July 16, 2006


Thanks... pretty neat.

hob - your command works, except for (and I didn't make this clear in the original post) the fact that some of the directory names have spaces in them. For example:

/home/public_html/photos/Summer 2006/

So your output spits out things like:

./Summer is missing an index file
2006 is missing an index file

petethered - Yours is working better for me, but can we adjust the code to look for *.html instead of specifically index.html? (I just used index.html as an example for my question; in reality, the .html files have a variety of names. I tried using a wildcard in your command, but I obviously don't know exactly how to do it, because the results are unchanged.)
posted by Witty at 8:22 AM on July 16, 2006


Precede the command with

IFS='\n';

and put double quotes (") around the $i's. So...

IFS="\n"; for i in $(find . -type d); do if [ ! -f "$i"/index.html ]; then echo "$i is missing an index file"; fi; done

posted by hob at 8:44 AM on July 16, 2006


for i in *; do if [ -d "$i" ] ; then ls "$i"/*.html 1>/dev/null; fi; done
posted by sfenders at 8:47 AM on July 16, 2006


oops, sorry... missed the second level of subdirectories, so:

for i in */*; do if [ -d "$i" ] ; then ls "$i"/*.html 1>/dev/null; fi; done
posted by sfenders at 8:50 AM on July 16, 2006
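[A note on the approach above: ls prints its "No such file or directory" complaints to stderr and exits non-zero, so instead of redirecting the output away, you can test ls's exit status and print a clean message. A minimal sketch, assuming bash; the demo directory names are invented:]

```shell
#!/bin/bash
# Demo setup (invented names): two directories lack an .html file.
mkdir -p demo/"Summer 2006"/"Funny Fingers" demo/other
touch demo/"Summer 2006"/index.html   # only this one has an .html file

# Trailing slashes in the globs match directories only; ls exits
# non-zero when the *.html glob matches nothing in a directory.
missing=$(for i in demo/*/ demo/*/*/; do
  ls "$i"*.html >/dev/null 2>&1 || echo "$i is missing an html file"
done)
printf '%s\n' "$missing"
# demo/other/ is missing an html file
# demo/Summer 2006/Funny Fingers/ is missing an html file
```

Because the for loop iterates over glob results rather than word-split command output, spaces in directory names are handled for free.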


Gettin' closer...

But the letter 'n' in any directory name is causing problems now. For example:

/home/public_html/photos/Summer 2006/Funny Fingers/

./Fu is missing an index file
y Fi is missing an index file
gers is missing an index file

Sorry.
posted by Witty at 8:55 AM on July 16, 2006


Weird... try

$IFS='
> '; for i in $(find . -type d); do if [ ! -f "$i"/index.html ]; then echo "$i is missing an index file"; fi; done

(yes, that's a hard return between the 's after IFS).
posted by hob at 9:00 AM on July 16, 2006


That last response was for hob.

sfenders - This is what your command outputs:

ls: photos/Summer 2006/*.html: No such file or directory
ls: photos/Summer 2006/Funny Fingers/*.html: No such file or directory

...for everything.
posted by Witty at 9:00 AM on July 16, 2006


hob... thanks. That took care of the letter 'n' problem, but now we're back to each word of the directory name being treated separately.

./Summer is missing an index file
2006 is missing an index file

That wouldn't be the end of the world, of course, but not ideal. Hopefully, this is kind of fun for you. :P
posted by Witty at 9:06 AM on July 16, 2006


The $ and > are prompt indicators; do not copy and paste them. In particular, the $IFS= should be run as IFS=. Here is a copy/pasteable version:

IFS='
'; for i in $(find . -type d); do if [ ! -f "$i"/index.html ]; then echo "$i is missing an index file"; fi; done
posted by hob at 9:13 AM on July 16, 2006


BINGO!

You win!

Thanks!
posted by Witty at 9:28 AM on July 16, 2006


/bows

We do more shell support before church than most people do all week... :) If I'd actually run the stupid command before I posted it I probably would have saved a couple of steps.
posted by hob at 9:31 AM on July 16, 2006


Came in too late as usual, but I usually try to avoid using $(find ...), especially in scripts, because the lengthy output that find often produces can exceed the capacity of a shell variable in some shells.

Instead, I like to pipe the output of find through a read loop, like this:

find . -type d | while read -r; do test -e "$REPLY/index.html" || echo "$REPLY"; done

This construction also copes with filenames containing spaces, because all parameter expansions are contained within double quotes, and it copes with filenames containing backslashes, because read has the raw (-r) option.

If you're dealing with filenames that might have embedded newlines, you need to go one step further:

find . -type d -print0 | while read -rd $'\0'; do test -e "$REPLY/index.html" || echo "$REPLY"; done

The -print0 makes find emit the filenames it's finding as null-terminated strings rather than newline-terminated ones, and the -d $'\0' option makes read expect null characters as line delimiters instead of newlines. Substitute whatever processing you need to do on each directory for the echo "$REPLY".

Incidentally, that $'string' construction is the oft-forgotten shell-ese for "interpret special backslash codes within string". So instead of

IFS='
'


you could have used

IFS=$'\n'
posted by flabdablet at 7:04 PM on July 16, 2006
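[For the record, here is a self-contained check of the null-delimited pipeline above: bash assumed, and the test tree is invented for the demo, with one directory name deliberately containing a newline.]

```shell
#!/bin/bash
# Invented test tree: only one directory gets an index.html.
mkdir -p t/"Summer 2006" t/$'odd\nname'
touch t/"Summer 2006"/index.html

# Count the directories lacking index.html; names may contain
# spaces or newlines, so everything travels NUL-delimited.
missing_count=$(find t -type d -print0 |
  while IFS= read -rd $'\0' dir; do
    test -e "$dir/index.html" || echo x
  done | wc -l)
echo "$missing_count"   # t itself and t/odd<newline>name lack the file
```

Counting rather than printing sidesteps the fact that a newline-bearing name would span two output lines if echoed directly.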


Another way to deal with embedded whatsits (besides messing with IFS) is to do this:

find whatever -print | while read filename; do whatever "$filename"; done

(replace 'whatever' as appropriate). This still falls apart if a filename has a newline in it, though. For that, a perl solution is easier. There's also the nonstandard but pretty much universally available -print0 primary for find, which goes along with the -0 flag for xargs (those are both zeroes, not letter Os). Filenames are guaranteed not to have NULs or slashes in them, but any other octet is technically fair game.

Actually, you might be able to set IFS to a NUL character and use -print0. Probably depends on your shell.
posted by hattifattener at 8:28 PM on July 16, 2006
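[A sketch of the -print0 / -0 pairing described above, assuming GNU find and xargs and an invented test tree. Passing the directory name to sh -c as a positional argument, rather than splicing it into the script text, keeps names with spaces (or quotes) intact:]

```shell
#!/bin/bash
# Invented test tree: only x/b has an index.html.
mkdir -p x/"a dir" x/b
touch x/b/index.html

# xargs -0 reads the NUL-delimited names; each one becomes $1 in
# the tiny sh script, so spaces in names survive intact.
missing=$(find x -type d -print0 |
  xargs -0 -I{} sh -c 'test -e "$1/index.html" || printf "%s\n" "$1"' _ {} |
  sort)
printf '%s\n' "$missing"
# x
# x/a dir
```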


If what you actually want to do is identify which directories don't contain any .html files, use this:

find . -type d | while read -r; do files=("$REPLY"/*.html); test -e "${files[0]}" || echo "$REPLY"; done

"$REPLY" expands to the name of the directory last read by read -r, complete with any embedded special characters; "$REPLY"/*.html expands to a list of the pathnames of all the .html files inside that directory if any exist, or to a single string consisting of the directory name with the literal /*.html appended if none do; files=("$REPLY"/*.html) makes the shell variable files into an array that holds the entire list; test -e "${files[0]}" tests whether the first item in that array identifies an existing file; and echo "$REPLY" will be executed only if that test fails.

Bash can do a bunch of amazing things, generally with fewer completely bizarre contortions than would be required by the Windows command line interpreter. I sometimes wonder, though, whether the time spent searching manuals to find out how to make either interpreter do what I want might not be better spent in pointing and clicking :-)
posted by flabdablet at 10:36 PM on July 16, 2006
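[The same any-.html test can also be pushed entirely into find with a small sh -c probe in place of the bash array; this sketch uses POSIX sh constructs only, and the z/ tree is invented for the demo. Here set -- "$1"/*.html does the job of the array assignment: it loads the glob's expansion into the positional parameters, so testing $1 checks the first match.]

```shell
#!/bin/sh
# Invented test tree: only one directory contains any .html file.
mkdir -p z/"has files" z/empty
touch z/"has files"/page.html

# For each directory, the probe exits 0 iff at least one *.html
# exists there; '!' plus -print lists the directories where it failed.
missing=$(find z -type d ! -exec sh -c 'set -- "$1"/*.html; test -e "$1"' _ {} \; -print | sort)
printf '%s\n' "$missing"
# z
# z/empty
```

Note that z itself is listed too, since it contains subdirectories but no .html files of its own; start the find at z/* if the top level should be exempt.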

