Need a shell script to filter out directories
August 10, 2011 7:52 AM
Shell gurus needed! I need a line/script that will create a list of all sub-folders within a folder that do not contain files with a certain filename pattern. There is an additional requirement...
Given a folder /somefolder, I need to be able to create a list of all subfolders within /somefolder that:
1) have files of the pattern *.tif
AND
2) do not have files of the pattern *_Aug11.pdf
In other words a folder containing 8107.tif AND 8107_Aug11.pdf would not match the search expression.
However, a folder containing 8107.tif AND 8107.pdf would match, as would a folder containing just 8107.tif alone, or 8107.tif, 8108.tif, etc. provided there are NO associated _Aug11.pdf for each of those.
The script can be a Windows PowerShell or Unix shell script.
What about a directory containing 8107.tif and 666_Aug11.pdf? Items (1) and (2) seem to say that such a directory is not wanted, but the mention of "associated files" later on suggests you might have something else in mind.
If I read grouse's script right, it prints all directories which contain *.tif files with no associated *_Aug11.pdf files, even if other tifs in the same directory do have associated files, and even if there are unaffiliated *_Aug11.pdf files in the same directory. I don't think that's what you asked for, but maybe it is what you wanted.
Here's my (untested) take on a literal reading of (1) and (2):
find somefolder -type d |(while read d; do
    if ls "$d"/*.tif >/dev/null 2>&1 && ! ls "$d"/*_Aug11.pdf >/dev/null 2>&1; then
        echo "$d"
    fi
done)
posted by stebulus at 8:43 AM on August 10, 2011
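For anyone who does want the literal reading of (1) and (2), here is a minimal sketch of the same idea without calling ls. The function names has_match and list_literal_matches are made up for illustration, and the sketch assumes no file or directory name contains a newline.

```shell
#!/bin/sh
# has_match DIR PATTERN
# Succeeds if at least one entry in DIR matches the glob PATTERN.
# Assumption: PATTERN itself contains no whitespace.
has_match() {
  for candidate in "$1"/$2; do
    # If the glob matched nothing, the pattern stays literal and -e fails.
    [ -e "$candidate" ] && return 0
  done
  return 1
}

# list_literal_matches ROOT
# Prints every directory under ROOT that contains some *.tif
# and contains no *_Aug11.pdf at all (the literal reading).
list_literal_matches() {
  find "${1:-.}" -type d |
    while IFS= read -r dir; do
      if has_match "$dir" '*.tif' && ! has_match "$dir" '*_Aug11.pdf'; then
        printf '%s\n' "$dir"
      fi
    done
}
```

Calling list_literal_matches /somefolder would print one matching directory per line. Directory names with spaces survive because every expansion of the directory name is quoted.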
It will also fail for directories with spaces in them. Here's a better version that will work in that case:
#!/usr/bin/env bash
DIR="${1:-.}"
(find "$DIR" -name '*.tif' -exec bash -c \
    'FILENAME="{}"
    if [ ! -e "${FILENAME%.tif}_Aug11.pdf" ]; then
        echo "$(dirname "$FILENAME")"
    fi' ';') | uniq
posted by grouse at 8:46 AM on August 10, 2011
stebulus, the key is in the example case:
However, a folder containing 8107.tif AND 8107.pdf would match, as would a folder containing just 8107.tif alone, or 8107.tif, 8108.tif, etc. provided there are NO associated _Aug11.pdf for each of those.
posted by grouse at 8:48 AM on August 10, 2011
Yes, I want the folders that contain at least one .tif file with no associated _Aug11.pdf. So, even if there is a folder /somefolder/farm containing:
piggies.tif
8017.tif
8017_Aug11.pdf
then I want that folder "farm" to be listed.
(What I'm trying to do is find all of the .tif files that were not converted to a PDF because of an error within the .tif file).
posted by dukes909 at 8:50 AM on August 10, 2011
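Given that the underlying goal is to find the .tif files that were never converted, a sketch that lists those files directly might look like this. The function name list_unconverted_tifs is hypothetical, and the sketch assumes the PDF sits next to its .tif and that no path contains a newline.

```shell
#!/bin/sh
# list_unconverted_tifs ROOT
# Prints each *.tif under ROOT that has no sibling <name>_Aug11.pdf.
list_unconverted_tifs() {
  find "${1:-.}" -type f -name '*.tif' |
    while IFS= read -r tif; do
      # Strip the .tif suffix and append the expected PDF suffix.
      pdf="${tif%.tif}_Aug11.pdf"
      if [ ! -e "$pdf" ]; then
        printf '%s\n' "$tif"
      fi
    done | sort
}
```

For the farm example above, list_unconverted_tifs /somefolder would report /somefolder/farm/piggies.tif and skip 8017.tif, since 8017_Aug11.pdf exists.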
One liner, and messy
find . -name "*tif" | while read tif; do pdf=`echo $tif | sed -e 's/.tif$/_Aug11.pdf/'`; if [ ! -f $pdf ]; then echo File $tif does not have a matching pdf; fi; done
posted by devbrain at 9:14 AM on August 10, 2011
grouse's script worked great, although it printed them all on one line. That's ok, I redirected it to a file and edited the list. Thank you!
posted by dukes909 at 9:14 AM on August 10, 2011
Still messy, updated to print just directory names, not the individual files.
find . -name "*tif" | while read tif; do pdf=`echo $tif | sed -e 's/.tif$/_Aug11.pdf/'`; if [ ! -f $pdf ]; then echo Directory `dirname $tif` is missing pdfs; fi; done | sort | uniq
posted by devbrain at 9:15 AM on August 10, 2011
devbrain - I get a "bash: too many arguments" when I try yours.
posted by dukes909 at 9:19 AM on August 10, 2011
grouse's script worked great, although it printed them all on one line.
How odd. I even tested it (on Cygwin) and it worked fine for me. Glad it worked for you after a fashion, though.
posted by grouse at 9:19 AM on August 10, 2011
What I'm trying to do is find all of the .tif files that were not converted to a PDF because of an error within the .tif file
Ok. Then disregard my script; grouse's will serve you nicely.
posted by stebulus at 9:20 AM on August 10, 2011
That's odd -- I cut and pasted it back out of the thread and confirmed nothing got broken by markup/reformatting.
If you can bust it apart to different lines it'll narrow down where the problem is. That said, if you've already got a solution that works, there's no need to diagnose this alternative. (This script also won't work with spaces in the filename, but should generate a different error in that instance)
#!/bin/bash
find . -name "*tif" | \
while read tif; do
pdf=`echo $tif | sed -e 's/.tif$/_Aug11.pdf/'`
if [ ! -f $pdf ]; then
echo Directory `dirname $tif` is missing pdfs
fi
done | sort | uniq
posted by devbrain at 10:24 AM on August 10, 2011
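As noted, the script above breaks on names with spaces because $tif and $pdf are expanded unquoted. A possible space-safe variant, offered as a sketch rather than a tested fix for the reported error, quotes every expansion and replaces the sed call with a shell suffix strip. The function name report_missing_pdfs is made up here, and paths containing newlines are still not handled.

```shell
#!/bin/sh
# report_missing_pdfs ROOT
# Same report as the script above, with every expansion quoted.
report_missing_pdfs() {
  find "${1:-.}" -name '*.tif' |
    while IFS= read -r tif; do
      # ${tif%.tif} removes a literal ".tif" suffix, so no regex is involved.
      pdf="${tif%.tif}_Aug11.pdf"
      if [ ! -f "$pdf" ]; then
        echo "Directory $(dirname "$tif") is missing pdfs"
      fi
    done | sort -u
}
```

Running report_missing_pdfs . from the top-level folder should give one line per affected directory, since sort -u collapses the repeats.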
posted by grouse at 8:12 AM on August 10, 2011