How to recursively copy only *.foo files with directory structure intact?
November 10, 2008 10:20 AM

From the (Linux) command line, is there a technique to recursively copy all files of type X from ./foo to ./bar and reproduce the directory structure of ./foo within ./bar? Example: I want to copy every HTML file, but no others, from every directory in one location to mirrored directories in another.

If it makes things simpler, you can assume ./foo and ./bar contain identical directories (if there is ./foo/foo2 assume there is ./bar/foo2), although ideally this wouldn't be required.
posted by Grod to Computers & Internet (19 answers total) 3 users marked this as a favorite
 
cd foo; find . -name \*.html -exec echo mv {} ../bar/{} \;
posted by sergent at 10:27 AM on November 10, 2008


If you want the full-on directory creation stuff, how about:

find foo -name \*.html | while read f; do d="bar/${f#foo/}"; mkdir -p "$(dirname "$d")"; mv "$f" "$d"; done

but it'll be slower.
posted by sergent at 10:30 AM on November 10, 2008
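
For anyone who wants to copy rather than move, here is a minimal sketch of the same idea. It swaps the `while read` loop for `find -exec sh -c`, which is a substitution on my part so that filenames with spaces survive, and it runs on a throwaway tree under `mktemp -d`. The directory and file names (foo, bar, "sub/deep dir", index.html, notes.txt) are made up for illustration.

```shell
# Sketch: copy only *.html from foo to bar, creating subdirectories on demand.
# Everything lives in a temporary directory; all names are hypothetical.
work=$(mktemp -d)
mkdir -p "$work/foo/sub/deep dir" "$work/bar"
echo '<p>a</p>' > "$work/foo/index.html"
echo '<p>b</p>' > "$work/foo/sub/deep dir/page.html"
echo 'skip me'  > "$work/foo/sub/notes.txt"

cd "$work" || exit 1
# find hands batches of matching paths to a small inline shell script.
find foo -type f -name '*.html' -exec sh -c '
    for f in "$@"; do
        d="bar/${f#foo/}"             # swap the leading foo/ for bar/
        mkdir -p "$(dirname "$d")"    # create the target directory if missing
        cp -p "$f" "$d"               # copy, keeping mode and timestamps
    done
' sh {} +

# List what ended up under bar.
find bar -type f | sort
```

Only the two .html files should show up under bar; notes.txt stays behind, and directories that hold no .html files are never created.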


rsync.

I believe the command you'd want would be:

rsync -av --include='*.html' /path/to/foo/. /path/to/bar/.

This will copy all files ending in .html, preserving file ownership/permissions (if you run it as root/sudo), and directory structure.

Furthermore, you can use it regularly to ensure directories are in sync. "man rsync" for a full explanation of all the powers of rsync.
posted by swngnmonk at 10:31 AM on November 10, 2008 [1 favorite]


Response by poster: sergent, the first one outputted a bunch to the screen but does not seem to have done anything (possibly because I used "cp" not "mv" ?) I haven't tried the second.

swngnmonk, yours duplicated everything (including .html files) not just .html files.

Almost, but not quite what I'm trying for, sorry.
posted by Grod at 10:36 AM on November 10, 2008


swngnmonk, that's way too easy. You need some backslashes and dollar signs!
posted by sergent at 10:36 AM on November 10, 2008 [1 favorite]


Remove 'echo' from the first command I posted, that was just there for testing, oops.
posted by sergent at 10:37 AM on November 10, 2008


I think the correct rsync usage would be:

rsync -av --include "*/" --include "*.html" --exclude "*" /path/to/foo/. /path/to/bar/
posted by zamboni at 11:03 AM on November 10, 2008 [1 favorite]
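
The reason the extra rules matter: rsync tests its include/exclude rules in order and the first match wins, and anything that matches no rule is copied. A lone `--include='*.html'` therefore excludes nothing. Below is a rough sketch on a throwaway tree. The paths are made up, and the `-m` (`--prune-empty-dirs`) flag is my addition, not part of the command above. The block skips itself if rsync is not installed.

```shell
# Sketch: rsync filter order, tried on a temporary tree with hypothetical names.
work=$(mktemp -d)
mkdir -p "$work/foo/docs" "$work/foo/img" "$work/bar"
echo '<p>a</p>' > "$work/foo/docs/a.html"
echo 'text'     > "$work/foo/docs/a.txt"
echo 'png'      > "$work/foo/img/logo.png"

if command -v rsync >/dev/null 2>&1; then
    # Rules are checked in order; the first one that matches decides:
    #   1. '*/'      keep every directory, so rsync descends into it
    #   2. '*.html'  keep html files
    #   3. '*'       drop everything else
    # -m (--prune-empty-dirs) skips directories that end up with nothing to copy.
    rsync -am --include='*/' --include='*.html' --exclude='*' "$work/foo/" "$work/bar/"
    find "$work/bar" | sort
else
    echo "rsync not found, skipping" >&2
fi
```

Dropping `-m` should give the behaviour of the command above, where every directory is mirrored even if it holds no .html files.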


I'd just do this with a pair of find commands. Something like this (untested) should work:

cd foo
find . -type d -exec mkdir -p ../bar/{} \;
find . -type f -iname '*.html' -exec cp {} ../bar/{} \;
posted by sbutler at 11:04 AM on November 10, 2008


My mistake - I'm used to using --exclude, not --include. What happened makes sense.

Try this (untested) - it should include only directories & files ending in .html, and exclude everything else.

rsync -av --include '*.html' --include '*/' --exclude '*' /path/foo/. /path/bar/.
posted by swngnmonk at 11:06 AM on November 10, 2008


sergent:

I like simple and easy. Especially when trying to keep multi-terabyte datasets in sync. ;)
posted by swngnmonk at 11:07 AM on November 10, 2008


You can also use tar with --exclude (or -X, which reads the patterns from a file) to exclude some file pattern.
posted by jeffburdges at 11:56 AM on November 10, 2008


cd foo ; find . -type f -iname '*html' -print0 | xargs -0 tar cf - | tar xvf - -C ../bar
posted by scruss at 12:19 PM on November 10, 2008
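
One caveat with the xargs version, as far as I understand it: with a very long file list xargs may run tar more than once, and the extracting tar normally stops at the end of the first archive. A possible workaround, sketched here assuming GNU tar (bsdtar should accept the same options), is to let tar read the NUL-separated list itself with `--null -T -`. The tree and file names are invented for the demo.

```shell
# Sketch: tar pipeline fed directly from find, on a temporary tree.
work=$(mktemp -d)
mkdir -p "$work/foo/a b" "$work/bar"
echo '<p>x</p>' > "$work/foo/a b/x.html"
echo '<p>y</p>' > "$work/foo/Y.HTML"
echo 'no'       > "$work/foo/a b/readme.txt"

cd "$work/foo" || exit 1
# -print0 and --null pass names NUL-separated, so spaces in names are safe;
# -T - makes the first tar read the list of files to archive from stdin.
find . -type f -iname '*.html' -print0 |
    tar -cf - --null -T - |
    tar -xf - -C ../bar

# List what was extracted under bar.
find ../bar -type f | sort
```

Because of `-iname`, both x.html and Y.HTML are picked up, while readme.txt is left out.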


assume directories foo and bar at the same level:

$ cd foo
$ find . -name '*.html' |cpio -pdv ../bar

or to retain file times

$ find . -name '*.html' |cpio -pdvm ../bar
posted by jockc at 3:39 PM on November 10, 2008 [1 favorite]
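
If any filenames contain spaces or newlines, a NUL-separated variant of the same cpio command may be safer. This is only a sketch, assuming a cpio that supports `-0` (GNU cpio and bsdcpio both do, as far as I know). The directory names are hypothetical, and the block skips itself when cpio is not installed.

```shell
# Sketch: find | cpio pass-through copy with NUL-separated names.
work=$(mktemp -d)
mkdir -p "$work/foo/sub dir" "$work/bar"
echo '<p>a</p>' > "$work/foo/sub dir/a.html"
echo 'no'       > "$work/foo/sub dir/a.css"

cd "$work/foo" || exit 1
if command -v cpio >/dev/null 2>&1; then
    # -p  pass-through (copy) mode     -d  create directories as needed
    # -m  keep modification times      -0  names on stdin are NUL-terminated
    find . -type f -name '*.html' -print0 | cpio -0 -pdm ../bar
    find ../bar -type f | sort
else
    echo "cpio not found, skipping" >&2
fi
```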


mkdir bar
cd foo
tar -c `find . -name \*html` | tar -x -C ../bar
posted by sfenders at 6:16 PM on November 10, 2008


Oops... bad form, though it works with GNU tar. Clearly I meant:

tar c `find . -name \*html` | tar x -C ../bar
posted by sfenders at 6:19 PM on November 10, 2008


or not, now I can't remember.
I should not be operating heavy machinery like tar.
they both work on linux anyway.
posted by sfenders at 6:20 PM on November 10, 2008


jockc has it right, cpio is the best way to do this. I had completely forgotten about its existence.
posted by sergent at 9:55 PM on November 10, 2008


The powershell perspective:

get-childitem c:\SourceDirectory -Recurse -Include *.foo | copy-item -destination c:\TargetDirectory
posted by MrHappyGoLucky at 7:55 AM on November 11, 2008


Response by poster: These are all fascinating, I have so much to learn. I was in a rush, though, so I ended up overwriting every file rather than experiment with each on copies of the directories. Now that I have time I'll read all the man pages and try out each solution.

Thanks!
posted by Grod at 7:25 PM on November 11, 2008

