

Find a list of files on OSX
August 9, 2012 7:48 PM   Subscribe

OSX: script to find all files matching a list of filenames and report their location?

Say I have a text file that lists a bunch of filenames, one per line. I'd like to find any instance of any of these files on the system and write out a summary report of the locations of the matches. This is to be done on OSX, various versions.
posted by odinsdream to Computers & Internet (17 answers total)
 
It would probably need some fine-tuning but something like

find / | grep -f myfiles.txt

would basically work. It might return some matches you don't expect: if "myfiles.txt" included a line like
foo.bar
and you had a directory like
/a/b/foo.bar/
then it would match all the files in that directory.

You'll need a passing familiarity with how grep works so that you can put together "myfiles.txt", but it's not very complicated.
posted by RustyBrooks at 7:53 PM on August 9, 2012
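
A way around those false positives (a sketch in modern Python, not from the thread; find_exact is a made-up name) is to walk the tree and compare each path's basename against the list exactly, so a directory named foo.bar doesn't drag in everything inside it:

```python
import os

def find_exact(root, names_file):
    """Yield files under root whose basename exactly matches a listed name.

    Only basenames of regular files are compared, so a directory named
    foo.bar does not cause every file inside it to match.
    """
    with open(names_file) as f:
        wanted = set(line.strip() for line in f if line.strip())
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name in wanted:
                yield os.path.join(dirpath, name)
```

Usage would be something like: for path in find_exact("/", "myfiles.txt"): print(path)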


Leveraging the spotlight database from the command line...
while read -r FILE; do mdfind -name "$FILE"; done < filenames.txt

posted by sbutler at 9:22 PM on August 9, 2012 [3 favorites]


This is a lot more typing, but it may be a little more efficient. Here is a Python script I wrote and named "findlist.py":

import sys
import commands  # Python 2 module; superseded by subprocess in later versions

files = open(sys.argv[1], "r").readlines()
# Build one big shell command:
#   find / \( -type f -name a -o -type f -name b ... \) 2>/dev/null
fstring = "find / \\( -type f -name " + files[0][:-1]
for i in files[1:]:
    fstring = fstring + " -o -type f -name " + i[:-1]
fstring = fstring + " \\) 2>/dev/null"
output = commands.getoutput(fstring)
print output

Which you can run with your list of files like this:

python findlist.py myfiles.txt

It will dump only regular files (which solves the foo.bar problem described above) whose names are listed in your file of files.

Note this does zero error checking and assumes you always pass it a file with at least one filename, one per line. Also, it squelches any errors that find may throw.
posted by ill3 at 9:26 PM on August 9, 2012


@ill3: My python is rusty, but your example looks like it also chokes on filenames that contain a space. You'd be much, much better off building your command as an array and then using subprocess.check_output()
posted by sbutler at 9:36 PM on August 9, 2012
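
A sketch of what that would look like (build_find_cmd is a made-up name; because the command is exec'd as an argument list rather than through a shell, names with spaces need no quoting and the parentheses need no backslashes):

```python
import subprocess

def build_find_cmd(names, root="/"):
    """Build a find argv list: find ROOT ( -type f -name A -o ... )."""
    cmd = ["find", root, "("]
    for i, name in enumerate(names):
        if i:
            cmd.append("-o")
        cmd += ["-type", "f", "-name", name]  # name passed verbatim, spaces OK
    cmd.append(")")
    return cmd

cmd = build_find_cmd(["foo.bar", "bar baz.txt"])
# output = subprocess.check_output(cmd)  # raises CalledProcessError if find fails
```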


sbutler: Good catch... However, the "various versions of OSX" part scares me a little, as I'm not sure when OSX started shipping Python 2.4, which is when subprocess was added. Regardless, this will fix the space issue:

import sys
files = open( sys.argv[1],"r").readlines()
fstring = "find / \\( -type f -name " + files[0][:-1]
for i in files[1:]:
fstring = fstring + " -o -type f -name \"" + i[:-1] + "\""
fstring = fstring + " \\) 2>/dev/null"
import commands
output = commands.getoutput(fstring)
print output

Though, I think subprocess would be a better experience as my script waits until it has all the results back from find before reporting anything at all. But again, I don't know how far back he wants to go with OSX....
posted by ill3 at 9:48 PM on August 9, 2012


Of course, the line after the for statement above needs to be indented, like this:

for i in files[1:]:
   fstring = fstring + " -o -type f -name \"" + i[:-1] + "\""
posted by ill3 at 9:49 PM on August 9, 2012


And of course that latest version will choke on filenames containing a double quote, which is another argument for using one of the exec/subprocess variants that take an array. But I'm feeling too lazy unless OP is really dying for a version of this that handles difficult names.
posted by ill3 at 9:53 PM on August 9, 2012


Last comment (I promise): sbutler, your Spotlight version is definitely a nice way to go, provided OP only cares about searching the user-space part of the system. Spotlight won't index system files, etc. Depending on his needs this could be a benefit or a problem.
posted by ill3 at 9:56 PM on August 9, 2012


The metadata database does index the entire system (unless it's been turned off for a drive). The graphical spotlight just suppresses some results.
posted by sbutler at 9:59 PM on August 9, 2012


while read -r FILE; do locate "$FILE"; done < filenames.txt > filereport.txt

Once you do that you'll have a report. Note that the "locate" command requires building its database first; run locate once and it should prompt you to create it. Once the database exists, lookups are pretty fast.
posted by artlung at 10:02 PM on August 9, 2012


sbutler: It appears you're right again :) Strange, I'm used to using Spotlight from the GUI and noticed it doesn't really find anything outside of user space; try finding "passwd", say. But it does show up with mdfind. Do you know how to make the GUI Spotlight give the same results as mdfind?
posted by ill3 at 10:11 PM on August 9, 2012


... you can append "$" to the end of all the lines in the above myfiles.txt to avoid getting directory matches (if you don't want those). For example:
foo.bar$
faz.bat$
fin.txt$
("$" is grep's end-of-line anchor.)

Or, since we're manipulating the myfiles.txt, just change it to:
find / -name foo.bar
find / -name faz.bat
find / -name fin.txt
...and run it from the command line with:
csh ./myfiles.txt > outputfile.txt
... Should run pretty quickly once the file directory is loaded into memory, and the results will be in outputfile.txt.
posted by Orb2069 at 4:03 AM on August 10, 2012


OS X is a Unix, and in Unix, you can always solve a problem in a zillion different ways. All of the solutions being presented here will work, with various caveats. If you're able to manage the filenames in such a way that they never have spaces or quotation characters in them, then there's a whole class of problems you can neatly avoid.

Any solution that involves using a "find /" command will inherently be slow, because the system has to manually scan through every file name on the entire system, one at a time, sequentially. This involves many, MANY disk seeks. And it has to repeat that entire process for every file you look up. After the first time, it'll be faster, because the directories will be cached in RAM, but it's still doing a heck of a lot of work. The bigger your drives get, and the more files you look up, the slower the process will become. If it's a one-off, you may not particularly care, but if it's something you'll be doing a lot, I'd suggest avoiding find.

Spotlight is specifically designed to do what you're looking for -- it maintains an index of all the files on the drive, so that looking up a specific name is extremely fast. So sbutler's suggestion of using mdfind should be extremely quick, running in probably a half-second or so, as opposed to, potentially, several tens of seconds.

Artlung's suggestion of using 'locate' would also work; it's an early stab at something like Spotlight, where a program indexes all your files into a database, and then updates itself once a day or so. That's a good solution, much faster than find, but Spotlight is a superset of locate. And you've already spent the CPU time and disk space for the Spotlight databases, so leveraging them is nearly free, resource-wise.

tl;dr version: use Spotlight if you can. If its mdfind utility doesn't do what you need, then locate would be a good fallback. If efficiency matters at all, don't use find. It will work, but it's very slow.
posted by Malor at 6:42 AM on August 10, 2012


Thanks for all these options! To clarify: the files do have spaces in their names, there are dozens of files to find, and it only needs to crawl the userspace; technically I could even restrict it to a particular user folder.
posted by odinsdream at 6:54 AM on August 10, 2012
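
Given those constraints, mdfind's -onlyin flag can restrict the search to a single folder; here is a sketch in modern Python (function names are mine; no shell is involved, so spaces in the names are safe):

```python
import os
import subprocess

def mdfind_cmds(names_file, folder):
    """One mdfind argv per listed name, limited to folder via -onlyin."""
    cmds = []
    with open(names_file) as f:
        for line in f:
            name = line.strip()  # names with spaces pass through untouched
            if name:
                cmds.append(["mdfind", "-onlyin", folder, "-name", name])
    return cmds

def write_report(names_file, folder, out="filereport.txt"):
    """Run each search and collect the hits into one report file."""
    with open(out, "w") as report:
        for cmd in mdfind_cmds(names_file, folder):
            report.write(subprocess.check_output(cmd).decode())

# e.g. write_report("filenames.txt", os.path.expanduser("~"))
```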


I usually do this sort of thing via emacs. I would change the list:

foo.txt
bar baz.txt
...


to look like the actual command script:

find / \( \
-name "foo.txt" -o \
-name "bar baz.txt" -o \
... \
-false \) -print

Then run it from the command line. Any competent text editor can make those changes to each line; then I add the find boilerplate. (The trailing -false soaks up the final -o left by the per-line edit.) I do it this way whenever I'm faced with a one-off problem.
posted by chairface at 5:23 PM on August 10, 2012


So it looks like the mdfind option is super quick!

This is probably really simple, but what's the best way to combine the list of candidate files into the Bash script itself?

Like...

LIST OF FILES
file1
file2
file3
END LIST OF FILES
...the find command referencing the above chunk...

This would make it super easy to create one shell file to click and run, to create the report.
posted by odinsdream at 6:50 PM on August 10, 2012


Untested, but I believe this will work:
while read -r FILE; do mdfind -name "$FILE"; done <<STOP
file1.txt
file2.txt
file3.txt
STOP

posted by sbutler at 2:17 AM on August 12, 2012


This thread is closed to new comments.