Does this file search application exist?
January 17, 2014 11:20 AM

Is there an app that will take a set of files in one folder (wildcarded) and find probable references to those files within the text of files in the same folder or another folder (recursively)? I could probably write this app fairly quickly, but I don't want to reinvent the wheel if I can help it.
posted by hanoixan to Computers & Internet (8 answers total)
 
Response by poster: edit: ...find probable references...
posted by hanoixan at 11:21 AM on January 17, 2014


What makes something a probable reference? Are you just looking for instances of filenames in the text of other files?
posted by alms at 11:22 AM on January 17, 2014


What platform? On Linux & OS X, this is a small but hairy application for find, xargs and grep.
posted by scruss at 11:31 AM on January 17, 2014


Best answer: find some_dir -name "some_pattern" |  # get the list of file names to search for (anything in "some_dir" matching "some_pattern")
grep -o "[^/]*$" |  # trim off paths, keep just the filenames
xargs -I "%NAME%" grep -RiF -e "%NAME%" another_dir  # for each filename, grep for it in "another_dir"; -F matches the name literally, so "." is not treated as a regex wildcard

Complete example:
find . -name "*.txt" | grep -o "[^/]*$" | xargs -I "%NAME%" grep -RiF -e "%NAME%" .

Looks through the current dir for files named "*.txt". For each file found, greps through all the files in the current dir for that filename, and prints each matching line prefixed with the file it was found in.
posted by tylerkaraszewski at 11:35 AM on January 17, 2014


Response by poster: This is for Windows; sorry I wasn't specific. It's for use by people without much technical ability, so I can't ask them to run a bash command line. I'm looking for something akin to Agent Ransack as far as ease of use is concerned.

And by probable match, I mean either by doing a 1:1 string match, or perhaps via Levenshtein distance with a threshold. For example, I may have files A001.txt, A002.txt, but a file references A003.txt. That would be useful information to me.
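
To make the "Levenshtein distance with a threshold" idea concrete, here is a rough sketch in the same shell style as the answer above. It is not a Windows GUI and not an existing tool; the function names levenshtein and near_matches, and the threshold of 1 in the usage note, are made up for illustration. It assumes a POSIX shell with awk available.

```shell
# levenshtein A B: print the edit distance between strings A and B.
# Standard two-row dynamic programming, done in awk because it has arrays.
# Caveat: awk -v interprets backslash escapes, so this sketch assumes
# plain filenames without backslashes.
levenshtein() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    la = length(a); lb = length(b)
    for (j = 0; j <= lb; j++) prev[j] = j          # distance from empty prefix of a
    for (i = 1; i <= la; i++) {
      cur[0] = i                                   # distance to empty prefix of b
      for (j = 1; j <= lb; j++) {
        cost = (substr(a, i, 1) == substr(b, j, 1)) ? 0 : 1
        m = prev[j] + 1                            # deletion
        if (cur[j-1] + 1 < m) m = cur[j-1] + 1     # insertion
        if (prev[j-1] + cost < m) m = prev[j-1] + cost   # substitution or match
        cur[j] = m
      }
      for (j = 0; j <= lb; j++) prev[j] = cur[j]
    }
    print prev[lb]
  }'
}

# near_matches THRESHOLD TOKEN NAME...: print each NAME whose distance to
# TOKEN is at most THRESHOLD (distance 0 means an exact match).
# Hypothetical helper; variables are global to keep the sketch POSIX-only.
near_matches() {
  threshold=$1
  token=$2
  shift 2
  for name in "$@"; do
    d=$(levenshtein "$token" "$name")
    if [ "$d" -le "$threshold" ]; then
      printf '%s (distance %s)\n' "$name" "$d"
    fi
  done
}
```

With the example names above, near_matches 1 A003.txt A001.txt A002.txt would report both A001.txt and A002.txt at distance 1, which is the kind of "probable reference" hint described here. A real tool would still need to pull filename-like tokens out of the text before comparing them.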
posted by hanoixan at 11:38 AM on January 17, 2014


So it sounds like you only really need to find the "A", "B", "Procter & Gamble", or whatever term is used in the filenames, within the files?
posted by rhizome at 2:38 PM on January 17, 2014


Response by poster: Not quite. I need to find textual references to the file names from a primary file list, within the content of a secondary file list. As an example, I may want to find all occurrences of a list of .mp3 files being referenced within a collection of .xml files, because I want to remove .mp3 files which are no longer being referenced.

On first look, the command line above would do what I want, but I can't have them running scripts on a command line. It needs a GUI.
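
For the .mp3/.xml cleanup case specifically, the find/grep approach from the best answer can be turned around to list files that are never referenced. This is only a sketch under assumptions: the function name unreferenced and the directory names in the usage note are hypothetical, it needs a Unix-like shell with a grep that supports -r (GNU and BSD grep do), and it does not address the GUI requirement.

```shell
# unreferenced PATTERN FILES_DIR REFS_DIR
# Print each file under FILES_DIR matching PATTERN whose bare name never
# appears (case-insensitive, literal match) in any file under REFS_DIR.
# It only lists candidates; nothing is deleted.
unreferenced() {
  find "$2" -type f -name "$1" | while IFS= read -r path; do
    name=${path##*/}                       # strip the directory part
    # -q: stop at the first hit, -F: treat the name as a fixed string
    if ! grep -rqiF -e "$name" "$3"; then
      printf '%s\n' "$path"
    fi
  done
}
```

For example, unreferenced '*.mp3' ./audio ./xml would print the .mp3 files under ./audio that no file under ./xml mentions by name. Reviewing that list by hand before removing anything seems wise, since a file could be referenced in a way a plain name match misses.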
posted by hanoixan at 3:09 PM on January 17, 2014


Response by poster: Marking as resolved, but not for a Windows GUI.
posted by hanoixan at 7:38 AM on January 20, 2014

