Search a text file for keywords but exclude lines if they contain another keyword?
February 7, 2012 12:22 AM

Do you know of a good technique or inexpensive software that will allow someone to search through a text file so the cursor moves from line to line, stopping on each line that contains the words "Item sold" but skipping any line that contains "Item sold" and ALSO "figurine"? I'm trying to search through a very long text file for the few instances that aren't figurines. Any hints GREATLY appreciated!
posted by capedape to Technology (13 answers total) 1 user marked this as a favorite
 
Assuming you're on Windows, install Cygwin and open a terminal (on Mac or Linux, just open a terminal).

At the command prompt:

grep -F 'Item sold' FILENAME | grep -vF 'figurine' | less
posted by zippy at 12:26 AM on February 7, 2012
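For anyone without grep handy, the same two-stage filter can be sketched in Python. This is a minimal sketch, not a drop-in replacement: `filter_lines` and `filter_file` are hypothetical helper names, matching is case-sensitive like the grep command, and the file is assumed to be UTF-8 text (undecodable bytes are replaced rather than raising an error).

```python
from typing import Iterable, Iterator


def filter_lines(lines: Iterable[str], include: str, exclude: str) -> Iterator[str]:
    """Yield lines that contain `include` but not `exclude`.

    Both are plain substrings (no regex), mirroring
    grep -F 'Item sold' FILENAME | grep -vF 'figurine'.
    """
    for line in lines:
        # First grep: the line must contain the include string.
        # Second grep (-v): drop it if it also contains the exclude string.
        if include in line and exclude not in line:
            yield line


def filter_file(path: str, include: str = "Item sold", exclude: str = "figurine") -> Iterator[str]:
    """Stream matching lines from a file without loading it all into memory."""
    with open(path, encoding="utf-8", errors="replace") as f:
        yield from filter_lines(f, include, exclude)


# Example (FILENAME is a placeholder, as in the grep command):
# for line in filter_file("FILENAME"):
#     print(line, end="")
```

Because it streams the file line by line, this should cope with a very long file the same way the grep pipeline does.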


grep = global regular expression print - it matches stuff and prints it
grep -F = match this string (as opposed to a regular expression)
grep -v = skip over lines matching this
posted by zippy at 12:28 AM on February 7, 2012
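To make the -F and -v distinction concrete, here is a small Python illustration on made-up text. The pattern "1.5" is an arbitrary example chosen because "." means "any character" in a regular expression but a literal dot in a fixed-string search.

```python
import re

# grep -F '1.5': the pattern is a fixed string, so "." is just a dot.
line = "Batch 125 shipped"
fixed_match = "1.5" in line

# grep '1.5' (no -F): the pattern is a regular expression, so "." matches any character.
regex_match = re.search(r"1.5", line) is not None

# re.escape() turns a literal string into an equivalent regex, which is what -F saves you from doing.
escaped_match = re.search(re.escape("1.5"), line) is not None

# grep -v 'figurine': keep only the lines that do NOT match.
lines = ["Item sold: lamp", "Item sold: figurine"]
inverted = [ln for ln in lines if "figurine" not in ln]

print(fixed_match, regex_match, escaped_match)  # False True False
print(inverted)  # ['Item sold: lamp']
```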


What operating system?
posted by mr_roboto at 12:42 AM on February 7, 2012


Yep, just use grep as zippy notes.
posted by Blazecock Pileon at 12:49 AM on February 7, 2012


No need to install cygwin. Either grab grep.exe from in here. Or Windows has a built-in find.exe

find "Item sold" textfile.txt | find /v "figurine"

If you want case-insensitive searching (e.g. someone might have entered "Figurine" instead of "figurine"), add /i:

find /i "Item sold" textfile.txt | find /v /i "figurine"
posted by robtoo at 1:22 AM on February 7, 2012 [1 favorite]
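The /i switch can be mirrored in Python with str.casefold(). The sketch below assumes the lines are already read into memory; `find_ci` is a hypothetical name, and it only reproduces the filtering, not find.exe's output formatting.

```python
from typing import Iterable, List


def find_ci(lines: Iterable[str], include: str, exclude: str) -> List[str]:
    """Case-insensitive include/exclude filter, like
    find /i "Item sold" textfile.txt | find /v /i "figurine"."""
    # casefold() is a more aggressive lower() intended for caseless matching.
    inc, exc = include.casefold(), exclude.casefold()
    return [
        line
        for line in lines
        if inc in line.casefold() and exc not in line.casefold()
    ]
```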


Grep doesn't answer the poster's question, because it isn't an editor that lets you move from line to line; it just outputs the matching lines. If you wanted to do any editing, grep won't help much. You COULD use the -n argument on the first grep to print line numbers, then hunt for those line numbers in the file. With more than a few matches that would get annoying, but the command would be something like:

grep -nF 'Item sold' FILENAME | grep -vF 'figurine' | less

You can also get a text editor that supports regular expressions, like Notepad++ if you're on Windows, then do a search using an appropriate regular expression. Here's a tutorial for constructing the regex you want.
posted by Philosopher Dirtbike at 1:40 AM on February 7, 2012
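Since the annoying part of the -n approach is hunting for line numbers by hand, here is a hedged Python sketch that collects the same number:text pairs. `numbered_matches` and `format_like_grep_n` are hypothetical names; numbering starts at 1 and counts every line of the file, as grep -n does, so the numbers could be fed to an editor's "go to line" command.

```python
from typing import Iterable, List, Tuple


def numbered_matches(lines: Iterable[str], include: str, exclude: str) -> List[Tuple[int, str]]:
    """Return (line_number, text) pairs for lines containing `include` but not `exclude`."""
    hits = []
    # enumerate(start=1) numbers every line in the file, not just the matches.
    for lineno, line in enumerate(lines, start=1):
        if include in line and exclude not in line:
            hits.append((lineno, line.rstrip("\n")))
    return hits


def format_like_grep_n(hits: List[Tuple[int, str]]) -> str:
    # grep -n prints "NUMBER:text" for each matching line.
    return "\n".join(f"{n}:{text}" for n, text in hits)
```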


If you have access to a spreadsheet program like Excel, you may be able to import the text file so that each entry appears as a row, with the cells on that row populated by splitting the line on spaces, commas, tabs, etc. Once you have the data in that format, you can just set filters to show the rows you require.

If you can't do this directly, you might be able to use a text editor like Notepad++ to pre-format the text file so that you can (and you always have the reserve tool of regular expressions if necessary).
posted by rongorongo at 2:16 AM on February 7, 2012 [3 favorites]
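The spreadsheet approach amounts to splitting each line into cells and then filtering on columns. A minimal Python sketch of that idea follows; it assumes a tab-delimited file where one column holds a status such as "Item sold" and another holds a description. That layout, the column indexes, and the name `autofilter` are all hypothetical, since the actual file format wasn't given.

```python
import csv
import io
from typing import List


def autofilter(text: str, status_col: int, desc_col: int, delimiter: str = "\t") -> List[List[str]]:
    """Parse delimited text into rows of cells, then keep rows whose status cell is
    "Item sold" and whose description cell does not mention "figurine".

    The column positions and the tab delimiter are assumptions about the file layout.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [
        row
        for row in reader
        # Skip short or blank rows that don't have both columns.
        if len(row) > max(status_col, desc_col)
        and row[status_col] == "Item sold"
        and "figurine" not in row[desc_col]
    ]
```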


As others have said, I'd be reaching for an editor with regular expressions. Specifically, Perl-compatible regular expressions (PCRE). But that's what most editors implement anyway.

FYI, a pattern that matches what you want is: /^(?!.*figurine).*Item sold.*$/. That is, match a line that contains "Item sold" and does not contain "figurine" anywhere on it.
posted by sbutler at 2:30 AM on February 7, 2012
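A caution on lookaheads: a pattern like `Item sold(?!figurine)` only checks the text immediately after "Item sold", so a line such as "Item sold: porcelain figurine" still matches. The Python sketch below (sample lines invented for illustration) compares that with a whole-line version that checks for "figurine" anywhere on the line. It assumes "." does not match newlines, which is Python's default; in Notepad++ that should correspond to leaving ". matches newline" unchecked.

```python
import re

# Naive lookahead: only rejects "figurine" when it comes *immediately* after "Item sold".
naive = re.compile(r"Item sold(?!figurine)")

# Whole-line version: at the start of each line, assert "figurine" does not appear
# anywhere on it, then require "Item sold" somewhere on the same line.
# MULTILINE makes ^ and $ match at line boundaries; without DOTALL, "." stops at "\n".
whole_line = re.compile(r"^(?!.*figurine).*Item sold.*$", re.MULTILINE)

text = (
    "Item sold: porcelain figurine\n"
    "Item sold: oak table\n"
    "Item returned: lamp\n"
    "Item sold: figurine stand\n"
)

print([m.group(0) for m in naive.finditer(text)])  # ['Item sold', 'Item sold', 'Item sold']
print(whole_line.findall(text))  # ['Item sold: oak table']
```

The naive pattern matches on all three "Item sold" lines, figurines included, while the whole-line pattern keeps only the one that never mentions "figurine".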


Cygwin? Geez, talk about using a chainsaw where a nail clipper is more appropriate. Sounds like this is a one-time text search. Excel, excel, excel!
posted by Yowser at 3:29 AM on February 7, 2012 [3 favorites]


In Excel, there's a function called "Conditional Formatting" that's available in your Home ribbon. I use the Highlight Cells Rules for things like this. Choose "Text that Contains", enter "figurine", and it should turn all the cells containing that word pink, leaving the cells without it unhighlighted. Then you can scan the sheet and the un-highlighted cells should pop right out. I use the Duplicate Values rule a lot for comparing two long lists where I want to remove duplicate entries (like mailing lists culled from various sources).
posted by slogger at 5:32 AM on February 7, 2012


I may be setting myself up for a flaming here, but ...

I think Word can do this if you have it available. Open the text document in Word, then open a find and replace search. Select "use wildcards". Then futz around with the help until you can construct a "find" string containing the string "item sold" but not the string "figurine". (You'll probably have to end the find string with "^013" for the end-of-paragraph character.) The bonus of doing it this way is that if you leave the replace box empty but set a format in it (say, highlight all the found lines), you can save the file afterwards with all the found items nicely marked for you. You'll have to save it as a Word file, though, not as a text file.

I don't have Word on this laptop, or else I'd have a shot at constructing the search string.
posted by Logophiliac at 7:16 AM on February 7, 2012


Response by poster: Windows 7 here; I have a Mac at home, but I'm staying in a remote area right now. I have EditPad Lite installed, but it doesn't do regular expression search unless it's the Pro version, so I'll try Notepad++, which I've always heard good things about, and report back. Thanks greatly for the replies.
posted by capedape at 3:29 PM on February 7, 2012


Response by poster: Notepad++ and regular expressions did the trick. Cheers, everyone, and thanks for the suggestions.
posted by capedape at 1:15 PM on February 8, 2012

