

grep sed or awk me some values
January 10, 2007 2:36 PM

I want grep (or sed or awk) to return a word between two "bookend" words.

Here are two examples of strings that I want to grep, with the word I want returned in bold in each case:

DBConnector dbConn = new DBConnector();
DBConnector aDBConnector=new DBConnector();

So the expected output from the grep would be:

dbConn
aDBConnector

There can be empty space before or after the strings used in the examples, but the string itself will usually only span one line.

It might also be useful if the regexp just gets the next word after DBConnector since it is entirely possible the code might look like this:

DBConnector dbConn;
dbConn = new DBConnector();

Cheers!


Yes, we're having connection problems.
posted by Null Pointer and the Exceptions to computers & internet (14 comments total)
There's probably a more efficient way of doing it, but "grep ^DBConnector | awk '{print $2}'" should work for both test cases you've given....
posted by aberrant at 2:42 PM on January 10, 2007


This assumes that DBConnector is at the beginning of the line. If it isn't, just get rid of the caret.
posted by aberrant at 2:43 PM on January 10, 2007


Close, although I'm getting the following in my results when there is no space between the = and the new :

aDBConnector=new
posted by Null Pointer and the Exceptions at 2:52 PM on January 10, 2007


...and I just realized that the second case shows no whitespace between $2 and =. My apologies, but the awk won't work properly - it will return aDBConnector=new. The kludge is to run the whole thing through "sed s/=.*//" but now we're getting into some really ugly scripting.
posted by aberrant at 2:53 PM on January 10, 2007
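
To make the whitespace problem above concrete, here is a minimal Python sketch. It is an illustration rather than anything posted in the thread: the helper name declared_names and the sample list are made up, and the pattern simply takes the identifier that follows the type name, so a missing space before the = no longer matters.

```python
import re

# Assumption: the wanted word is the identifier right after the type name.
DECL = re.compile(r"\bDBConnector\s+(\w+)")

def declared_names(lines):
    """Return the identifier that follows 'DBConnector' on each matching line."""
    names = []
    for line in lines:
        match = DECL.search(line)
        if match:
            names.append(match.group(1))
    return names

# Sample lines taken from the question.
samples = [
    "DBConnector dbConn = new DBConnector();",
    "DBConnector aDBConnector=new DBConnector();",
    "DBConnector dbConn;",
    "dbConn = new DBConnector();",
]

# Splitting on whitespace (what awk's $2 does) keeps the "=new" suffix.
print(samples[1].split()[1])
# The regex stops at the first non-word character instead.
print(declared_names(samples))
```

The last sample line produces no match because DBConnector is followed there by "(" rather than whitespace.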


grep -ow "dbConn\|aDBConnector" < foo.txt

(Works on MacOS X)
posted by buxtonbluecat at 3:09 PM on January 10, 2007


Although it's not entirely clear if you want to find those entire lines and return just those bolded words, or if you want to find only the bolded words. I assumed the latter. Clarification?
posted by buxtonbluecat at 3:12 PM on January 10, 2007


Save this as "between.py" and call like so:
python between.py filename

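import sys

def between(line, x, y, ignore):
    """Return the text between the first x and the y that follows it."""
    ret = ""
    startix = line.find(x)
    if startix > -1:
        # Look for y only after the end of x.
        endix = line.find(y, startix + len(x))
        if endix > -1:
            ret = line[startix + len(x):endix]
            # Drop every character listed in ignore.
            for elt in ignore:
                ret = ret.replace(elt, "")
            # strip() returns a new string, so keep the result.
            ret = ret.strip()
    return ret

fname = sys.argv[1]
with open(fname) as f:
    lines = f.readlines()
lines = [between(x, 'DBConnector', 'new DBConnector(', "=") for x in lines]
lines = [x for x in lines if x]
print("\n".join(lines))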

posted by Invoke at 3:21 PM on January 10, 2007


dexter:~ cfta$ sed -e 's/.*DBConnector\ \([^=]\{1,\}\)=.*/\1/' file
works for me.
posted by ctmf at 3:28 PM on January 10, 2007


even better, if it's a long file:
sed -n -e 's/.*DBConnector\ \([^=]\{1,\}\)=.*/\1/p' file
So you don't have to find it in all the uninteresting output.
posted by ctmf at 3:45 PM on January 10, 2007
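
For anyone who would rather read the sed one-liner above as a script, here is a rough Python equivalent. It is a sketch only: the function name sed_like is hypothetical, and the extra strip() call is an addition that the sed command does not perform (sed leaves any space that sits before the = in the captured text).

```python
import re

# Same shape as the sed pattern: everything up to "DBConnector ", then
# one or more non-"=" characters (captured), then "=" and the rest.
PATTERN = re.compile(r".*DBConnector ([^=]+)=.*")

def sed_like(lines):
    """Emit the captured group only for lines where the pattern applies,
    similar in spirit to sed -n with the p flag."""
    out = []
    for line in lines:
        match = PATTERN.match(line.rstrip("\n"))
        if match:
            # Addition over the sed version: trim surrounding whitespace.
            out.append(match.group(1).strip())
    return out

lines = [
    "DBConnector dbConn = new DBConnector();\n",
    "DBConnector aDBConnector=new DBConnector();\n",
    "int unrelated = 0;\n",
]
print(sed_like(lines))
```

Like the sed version, this needs an = on the same line, so a bare declaration such as "DBConnector dbConn;" is skipped.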


Does your system have the program lex? It would be a piece of cake for that. (Sorry, I can't provide you an example; it's been 25 years since I last programmed anything in lex.)
posted by Steven C. Den Beste at 3:54 PM on January 10, 2007


In the "It might also be useful if the regexp just gets the next word after DBConnector" vein:

grep -o 'DBConnector\ \(\w\+\)' filename | awk '{ print $2 }'
posted by moift at 5:17 PM on January 10, 2007


quick revision: grep -o 'DBConnector\ \w\+' filename | awk '{ print $2 }'

I thought this would be doable in grep alone but I don't think it is because it lacks lookaround assertions, quel dommage.
posted by moift at 5:22 PM on January 10, 2007
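
As an illustration of what a lookaround assertion buys you here, this is a small sketch using Python's re module as a stand-in for grep. The function name next_words is made up. Python only accepts fixed-width lookbehinds, so the sketch assumes exactly one space after DBConnector.

```python
import re

# Lookbehind: match a word only when "DBConnector " sits immediately before it.
# The prefix is checked but is not part of the match, so no awk step is needed.
AFTER_TYPE = re.compile(r"(?<=DBConnector )\w+")

def next_words(text):
    """Return every word that directly follows 'DBConnector ' in text."""
    return AFTER_TYPE.findall(text)

source = """DBConnector dbConn = new DBConnector();
DBConnector aDBConnector=new DBConnector();
"""
print(next_words(source))
```

Note that the lookbehind does not check what comes before DBConnector, so a longer type name ending in DBConnector would also match.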


also, perl golf:

perl -pe '/DBConnector (\w+)/;$_=($1)?"$1\n":""' filename
posted by moift at 5:30 PM on January 10, 2007


Does your system have the program lex? It would be a piece of cake for that. (Sorry, I can't provide you an example; it's been 25 years since I last programmed anything in lex.)
posted by Steven C. Den Beste at 3:54 PM PST on January 10


Steven C. Da Worst!
posted by Null Pointer and the Exceptions at 8:22 AM on January 11, 2007

