how to split a string in bash?
January 12, 2008 7:21 PM

using bash, i need to split a variable in two, on whitespace, but with just the first word in one variable, and the rest in the second variable. this is a one liner with perl. how in bash?

i have a properties file in the format:

variable (whitespace) value

values can include spaces, variables are always a single word. e.g.:

thisvar contents of this var

i need to extract 'thisvar' and 'contents of this var' as two strings.

in perl i'd do:

($first, $rest) = split (' ', $source, 2)

how can i do this in a shell? (i.e. bash)
posted by jimjam to Computers & Internet (19 answers total) 5 users marked this as a favorite
Use # and % modifiers to ${}. I'll do my example on colon instead of space:
$ x=a:b:c
$ echo ${x%:*}
$ echo ${x#*:}
The bit after the % or # is more like a shell glob than a regular expression, in case you want to get fancy there.
posted by jepler at 7:31 PM on January 12, 2008

argh. the example for "up to the first colon" should have been
$ echo ${x%%:*}

%% and ## differ from % and # for whether the removed part is the shortest or longest removable part. read the bash manpage in case I've still botched the explanation.
posted by jepler at 7:33 PM on January 12, 2008
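
For reference, here is a minimal sketch of all four trimming operators applied to the same sample value, so the shortest versus longest distinction is visible side by side. The variable names on the left are made up for illustration; only x=a:b:c comes from the answer above.

```shell
# Four trimming operators on one sample value.
x=a:b:c

shortest_suffix_removed=${x%:*}    # drops ":c"
longest_suffix_removed=${x%%:*}    # drops ":b:c"
shortest_prefix_removed=${x#*:}    # drops "a:"
longest_prefix_removed=${x##*:}    # drops "a:b:"

echo "$shortest_suffix_removed"    # a:b
echo "$longest_suffix_removed"     # a
echo "$shortest_prefix_removed"    # b:c
echo "$longest_prefix_removed"     # c
```

For the question as asked, the two useful ones are %% (keep everything before the first separator) and # (keep everything after the first separator).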



thing="once upon a time"

set -- $thing
foo=$1
shift
bar="$*"

echo "foo: $foo"
echo "bar: $bar"

foo: once
bar: upon a time

posted by rjt at 7:36 PM on January 12, 2008 [1 favorite]

while read first rest; do
  echo "$first = $rest"
done <>

posted by rhizome at 7:49 PM on January 12, 2008

oops, that diamond should be "done < $source"
posted by rhizome at 7:50 PM on January 12, 2008
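
A self-contained sketch of the read loop, assuming the properties live in a file. The temporary file, its sample contents, and the count/last_first/last_rest variables are hypothetical and exist only to make the example runnable; -r is added so read leaves backslashes alone.

```shell
# Hypothetical stand-in for the real properties file.
propfile=$(mktemp)
printf '%s\n' 'thisvar contents of this var' \
              'other   value  with   gaps' > "$propfile"

count=0
while read -r first rest; do
  # first gets the first word; rest gets the remainder of the line,
  # with its internal spacing left as it was in the file.
  count=$((count + 1))
  last_first=$first
  last_rest=$rest
  echo "$first = $rest"
done < "$propfile"

# The loop is fed by redirection, not a pipe, so variables assigned
# inside it are still visible here.
echo "read $count lines; last was: $last_first = $last_rest"

rm -f "$propfile"
```

Note that first and rest themselves are empty once the loop ends, because the final read hits end of file; copy anything you need into other variables inside the loop.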

The only concern I would have with rjt's solution is that the shell script might interpret things inside the text string that you didn't want interpreted.

Seems like you could do this with awk, or even better with an appropriate pair of regular expressions using sed. Something like this:

first = `echo '$orig' | sed "s/\w.*$//"`
rest = `echo '$orig' | sed "s/[^\w]* //"`

It's been years since I've mucked around with shell scripts, so I'm not sure those syntaces are right, but some variation on that. The idea of the first command is to say "a whitespace and then everything up to the end of line, replace with nothing". The second is "anything that's not a whitespace and then one whitespace, replace with nothing". (Isn't "\w" an escape that means "whitespace"?)

Would there need to be single quotes around both expressions?
first = '`echo '$orig' | sed "s/\w.*$//"`'
rest = '`echo '$orig' | sed "s/[^\w]* //"`'

(Man, time was when I could have ripped this right off. But that "time" was 1982.)
posted by Steven C. Den Beste at 8:25 PM on January 12, 2008

Like SCDB, sed was my first thought too. One of the disadvantages of having learned on some primitive UNIX without such a nice shell, I guess. Using that bash-specific stuff would be way more efficient.

first=`echo $source | sed 's/\([^ ]*\) .*/\1'`
rest=`echo $source | sed 's/[^ ]* \(.*\)/\1'`
posted by sfenders at 8:33 PM on January 12, 2008

"The only concern I would have with rjt's solution is that the shell script might interpret things inside the text string that you didn't want interpreted."

Unfortunately you're (sort of) right -- it does collapse white-space. It won't do anything odd with dollar-signs, back-ticks, or leading dashes, though.
posted by rjt at 8:37 PM on January 12, 2008

Awk versions differ substantially; this works on GNU Awk 3.1.5.
skorgu@obelisklet ~ $ echo "thisvar contents of this var" | awk '{ for (i=2; i<NF; i++) printf $i" " ; print $NF }'
contents of this var

skorgu@obelisklet ~ $ echo "thisvar contents of this var" | awk '{ print $1; }'
thisvar

I hope that survives the scrubber.
posted by Skorgu at 8:47 PM on January 12, 2008

Oh and mine will collapse any whitespace in the value into single spaces.
posted by Skorgu at 8:49 PM on January 12, 2008
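
If the collapsed whitespace matters, one possible awk variant is to delete the first word from the line instead of reprinting the remaining fields. This is only a sketch: the variable names are made up, and the bracket expressions assume the separator is spaces or tabs.

```shell
line='thisvar contents   of this var'

first=$(printf '%s\n' "$line" | awk '{ print $1 }')
# sub() with no target edits $0 in place: strip the first word and the
# whitespace after it, leaving the rest of the line untouched.
rest=$(printf '%s\n' "$line" | awk '{ sub(/^[ \t]*[^ \t]+[ \t]*/, ""); print }')

echo "first: $first"
echo "rest:  $rest"
```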

awesome - thanks everyone for the answers!

rjt's did the trick, and sed makes my brain hurt, so i stopped there ;)
posted by jimjam at 8:51 PM on January 12, 2008

$ A='thisvar contents of this var'
$ echo ${A%% *}
thisvar
$ echo ${A#* }
contents of this var
posted by zengargoyle at 8:53 PM on January 12, 2008
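
One caveat with the single-space pattern: if the name and value are separated by several spaces or a tab, ${A#* } leaves the extra whitespace at the front of the value. A possible variation, sketched here with made-up variable names, matches any whitespace character and then trims what is left over.

```shell
line='thisvar    contents   of this var'

# Everything before the first whitespace character.
first=${line%%[[:space:]]*}

# Remove that first word, then remove the run of whitespace that
# follows it; spacing inside the value is not touched.
rest=${line#"$first"}
leading_ws=${rest%%[![:space:]]*}
rest=${rest#"$leading_ws"}

echo "first: $first"
echo "rest:  $rest"
```

If the line contains no whitespace at all, first ends up holding the whole line and rest is empty.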

RJT, would a star be interpreted as a filename wildcard and get expanded?
posted by Steven C. Den Beste at 11:03 PM on January 12, 2008

Steven, now I see what you're getting at. You're right -- it will still do the wild-card expansion (and I was hoping I'd thought of everything...) which is presumably not what jimjam wants.

This is fixable by putting a set -f line somewhere before the first set statement.
posted by rjt at 11:14 PM on January 12, 2008
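
A sketch of what the set -f fix might look like in full. The sample string is invented to include a star, and set +f is added afterwards on the assumption that the rest of the script still wants wildcard expansion.

```shell
thing='first * and   the rest'

set -f            # turn off filename expansion so * stays literal
set -- $thing     # word-split into $1, $2, ... (replaces the script's own arguments)
set +f            # turn filename expansion back on

foo=$1
shift
bar="$*"          # remaining words joined with single spaces

echo "foo: $foo"  # foo: first
echo "bar: $bar"  # bar: * and the rest
```

The runs of spaces in the value are still collapsed, as noted earlier in the thread.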

I think rhizome's solution is better; it's clear and easy to understand, and it correctly allows for any amount of whitespace between the first word and the rest, and it correctly preserves inter-word whitespace in the rest.
posted by flabdablet at 2:53 AM on January 13, 2008

unfortunately, rhizome's solution leaves the 'first' and 'rest' vars in the subshell created by 'while'. You'll see this if you 'echo $first' after the 'done' -- it'll be empty, or whatever value was assigned to it before the while loop.
posted by jepler at 6:57 AM on January 13, 2008

The wildcard concern was the reason I thought sed would be the best answer. If you handle it properly (i.e. using double quotes instead of single quotes for the first echo) you can avoid all parasitic string interpretation. It will treat it as raw text and the shell won't try to interpret it.

Any approach which treats the string as a shell script parameter (e.g. relying on "shift") is subject to shell magic character interpretation, especially file name expansion.
posted by Steven C. Den Beste at 1:50 PM on January 13, 2008

Right, so apparently I was not fit to be writing shell scripts last night. Anyone reading this probably needs to know how to use regular expressions, whether they realize it yet or not, and I always feel bad when I post stuff to AskMe that has a syntax error in it. So although that which I put in my previous post would more or less work if you just add the missing '/' after each '\1', I feel compelled to add that what I really meant was:

first=`echo "$source" | sed 's/\s.*//'`
rest=`echo "$source" | sed 's/\S*\s*//'`

You'd want to use [ \t\n] or whatever in place of \s, and [^ \t\n] for \S, if it might need to work on some backward system that doesn't use GNU sed or the like.
posted by sfenders at 2:41 PM on January 13, 2008
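
For a sed that lacks \s and \S, a version along these lines may work; it uses POSIX character classes instead of the GNU escapes. This is a sketch, not a tested recipe for every sed out there, and source_line is a made-up variable name.

```shell
source_line='thisvar    contents   of this var'

# Delete from the first whitespace character to the end of the line.
first=$(printf '%s\n' "$source_line" | sed 's/[[:space:]].*//')

# Delete the leading word and the whitespace that follows it.
rest=$(printf '%s\n' "$source_line" | sed 's/^[^[:space:]]*[[:space:]]*//')

echo "first: $first"
echo "rest:  $rest"
```

printf is used in place of echo so that values starting with a dash or containing backslashes are passed through unchanged.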

jepler, that's not quite right. A while loop won't create a subshell in and of itself; it will only do that if it's a component of a pipeline. Try this:

echo whoops >blarg
while read; do foo=$REPLY; done <blarg
echo $REPLY
echo $foo

You'll find that although $REPLY echoes as empty, $foo doesn't. $REPLY is emptied by the EOF on read, but if the while loop were truly in a subshell then $foo would be undefined as well, and it isn't.

This, on the other hand, does put the while in a subshell, because of the pipe:

echo whoops | while read; do bar=$REPLY; done
echo $REPLY
echo $bar

In any case, processing lines from a configuration file inside a loop that reads them is a fairly natural and convenient thing to do. If this were my shell script, I'd definitely be doing it rhizome's way.

Steven, there's quite a lot you can do with strings in bash without the expense of launching an extra sed process. I often find it's worth looking at the Parameter Expansion section of the man page before I design in a call to sed. For example, the split jimjam wants could be done using bash's own regular expression support, like this:

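source='thisvar    contents   of this var'
# assumed pattern (not from the original post): first word, whitespace, rest
regex='^([^[:space:]]+)[[:space:]]+(.*)$'
if [[ "$source" =~ $regex ]]
then
    echo "Variable = \"${BASH_REMATCH[1]}\""
    echo "Value = \"${BASH_REMATCH[2]}\""
else
    echo 'Ill-formed config line'
fi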

but personally I think the while/read construct is easier to understand, if for no other reason than not requiring you to spend ten minutes getting all the quoting exactly right.
posted by flabdablet at 4:57 PM on January 13, 2008

This thread is closed to new comments.