Linux Bash - Write array contents to middle of existing file
February 4, 2019 12:43 PM

Not sure if this is a failing in my Google-fu, a rare edge case, or if it's not really possible, but I'm trying to write a bash script that will write the full contents of an array into an arbitrary point in an existing file.

What I currently have:
- An xml-format target text file: This means order of lines in the file is important, I can't just append the new items to the end and move on. Contents are in the format <key>value</key> . I know I can identify many expected keys, including in the area where I want to add the array contents
- An array containing an arbitrary number of elements (usually 2-6ish), potentially with whitespace, which is read (using readarray) from another file. This part is working OK, so I have a specific array I can reference with the required elements.

What I want:
Ideally I'd like the script to open a file, find a specific
<key>value1</key>
and replace it with:
<key>value2</key>
Array Element 0
Array Element 1
...
Array Element N
So far I've looked at the grep and sed tools. However, the find-and-replace function of sed ("s/find/replace/") seemed to choke when confronted by an array option, plus I don't think it would put each element on its own line anyway:

sed -i "s/<key>value1<\/key>/<key>value2<\/key>\n\${array[@]}/" filename.file

throws a syntax error at the { symbol of the array definition.

I'm happy to perform the value1/value2 replacement as a separate operation to the array-entry, although I'd like to place them somewhat close to each other.

Any thoughts?
posted by Nice Guy Mike to Computers & Internet (16 answers total)
 
So my quick answer is that this is a task for a real scripting language not shell script. This would be much easier in perl/python/ruby.

I think this can be done in shell, but it might be better to find the line to replace then head up to that line to output file, echo the new key/value stuff to the output file, for/foreach around the array and echo the array elements to the output and then head or tail to dump the rest of the file to the output file. Ugly but I think it would work.
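A minimal sketch of that head / echo / loop / tail idea, assuming GNU grep, head and tail on Linux. The function name insert_after_match and the demo file names are made up for illustration. It replaces the whole matching line, so any indentation on that line is lost.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print FILE with the first line containing OLD replaced
# by NEW, followed by one line per remaining argument (the array elements).
insert_after_match() {
    local file=$1 old=$2 new=$3
    shift 3                                  # what is left are the elements
    local lineno
    # Line number of the first line containing OLD (fixed string, not a regex).
    lineno=$(grep -n -F -m 1 -- "$old" "$file" | cut -d: -f1)
    if [ -z "$lineno" ]; then
        cat "$file"                          # no match: pass the file through
        return 1
    fi
    head -n "$((lineno - 1))" "$file"        # everything before the match
    printf '%s\n' "$new"                     # the replacement line
    if [ "$#" -gt 0 ]; then
        printf '%s\n' "$@"                   # one array element per line
    fi
    tail -n "+$((lineno + 1))" "$file"       # everything after the match
}

# Demo on a throwaway file.
demo=$(mktemp -d)
printf '%s\n' '<a>1</a>' '<key>value1</key>' '<b>2</b>' > "$demo/in.xml"
elements=( "Array Element 0" "Array Element 1" )
insert_after_match "$demo/in.xml" '<key>value1</key>' '<key>value2</key>' \
    "${elements[@]}" > "$demo/out.xml"
```

Writing to a new file and moving it into place afterwards keeps the original intact if something goes wrong.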
posted by jclarkin at 12:59 PM on February 4 [4 favorites]


I'll second jclarkin and say that you should use a higher order scripting language's xml capabilities rather than trying to shell script something. Unless you are absolutely certain about how your xml is serialized, doing string manipulations on xml documents is a fool's errand. To do the xml stuff right you have to consider character sets, special character escaping and a few other things. If you use a good xml library, all those details will be taken care of for you and I bet your code will be way more maintainable.
posted by mmascolino at 1:06 PM on February 4 [4 favorites]


Even something like matlab/octave or R would probably be better than shell script, if you happen to already know one of those.

Unless this is literally a one-time task, and assuming you know no relevant languages, it’s probably worth learning the very minimal Python necessary to do this using something like ElementTree.

If you expect to only need this once and for the short term, jclarkin seems to have a functional way to break it down that I agree should work, but it will still most likely be annoying and fiddly to pull off.
posted by SaltySalticid at 1:48 PM on February 4


Yeah, bash and XML don't really play nice. This would be trivial in Python or Ruby (or Perl).

Ruby has a nice little tool called Nokogiri that's great for parsing XML.
posted by aspersioncast at 3:45 PM on February 4


Well, it's not pretty but this works in a tester I whipped up:

ETA: Metafilter swallowed up my code. Here's a pastebin.

Oh so hacky...my kind of bash script! I had to use printf because I couldn't figure out how to get a bash array into sed with newlines. Thus, I use a temp file.
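For anyone who can't reach the pastebin, here is one way the printf-plus-temp-file idea could look. This is a sketch and not necessarily what the pastebin contained. It assumes GNU sed and uses its r command, which queues a file's contents for output after the matching line. The tag values and file names are invented for the demo.

```shell
#!/usr/bin/env bash
tags=( "<tag>HD</tag>" "<tag>Blu Ray</tag>" )

# Demo input; in real use this would be the existing XML file.
demo=$(mktemp -d)
printf '%s\n' '<movie>' '  <watched>false</watched>' '</movie>' > "$demo/movie.nfo"

# printf writes one element per line, so spaces inside elements survive.
tmp=$(mktemp)
printf '%s\n' "${tags[@]}" > "$tmp"

# On the matching line: queue the temp file for output after the line (r),
# then rewrite the line itself (s). Each -e is a separate script line, which
# matters because the file name for r runs to the end of its line.
sed -e '/<watched>false<\/watched>/{' \
    -e "r $tmp" \
    -e 's|<watched>false</watched>|<watched>true</watched>|' \
    -e '}' "$demo/movie.nfo" > "$demo/movie.new"

rm -f "$tmp"
```

The temp file sidesteps quoting entirely: sed never sees the array contents as part of its script, so characters like / and & in the elements need no escaping.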
posted by Fortran at 4:07 PM on February 4 [2 favorites]


This isn't really an XML problem; it's a text file problem and bash/sed are totally fine for this. If your array is in a file ARRAY and it happens to contain

A
B
C
D
E

then this tiny script will change all your value1's to value2's + array.
stuff=$(cat ARRAY | xargs)
sed "s|<key>value1</key>|<key>value2</key>\n$stuff|; /$stuff/ s| |\n|g" filename.file > newfilename.file
What this does is:

* store your array elements in a bash variable on a single line (e.g. "A B C D E")
* read filename.file and replace value1 with value2 + ARRAY values
* convert single-line array values

A B C D E

to

A
B
C
D
E

I recommend against using sed -i unless you are sure you can replace your input data if you accidentally blow it away.

(sed lets you use "/" or "|" after the s; when dealing with HTML/XML, "|" is much more readable.)
posted by ldenneau at 5:01 PM on February 4


Consider a specialized command line tool like XMLStarlet.
posted by sammyo at 5:02 PM on February 4 [1 favorite]


Seconding everyone: bash is not the tool to use here.

Expanding on sammyo's recommendation of XMLStarlet, if you must use bash, do something like:

xmlstarlet pyx < file.xml | … your bash stuff here … | xmlstarlet depyx > out.xml

PYX is a simple text representation of an XML file. It's easy to parse and modify with Unix text tools.
posted by scruss at 5:31 PM on February 4


OK, I understand the advice to look at other languages - but this is one of those one-off tasks that I don't think will come around again - and at this point I've put about the amount of time into trying to automate the task that I would have if I'd just done it manually. Still, I feel like I'm close to a breakthrough.

Both Fortran and ldenneau have posted solutions with ideas I had started to think about - writing to a temporary file, reading it into a string and then using in sed, however I'm still having problems - specifically, when using a string to represent the Array Elements in the sed command, I receive an "unknown command: <" at a character number at the end of the sed double-quoted substitution command.

I assume that this is because the Array Elements are also <key>value</key> pairs, the $string value in the substitution is being expanded and sed is choking on the first < character of the first Array Elements.

I tried both ldenneau's solution and my own version (read array straight into string):

array=( "<tag>HD</tag>" "<tag>BluRay</tag>" )
string=$(printf "%s\n" "${array[@]}")
sed -i "s|<watched>false</watched>|<watched>true</watched>\n$string|; /$string/ s| |\n|g" "$j"/*.nfo

Which gives a subtly different "unterminated s" error - presumably the same issue, ie that sed is expanding the array elements and introducing a character that is breaking the command.
Edit: ("$j"/*.nfo is the location of the xml file, in case that wasn't clear)
posted by Nice Guy Mike at 8:23 PM on February 4


Nice Guy Mike, I tried my little script on an example and it worked exactly the way you want. Unfortunately Ask is ill-suited to pasting code because of the XML tags. DM me and I'll walk you through it.
posted by ldenneau at 10:21 AM on February 5



$ cat foo
one
two
three

$ perl -lne 'BEGIN{@a=@ARGV;@ARGV=(shift@a)}/two/&&do{print"foobar";print for @a;next};print' foo hello 'cruel world'
one
foobar
hello
cruel world
three
But yeah, I'd use an XML module (probably Mojo::DOM). Shell quoting shenanigans would irk me
too much to try and do it in Bash without resorting to temporary files and such.

A simpler sed-fu that is beyond me would be along the same lines. Don't print lines, if it matches
/two/ then quit, else print that line. (first command). Then print the replacement for /two/, then
print the array. (second command). Then go through the file again and not print until the line
after the /two/ until the end of the file. (third command). Then you put it all together.
(first < foo; second; third < foo) > foo.new
You could also use grep to just find the line-number of /two/ and then use `head` and `tail` plus some
math to rip out the before part of the file, then print the replacement, then rip out the after part of the file.
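Roughly, the three-command version might look like this. It is a sketch using the same foo example and assuming GNU sed; the work directory is only there to keep the demo self-contained.

```shell
#!/usr/bin/env bash
work=$(mktemp -d)
printf '%s\n' one two three > "$work/foo"
a=( hello "cruel world" )

{
    # first: print lines until /two/ matches, then quit without printing it
    sed -n -e '/two/q' -e 'p' "$work/foo"
    # second: the replacement for /two/, then the array, one element per line
    printf '%s\n' foobar "${a[@]}"
    # third: drop everything up to and including the /two/ line.
    # Caveat: if line 1 itself matches, the range 1,/two/ looks for its end
    # from line 2 onward, so this form assumes the match is not on line 1.
    sed -e '1,/two/d' "$work/foo"
} > "$work/foo.new"
```

The output matches the perl one-liner above: one, foobar, hello, cruel world, three.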
posted by zengargoyle at 2:04 PM on February 5


You might want to read about awk, which is an old-school way of associating arbitrary operations with arbitrary regular expressions in an input stream. But I have to agree with the advice that some time spent with a modern scripting language will pay off rapidly. And I speak as a guy who is proud of once writing a Turing machine simulator in sed.
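To give a flavour, here is a small sketch of awk doing a line-based version of this job from a bash script. It treats the file as plain text and knows nothing about XML. The file name, the /two/ pattern and the array contents are placeholders borrowed from the example upthread.

```shell
#!/usr/bin/env bash
awkdir=$(mktemp -d)
printf '%s\n' one two three > "$awkdir/foo"
a=( hello "cruel world" )

awk '
    BEGIN {
        # ARGV[1] is the input file; everything after it is an array element.
        # Copy the elements aside and delete them so awk does not try to
        # open them as files.
        for (i = 2; i < ARGC; i++) { extra[++n] = ARGV[i]; delete ARGV[i] }
    }
    /two/ {
        print "foobar"                            # replacement line
        for (i = 1; i <= n; i++) print extra[i]   # then each element
        next                                      # skip the original line
    }
    { print }                                     # all other lines unchanged
' "$awkdir/foo" "${a[@]}" > "$awkdir/foo.new"
```

Passing the elements as arguments rather than splicing them into the awk program avoids shell quoting problems, and elements with spaces stay intact.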
posted by Gilgamesh's Chauffeur at 3:07 PM on February 5


I love awk but it is not the tool for xml.
posted by sammyo at 3:40 PM on February 5


If you feel you really need to whipbash it, bash it good:

(Simplified data values for clarity)
a=( "Element 0" "Element 1" "Element 2" )
astr="$( IFS='|' ; echo "Value 2|${a[*]}" | sed 's/|/\\n/g' )"
sed -i -e "/Value 1/a ${astr}" -e "/Value 1/d" file

* Reformat the array into a string delimited by literal "\n"
* Sed: First insert value 2 and string data, then delete value 1

Assumes key 1 string is unique in the file and contained within a single line, and elements do not contain the | char.
posted by zaixfeep at 1:58 AM on February 6


My solution above is adequate for a text file. I concur with the others that you should use an XML-savvy solution if you are concerned that unacceptable XML corruption is likely to ensue from my solution.
posted by zaixfeep at 2:11 AM on February 6


Been a little busy, but thanks to some sterling help from ldenneau early last week, there is a solution - at least a partial one - which shares some similarity with zaixfeep's:

string=$(echo "${temp_tags[@]}" | xargs | sed 's|/|\\/|g')

Writes the content of the array into a single-line string, then using sed adds an additional "\" symbol to "escape" the "/" symbol in the trailing "</tag>" for each entry in the array.

sed -i "s|<watched>false</watched>|<watched>true</watched>\n$string|; /$string/ s|  *|\n|g" filename.file

Effectively runs a find-and-replace on each filename.file (I have this code in a script that runs sequentially through a number of subdirectories where the xml files are stored). This runs against a known key (in this case, <watched>false</watched>) and replaces it with an updated key, plus a newline (\n) and the contents of the string, $string. It then runs a second find-and-replace that changes the spaces in the single-line array string to newlines. Note that this will break array contents that include spaces.
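For array contents that do include spaces, a possible variant is to skip the space-to-newline step and join the elements with a literal \n instead. This is a sketch rather than what I actually ran, and it assumes GNU sed. The helper name escape_repl and the sample tags are made up, and elements that themselves contain newlines are not handled.

```shell
#!/usr/bin/env bash
temp_tags=( "<tag>HD</tag>" "<tag>Sci-Fi &amp; Fantasy</tag>" )

# Hypothetical helper: backslash-escape the characters that are special in
# the replacement half of s|...|...| (backslash, ampersand, the | delimiter).
escape_repl() {
    printf '%s\n' "$1" | sed 's/[\\&|]/\\&/g'
}

# Build "\n<element>\n<element>..." where \n is a literal backslash-n that
# GNU sed turns into a newline inside the replacement.
string=
for t in "${temp_tags[@]}"; do
    string+="\\n$(escape_repl "$t")"
done

# Demo input; in real use this would be each .nfo file.
nfo=$(mktemp -d)
printf '%s\n' '<movie>' '<watched>false</watched>' '</movie>' > "$nfo/in.nfo"

sed "s|<watched>false</watched>|<watched>true</watched>${string}|" \
    "$nfo/in.nfo" > "$nfo/out.nfo"
```

Because each element is escaped before it reaches sed, the & in &amp; is inserted literally instead of being expanded to the matched text.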

Big thanks to everyone, but particularly to ldenneau, who spent a good chunk of an evening helping me through this by email.
posted by Nice Guy Mike at 2:42 PM on February 9

