Shell scripting gurus! I've got a bash script which works...but I suspect there's a better way to do what I'm trying to do. Am I wrong?
I have a number of LaTeX documents that I'm converting to HTML using HeVeA. The basic conversion to HTML is fine but a few things need changing and cleaning up. So I'm writing a bash script to do it in a nice consistent way. I've got the whole thing working fine so far but there's one part that seems kind of ugly to me. I'm wondering if anyone with more shell experience might suggest a nicer way to do it.
So the problem:
The documents have footnotes, and HeVeA puts them at the very end of the page with local links from the text to the footnotes and vice versa. What I want to do (well, have done) is to copy each footnote into the 'title' attribute of its <a> tag, so the footnote is displayed when you hover over the link instead of having to navigate to the bottom of the page and back up. The following code accomplishes this, but I feel it's a little kludgy.
### read the input file into the variable 'document' ###
document=$(cat "$1")
### count the number of footnotes in the document and cycle through them in a loop ###
num_footnotes=$(grep -c '"dd-thefootnotes"' "$1")
for nth_footnote in $(seq 1 "$num_footnotes")
do
    ### use grep to find the line containing the nth footnote, pipe that through sed to cut out the footnote itself ###
    the_footnote=$(grep '"note'"$nth_footnote"'"' "$1" | sed 's/.*dd-thefootnotes">\(.*\)/\1/')
    ### escape any '&' characters in the footnote and strip HTML tags ###
    corrected_footnote=$(echo "$the_footnote" | sed 's|\&|\\\&|g; s|<\/.*>||g; s|<.*>||g')
    ### pipe the document through sed, finding and replacing the <A> tag for the nth footnote, return the substitution back to the 'document' variable ###
    document=$(echo "$document" | sed 's|<A NAME="text'"$nth_footnote"'" HREF="#note'"$nth_footnote"'">|<A HREF="" TITLE="'"$corrected_footnote"'">|')
done
### write the new/corrected document to a temporary file ###
echo "$document" > temp.html
It's probably not obvious from the code, but the superscript in the main text is a link with: NAME="text1" HREF="#note1"
and the footnotes have links with the reverse: NAME="note1" HREF="#text1"
And the footnotes are also inside a <DD> tag with CLASS="dd-thefootnotes" which is how I find the things. Obviously the number changes depending on the specific footnote in question.
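To make that layout concrete, here is a small made-up sample run through the same grep and sed calls the script uses. The file contents (including the DT wrapper around the back-link) are an assumption for illustration only and are not copied from real HeVeA output.

```shell
#!/usr/bin/env bash
# Hypothetical two-line sample that mimics the markup described above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<P>Some text<A NAME="text1" HREF="#note1"><SUP>1</SUP></A> and more.</P>
<DT CLASS="dt-thefootnotes"><A NAME="note1" HREF="#text1">1</A></DT><DD CLASS="dd-thefootnotes">First note &amp; more</DD>
EOF

# Count the footnotes by their DD class, as the script does.
num_footnotes=$(grep -c '"dd-thefootnotes"' "$sample")

# Find the line holding footnote 1 and keep everything after the DD's opening tag.
# '"note1"' only matches NAME="note1"; HREF="#note1" has a '#' after the quote.
the_footnote=$(grep '"note1"' "$sample" | sed 's/.*dd-thefootnotes">\(.*\)/\1/')

echo "footnotes found: $num_footnotes"
echo "raw footnote 1:  $the_footnote"
rm -f "$sample"
```

At this point the raw footnote still carries its closing </DD>, which is why the script strips tags in a separate step.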
So is there a better, nicer, more concise way to do this? In particular I'm wondering if there's a way to do this with sed only. I suspect the answer to that is, "no," but if there's one thing I'm not, it's a sed expert.
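One direction that might be worth comparing against the loop is a two-pass awk script rather than sed alone: the first pass collects the footnotes, the second rewrites the links. What follows is only a rough sketch, not a tested replacement. It assumes each footnote sits on a single line (as the script above already does), the names add_titles and notes are made up for the example, and the &quot; escaping is an extra that the original script does not do.

```shell
#!/usr/bin/env bash
# Sketch only: reads the HTML file twice with awk and prints the result to stdout.
add_titles() {
    awk '
        # Pass 1 (NR == FNR): remember the text of every footnote, keyed by number.
        NR == FNR {
            if (match($0, /NAME="note[0-9]+"/) && index($0, "dd-thefootnotes\">")) {
                n = substr($0, RSTART + 10, RLENGTH - 11)   # digits between note and the quote
                note = $0
                sub(/.*dd-thefootnotes">/, "", note)        # keep what follows the DD tag
                gsub(/<[^>]*>/, "", note)                   # strip HTML tags
                gsub(/"/, "\\&quot;", note)                 # keep the TITLE attribute well-formed
                notes[n] = note
            }
            next
        }
        # Pass 2: rewrite every in-text link whose footnote was collected.
        {
            out = ""
            rest = $0
            while (match(rest, /<A NAME="text[0-9]+" HREF="#note[0-9]+">/)) {
                tag = substr(rest, RSTART, RLENGTH)
                n = tag
                sub(/^<A NAME="text/, "", n)                # drop everything before the number
                sub(/".*/, "", n)                           # drop everything after it
                if (n in notes)
                    tag = "<A HREF=\"\" TITLE=\"" notes[n] "\">"
                out = out substr(rest, 1, RSTART - 1) tag
                rest = substr(rest, RSTART + RLENGTH)
            }
            print out rest
        }
    ' "$1" "$1"
}
```

It would be called as add_titles "$1" > temp.html. Because the new tag is built by string concatenation instead of a sed replacement, an '&' in the footnote needs no escaping, and the file is read twice in total rather than once per footnote.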
The above script works and is fine for my purposes but I figure there's always more to learn so who's got suggestions?