Help me write a regular expression
September 10, 2008 9:14 AM
regex question. I want to take a list of lines that look like this:
This Here is a Title one or more words long / www.something.com
and turn it into a list of lines that look like this:
<a href="http://www.something.com/" title="This Here is a Title one or more words long"> </a>
I have no idea how to do this, though.
perl -pi.bak -e 's!^([^/]+)\s*/\s*(.*)$! !' filename.txt
This will back up the original file in filename.txt.bak and leave filename.txt with the new content.
Things will get all screwed up if there are /'s in the title.
This leaves you with an empty a tag, which almost certainly isn't what you really want.
posted by Zed_Lopez at 9:31 AM on September 10, 2008
Best answer: MeFi ate the a tag. Trying again:
perl -pi.bak -e 's!^([^/]+)\s*/\s*(.*)$!<a href="$2" title="$1"> </a>!' filename.txt
posted by Zed_Lopez at 9:32 AM on September 10, 2008 [1 favorite]
Crud, delete that -- I was working out my post in preview, and accidentally hit post.
posted by fings at 9:33 AM on September 10, 2008
If you're going to use Perl, you might as well use split:
perl -ne 'chomp; printf(qq{<a href="%s" title="%s"> </a>\n}, reverse split m{\s*/\s*}, $_)'
posted by jrockway at 9:39 AM on September 10, 2008
BTW, I should point out that titles or URLs with odd characters will ruin the markup. You should probably escape those in a map {} block before the reverse/split.
posted by jrockway at 9:43 AM on September 10, 2008
Response by poster: Zed_Lopez, thanks, that works. The only things you left out are prepending "http://" to the "www" and adding a trailing "/", and I can insert those easily enough. Unfortunately, I do want the empty link: I have to use images, and I can't see a way to automate that because the file names are too inconsistent.
jrockway, thanks.
posted by Grod at 9:49 AM on September 10, 2008
If you're going to use Perl and chomp and split, you might as well use autochomp and autosplit:
perl -F'\s+\/\s+' -lane 'print qq(<a href="$F[1]">$F[0]<a>)'
posted by nicwolff at 11:28 AM on September 10, 2008
Heh I left out the protocol like Zed did:
perl -F'\s+\/\s+' -lane 'print qq(<a href="http://$F[1]/">$F[0]<a>)'
posted by nicwolff at 11:30 AM on September 10, 2008
And the / in the closing a tag, jeez:
perl -F'\s+\/\s+' -lane 'print qq(<a href="http://$F[1]/">$F[0]</a>)'
posted by nicwolff at 11:31 AM on September 10, 2008
Zed_Lopez's solution will capture the trailing spaces before the first "/" if I'm not mistaken. Replace ([^/]+) with ([^/]+?).
posted by shmooly at 8:21 AM on September 12, 2008
This thread is closed to new comments.
posted by Grod at 9:23 AM on September 10, 2008