Help me write a regular expression
September 10, 2008 9:14 AM

regex question. I want to take a list of lines that look like this: This Here is a Title one or more words long / www.something.com and turn it into a list of lines that look like this: <a href="http://www.something.com/" title="This Here is a Title one or more words long"> </a> I have no idea how to do this, though.
posted by Grod to Computers & Internet (12 answers total) 1 user marked this as a favorite
 
Formatting went away.

This Here is a Title one or more words long / www.something.com

and turn it into a list of lines that look like this:

<a href="http://www.something.com/" title="This Here is a Title one or more words long"> </a>
posted by Grod at 9:23 AM on September 10, 2008


perl -pi.bak -e 's!^([^/]+)\s*/\s*(.*)$! !' filename.txt

This will back up the original file in filename.txt.bak and leave filename.txt with the new content.

Things will get all screwed up if there are /'s in the title.

This leaves you with an empty a tag, which almost certainly isn't what you really want.
posted by Zed_Lopez at 9:31 AM on September 10, 2008



awk -F / '{print ""$1""}'


posted by fings at 9:32 AM on September 10, 2008


MeFi ate the a tag. Trying again:

perl -pi.bak -e 's!^([^/]+)\s*/\s*(.*)$!<a href="$2" title="$1"> </a>!' filename.txt
posted by Zed_Lopez at 9:32 AM on September 10, 2008 [1 favorite]


Crud, delete that -- I was working out my post in preview, and accidentally hit post.
posted by fings at 9:33 AM on September 10, 2008


If you're going to use Perl, you might as well use split:

perl -ne 'chomp; printf(qq{<a href="%s" title="%s"> </a>\n}, reverse split m{\s*/\s*}, $_)'
posted by jrockway at 9:39 AM on September 10, 2008


BTW, I should point out that titles or URLs with odd characters will ruin the markup. You should probably escape those in a map {} block before the reverse/split.
posted by jrockway at 9:43 AM on September 10, 2008


Zed_Lopez, thanks, that works. The only things you left out are prepending "http://" to the "www" and adding a trailing "/", and I can insert those easily enough. Unfortunately, I do want the empty link, because I have to use images and I can't see a way to automate that; the file names are too inconsistent.

jrockway, thanks.
posted by Grod at 9:49 AM on September 10, 2008


If you're going to use Perl and chomp and split, you might as well use autochomp and autosplit:

perl -F'\s+\/\s+' -lane 'print qq(<a href="$F[1]">$F[0]<a>)'


posted by nicwolff at 11:28 AM on September 10, 2008


Heh, I left out the protocol like Zed did:

perl -F'\s+\/\s+' -lane 'print qq(<a href="http://$F[1]/">$F[0]<a>)'
posted by nicwolff at 11:30 AM on September 10, 2008


And the / in the closing a tag, jeez:

perl -F'\s+\/\s+' -lane 'print qq(<a href="http://$F[1]/">$F[0]</a>)'
posted by nicwolff at 11:31 AM on September 10, 2008


Zed_Lopez's solution will capture the trailing spaces before the first "/", if I'm not mistaken, so they end up inside the title. Replace ([^/]+) with ([^/]+?) to leave them out.
posted by shmooly at 8:21 AM on September 12, 2008

