Help me write a regular expression
September 10, 2008 9:14 AM

regex question. I want to take a list of lines that look like this: This Here is a Title one or more words long / www.something.com and turn it into a list of lines that look like this: <a href="http://www.something.com/" title="This Here is a Title one or more words long"> </a> I have no idea how to do this, though.
posted by Grod to computers & internet (12 comments total) 1 user marked this as a favorite
Formatting went away.

This Here is a Title one or more words long / www.something.com

and turn it into a list of lines that look like this:

<a href="http://www.something.com/" title="This Here is a Title one or more words long"> </a>
posted by Grod at 9:23 AM on September 10, 2008


perl -pi.bak -e 's!^([^/]+)\s*/\s*(.*)$! !' filename.txt

This will back up the original file in filename.txt.bak and leave filename.txt with the new content.

Things will get all screwed up if there are /'s in the title, since the pattern splits the line at the first slash it finds.

This leaves you with an empty a tag, which almost certainly isn't what you really want.
posted by Zed_Lopez at 9:31 AM on September 10, 2008




awk -F / '{print ""$1""}'


posted by fings at 9:32 AM on September 10, 2008


MeFi ate the a tag. Trying again:

perl -pi.bak -e 's!^([^/]+)\s*/\s*(.*)$!<a href="$2" title="$1"> </a>!' filename.txt
posted by Zed_Lopez at 9:32 AM on September 10, 2008 [1 favorite]


Crud, delete that -- I was working out my post in preview, and accidentally hit post.
posted by fings at 9:33 AM on September 10, 2008


If you're going to use Perl, you might as well use split:

perl -ne 'chomp; printf(qq{<a href="%s" title="%s"> </a>\n}, reverse split m{\s*/\s*}, $_)'
posted by jrockway at 9:39 AM on September 10, 2008


BTW, I should point out that titles or URLs with odd characters will ruin the markup. You should probably escape those in a map {} block before the reverse/split.
posted by jrockway at 9:43 AM on September 10, 2008


Zed_Lopez, thanks, that works. The only thing you left out is prepending "http://" to the "www" and adding a trailing "/", and I can insert those easily enough. Unfortunately, I do want the empty link, because I have to use images and I can't see a way to automate that; the file names are too inconsistent.

jrockway, thanks.
posted by Grod at 9:49 AM on September 10, 2008


If you're going to use Perl and chomp and split, you might as well use autochomp and autosplit:

perl -F'\s+\/\s+' -lane 'print qq(<a href="$F[1]">$F[0]<a>)'


posted by nicwolff at 11:28 AM on September 10, 2008


Heh I left out the protocol like Zed did:

perl -F'\s+\/\s+' -lane 'print qq(<a href="http://$F[1]/">$F[0]<a>)'
posted by nicwolff at 11:30 AM on September 10, 2008


And the / in the closing a tag, jeez:

perl -F'\s+\/\s+' -lane 'print qq(<a href="http://$F[1]/">$F[0]</a>)'
posted by nicwolff at 11:31 AM on September 10, 2008


Zed_Lopez's regex will capture the trailing spaces before the first "/" as part of the title, if I'm not mistaken. Replace ([^/]+) with ([^/]+?).
posted by shmooly at 8:21 AM on September 12, 2008

