crown and anchor me
February 7, 2006 6:21 AM

Can someone give this HTML idiot an answer in plain English? I don't understand the responses to similar questions that are already on this site! Is there some way, preferably in BBEdit on the Mac, of automatically converting every instance of http://www.whateverurl.com in my HTML file into properly anchored code, i.e. <a href="http://www.whateverurl.com">http://www.whateverurl.com</a>?

I am using BBEdit on an OS X Mac, but I could use a PC if absolutely necessary.

I have an HTML document containing hundreds of URLs written in the style http://www.whateverurl.com, but each URL is different, with different domain suffixes etc.

What I'm looking for is to automatically convert each of these so that the URL is properly anchored.

Simple solutions in plain English, please, as I can't understand the terminology used on similar posts.
posted by unclemonty to Computers & Internet (11 answers total)
 
I was going to suggest QuickReplace, but that won't work if all your URLs are different.
posted by borkingchikapa at 6:25 AM on February 7, 2006


Response by poster: Yes, each URL is different and some markedly so - they're all URLs taken from online news sites, so you can imagine all the gobbledegook at the end of each link.
posted by unclemonty at 6:28 AM on February 7, 2006


It's been about 3 software generations since I've used a WYSIWYG editor, but I know that back in the day Dreamweaver had a tag find/replace editor, where you could define rules such as:

find tags <a> where href attribute contains whatever.com*

With similarly functioning replace rules as well. It was really handy; probably the only thing that I miss from software of its type. You can download a trial and see if that does what you need it to do.

* underlined words could be customized as needed
posted by charmston at 6:36 AM on February 7, 2006


What charmston said. And *yay* for the Joni Mitchell reference.
posted by Jofus at 6:46 AM on February 7, 2006


It is my understanding that BBEdit has good regular expression support. Now, I've never used it, so apologies in advance if this post is totally unhelpful...

If BBEdit can do search/replace with regular expressions, then what you can do is search for this regular expression:

\b[a-z]+:\d*//((&(?:amp;)?|[.?!:])?[\w#~+=;%@\-/_]+)*

That crazy unreadable thing is a Perl-compatible regular expression for finding URLs. Hopefully BBEdit does Perl syntax and can recognize that monstrosity. You might try just entering that as a search term and seeing whether BBEdit actually finds any URLs in a test file.

Assuming that works, you then want to set your replace string to something like this:

<a href="$0">$0</a>

Here I've written $0 to mean the URL that was matched by the regular expression; BBEdit will probably have its own syntax for writing the replacement string. The idea is that the matching URLs are substituted twice, once inside the href attribute and once between the >angle brackets<.
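For anyone who wants to try the idea outside BBEdit, here is a rough Python sketch (Python is my substitution, and `link_urls` is just an illustrative name). Python's `re` module accepts the pattern above unchanged and spells "the whole match" as `\g<0>` rather than `$0`:

```python
import re

# Khalad's Perl-compatible URL pattern, copied unchanged.
URL = re.compile(r"\b[a-z]+:\d*//((&(?:amp;)?|[.?!:])?[\w#~+=;%@\-/_]+)*")

def link_urls(html: str) -> str:
    # \g<0> is the whole matched URL (written $0 above), used twice:
    # once inside href="..." and once as the visible link text.
    return URL.sub(r'<a href="\g<0>">\g<0></a>', html)

print(link_urls("See http://example.com."))
# -> See <a href="http://example.com">http://example.com</a>.
```

Note that a full stop ending the sentence stays outside the link, because the pattern only accepts a dot when more URL characters follow it.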

Again, sorry about my vagueness, never having used BBEdit myself. Also I realize this is still a pretty technical explanation, so let me know if I failed your "plain English" requirement. :-)
posted by Khalad at 6:51 AM on February 7, 2006


Not what charmston said. As I read the question, he is looking for a way to create tags where none exist, not change tags that are already in place.
posted by jjg at 6:52 AM on February 7, 2006


Best answer: You have a list of URLs and you want to convert them to links?

Open the list in BBEdit, open the find panel and tick the Use grep box and the Start From Top box. Then carefully type

(http://[\S]+)

in the Search For box, and

<a href="\1">\1</a>

in the Replace With box. Click Replace All.
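If you'd like to see what that search does before clicking Replace All, here is a minimal Python equivalent (an illustration only, not BBEdit itself; `link_urls` is a made-up name). Because `[\S]+` means "everything up to the next space, tab or line break", a closing parenthesis or comma right after a URL ends up inside the link:

```python
import re

# cillit bang's pattern: "http://" followed by one or more non-whitespace characters.
PATTERN = re.compile(r"(http://[\S]+)")

def link_urls(html: str) -> str:
    # \1 means "whatever the first parenthesised group matched",
    # the same notation as in BBEdit's Replace With box.
    return PATTERN.sub(r'<a href="\1">\1</a>', html)

print(link_urls("(see http://example.com)"))
# -> (see <a href="http://example.com)">http://example.com)</a>
```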

(on preview: Khalad's expressions won't work in BBEdit)
posted by cillit bang at 6:54 AM on February 7, 2006


cillit bang seems to have it. I was about to come and post this untrained but working monstrosity:

Find:

(http[a-zA-Z0-9\:\.\/\?]+)

Replace with the same thing he put.

Which should pretty much do the trick, but cillit bang's is much nicer.
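A quick Python illustration of why the character list matters (Python and the `link_urls` name are my assumptions, and the news URL is invented): `=` and `&`, which are common in news-site links, aren't in that list, so the match stops early and the rest of the URL is left outside the link:

```python
import re

# mikel's pattern: "http" followed only by letters, digits, ':', '.', '/' and '?'.
MIKEL = re.compile(r"(http[a-zA-Z0-9\:\.\/\?]+)")

def link_urls(html: str) -> str:
    return MIKEL.sub(r'<a href="\1">\1</a>', html)

print(link_urls("http://news.example.com/story?id=42&ref=rss"))
# -> <a href="http://news.example.com/story?id">http://news.example.com/story?id</a>=42&ref=rss
```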
posted by mikel at 7:16 AM on February 7, 2006


cillit bang's regular expressions are almost right for what you want in BBEdit. However, his expression will match existing anchored URLs as well as the unanchored ones. If you do a blind "Replace All" using his expressions, you'll break all of your existing absolute URLs. I'd probably try something like...

[^">](http://[\S]+)

...in place of his first expression. The second expression is fine. My suggested expression attempts to ignore URLs that are inside a quote or that follow a >. For example, the following HTML sample has two URLs, but neither should be replaced:

<a href="http://www.example.com">http://www.example.com</a>
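One caution, shown here as a hedged Python sketch (the `link_bare_urls` name is hypothetical): `[^">]` consumes the character in front of the URL, so with `<a href="\1">\1</a>` that character, usually a space, disappears. Capturing it as well and writing it back avoids this, and a `^` alternative lets a URL at the very start of the file match too:

```python
import re

# As suggested: the character before the URL is matched but never put back.
original = re.sub(r'[^">](http://[\S]+)', r'<a href="\1">\1</a>', "Visit http://a.example")
print(original)
# -> Visit<a href="http://a.example">http://a.example</a>
# (the space after "Visit" is gone)

# Variant (my assumption, not RichardP's wording): capture that character as
# group 1 and write it back; "^" also allows a URL at the very start of the text.
UNLINKED = re.compile(r'(^|[^">])(http://\S+)')

def link_bare_urls(html: str) -> str:
    return UNLINKED.sub(r'\1<a href="\2">\2</a>', html)

existing = '<a href="http://www.example.com">http://www.example.com</a>'
print(link_bare_urls("Visit http://a.example now"))
print(link_bare_urls(existing) == existing)  # existing links are left alone
```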
posted by RichardP at 7:21 AM on February 7, 2006


According to its web page, BBEdit can do 'grep'-style search and replace; I'm on a PC right now, though.

Open the Find/Replace dialog and see if there is a checkbox to do 'GREP' or 'Regular Expression' search and replace. If it's not there, dig in the Preferences. Enable it.

So you want to do a find on
http://([^ ]*[A-Za-z])

and replace with
<a href="http://\1">\1</a>

Go case-by-case, replacing for a while until you see that it works (if it does!) and acquire some confidence.

Here I've defined a URL as the http:// and all the stuff after it that has no spaces and ends in a letter. I think this might work okay, maybe better than many programs, which like to include trailing punctuation in URLs; that is very infuriating, especially if you put URLs in parentheses. However, the pattern might still barf; you've gotta hand-check the URLs to be sure.
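Here is a rough Python sketch of that definition (Python and `link_urls` are my stand-ins). The final `[A-Za-z]` sits inside the parentheses so that `\1` keeps the URL's last letter. The second example shows the kind of case that needs hand-checking: a URL ending in digits gets cut back to its last letter:

```python
import re

# fleacircus's definition: "http://", then no spaces, ending in a letter.
URL = re.compile(r"http://([^ ]*[A-Za-z])")

def link_urls(text: str) -> str:
    # As in the replacement above, the visible link text omits "http://".
    return URL.sub(r'<a href="http://\1">\1</a>', text)

print(link_urls("(see http://example.com)."))
# -> (see <a href="http://example.com">example.com</a>).
print(link_urls("http://example.com/story/12345"))
# -> <a href="http://example.com/story">example.com/story</a>/12345
```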

Read up on regular expressions so you can check it yourself, and increase your text file manipulation powers a millionfold.

On preview: as can be seen, finding the right regexp is something of a chore. I guess mine turns http://booger into booger.
posted by fleacircus at 7:39 AM on February 7, 2006


There's a really good BBEdit mailing list: http://www.barebones.com/support/lists/bbedit_talk.shtml where people are very helpful.

Various "simple" regular expressions to find a URL may well work for you, but as others have mentioned, there are all kinds of edge cases and wrinkles -- a URL followed by a linebreak, a URL at the very start or end of the file, a URL that's https: not http: and so on.

My advice is to work on a copy and hope that your URLs are all "simple" enough for the simple search-and-replace to work.
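A minimal Python sketch of the "work on a copy" approach, under my own assumptions: a URL is `http://` or `https://` followed by non-space characters and ending in a letter, digit, underscore or slash, and it is skipped if a quote or `>` comes right before it. The function names and file paths are hypothetical, and the output still deserves a hand-check:

```python
import re
from pathlib import Path

# Assumed URL shape: http:// or https://, no whitespace (so a line break ends
# the URL), last character a word character or "/" (so trailing "." or ")" is
# left outside). Group 1 is the preceding character, or the start of the file.
URL = re.compile(r'(^|[^">])(https?://\S*[\w/])')

def link_urls(html: str) -> str:
    return URL.sub(r'\1<a href="\2">\2</a>', html)

def link_file_copy(src: str, dst: str) -> None:
    """Read src and write the linked version to dst; src is never modified."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(link_urls(text), encoding="utf-8")

# Example (hypothetical file names):
# link_file_copy("news.html", "news-linked.html")
```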

If it gets too horrible, someone on the BBEdit list may write a Perl script for you to do it. See under that exclamation mark menu where it says Unix Filters? That's your real power right there.
posted by AmbroseChapel at 2:53 PM on February 7, 2006

