What's a good tool to automate tedious html editing?
June 11, 2005 12:25 PM

If I were instructing someone in English to perform the type of task I'm talking about, it might sound like this: "Insert '%a' before every instance of 'href' but only in cases where the url does not contain a question mark."

The more or less ironic part of this is that I know how to solve this problem using VBA, and hence could probably do it in Word, but I don't want to use Word, because I don't want to risk the insertion of any special or hidden characters or line breaks.

As my task involves the tedious insertion of various character combinations into html that someone else has already written, I often begin my task by looking at the code and saying to myself "Ah yes, it's situation type B. Therefore, I want to insert '%g' before every 'href.'"

So, how best to automate such tasks? I'm working in Windows XP.
posted by bingo to Computers & Internet (26 answers total)
 
Notepad?
posted by tozturk at 12:28 PM on June 11, 2005


Response by poster: Notepad has programmable macros? Sorry if I didn't make this clear: basic search and replace is not enough.
posted by bingo at 12:37 PM on June 11, 2005


Response by poster: ...and even if it was, there would have to be about 50 different search and replace functions all lumped into one command. Does Notepad do that?
posted by bingo at 12:38 PM on June 11, 2005


Have you looked into Macromedia Dreamweaver? You can build a custom code library (snippets) and drag and drop what you need on command. You can even map keypresses to your bespoke snippets code.
posted by AlexReynolds at 12:51 PM on June 11, 2005


Failing that, if you have access to a Mac, you could use BBEdit's regular expression functionality to accomplish the same.
posted by AlexReynolds at 12:53 PM on June 11, 2005


I think that some of the following may help:

Note*tab* lite using regular expressions or "grep."
Perl and regular expressions
Python and regular expressions
PHP

http://www.google.com/search?client=safari&rls=en&q=awk+for+html+coding&ie=UTF-8&oe=UTF-8

http://www.delorie.com/gnu/docs/gawk/gawk_toc.html#SEC_Contents


In short, a simple scripting language that likes plain text. Or you could even figure out how to do it in Applescript, jump ship, and bring the finished product back to windows.

Or a windows install of Awk paired with the google search "awk for html" or something similar.

One really kludgey solution: use a basic windows editor to cut every url with a ? in it.

Do the aforementioned search and replace

get a second copy of the file and cut every url *without* a "?"

Then merge both versions.
posted by mecran01 at 12:54 PM on June 11, 2005


perl, sed, or awk.

Seriously, install something like cygwin (if you like linux, you'll like this), or individual ports of things like tcsh and all the individual unix utils (if you like BSD, like I do, you'll probably like this more). You can use ActivePerl without all this, but it's really not as much fun.

On a side note, there have been a lot of times I've wanted a Win32 GUI program to just do something like this quickly & visually without having to break open the perl reference book. If other people would be interested in this, please contact me... I'm up for writing it, but only if people would actually use it.
posted by devilsbrigade at 12:56 PM on June 11, 2005


I do a lot of work with double byte characters and in my search for a robust text editor that handled this well I've stumbled on a program called EmEditor. It has very robust macro functionality (when you have to do the same tasks in 11 different languages - you look for automation). From the help:

EmEditor Professional allows you to create functionally-rich macros using JavaScript or VBScript. Features of the macros include:

Windows Scripting Host
Support for JavaScript or VBScript
Macros That Can Define Most Operations in EmEditor
Integrated Development Environment for Macros
Modular Design of EmEditor Macros

I think it cost me 40 bucks or so and I now use it for everything.
posted by Wolfie at 1:03 PM on June 11, 2005


ConText is pretty awesome.
posted by Mach5 at 1:09 PM on June 11, 2005


Also check out Text Pad. Some programmer guys I know swear by it.
posted by Fozzie at 2:44 PM on June 11, 2005


I use Notepad2 and it's excellent. You can do regex search and replace functions with it.
posted by Count Ziggurat at 3:05 PM on June 11, 2005


For what it's worth, I edit my CSS with TopStyle.
posted by NickDouglas at 3:06 PM on June 11, 2005


Search for:

href="([^?\"]+?)"

Replace with:

SOMETHING href="\1"

Should work in most regex search and replaces.
posted by wackybrit at 3:16 PM on June 11, 2005
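
For anyone who would rather run this from a script than from an editor, here is a minimal sketch of the same regex idea in Python. The function name mark_links and the sample string are made up for illustration, and it assumes every href is written with double quotes, as in the pattern above. It puts the marker directly in front of href, as the question describes.

```python
import re

# Matches href="..." only when the quoted URL contains no '?'.
# Assumption: attributes use double quotes, as in the pattern above.
HREF_WITHOUT_QUERY = re.compile(r'href="([^?"]+)"')


def mark_links(html, marker="%a"):
    """Insert `marker` directly before each href whose URL has no '?'."""
    # A function replacement sidesteps backslash handling inside `marker`.
    return HREF_WITHOUT_QUERY.sub(lambda m: marker + m.group(0), html)


sample = '<a href="page.html">one</a> <a href="search.php?q=1">two</a>'
print(mark_links(sample))
# <a %ahref="page.html">one</a> <a href="search.php?q=1">two</a>
```

Only the first link is changed, because the second URL contains a question mark and the character class refuses to match across it.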


Yep, all you need is an editor that supports regex search and replace. I use Emacs, but it is probably a bit much for a non-programmer. Before I fully made the Emacs plunge I liked UltraEdit a lot.
posted by grouse at 3:39 PM on June 11, 2005


FrontPage 2003 has an advanced search and replace function that will do exactly what you want it to do, and you don't have to write regular expressions (you build your query visually).

I know, lots of FrontPage haters in here, but FrontPage 2003 (not earlier versions, they really do suck) is different, I swear. Check it out (not a plug but may be confused with one): this site is 100% XHTML Strict compliant and built and managed 100% in FrontPage 2003 in GUI mode. So, yes, it can do standards compliant code. If anyone tells you that one program "Sucks Balls" or any such nonsense, take their opinion with a grain of salt, for they have an agenda, or are otherwise missing some information.

That said, UltraEdit, emacs, and TextPad are all excellent editors, TextPad probably being the easiest to learn of that particular lot.
posted by Merdryn at 3:45 PM on June 11, 2005


Another vote for Text Pad here. I use it for this kind of stuff all of the time. Read the online help for SnR options.
posted by lilboo at 3:47 PM on June 11, 2005


I use BBEdit, but when I was using a PC, Search and Replace (yup, that's the name of the app) was indispensable. Supports regex. It will do what you're asking for, and a lot more. Twenty-five bucks.
posted by bricoleur at 4:12 PM on June 11, 2005


HomeSite and UltraEdit will both allow you to easily replace the same thing in many files, even throughout subdirectories -- with a single click.
posted by o0o0o at 4:18 PM on June 11, 2005
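
The "many files, even throughout subdirectories" part can also be scripted. This is a rough sketch, not how HomeSite or UltraEdit do it: rewrite_tree is a hypothetical helper, the *.html pattern and UTF-8 encoding are assumptions, and files are opened with newline="" so existing line breaks are left exactly as they were.

```python
from pathlib import Path
from typing import Callable, List


def rewrite_tree(root: str, transform: Callable[[str], str],
                 pattern: str = "*.html") -> List[Path]:
    """Apply `transform` to every matching file under `root`.

    Returns the list of files that were actually changed.
    """
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        # newline="" turns off line-ending translation on read and write,
        # so CRLF files stay CRLF. UTF-8 is an assumption about the input.
        with open(path, encoding="utf-8", newline="") as f:
            original = f.read()
        updated = transform(original)
        if updated != original:
            with open(path, "w", encoding="utf-8", newline="") as f:
                f.write(updated)
            changed.append(path)
    return changed
```

A call such as rewrite_tree("site", lambda s: s.replace("href", "%ghref")) would touch every .html file below the (hypothetical) folder "site". It overwrites files in place, so try it on a copy first.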


The code from that FrontPage site was a lot better than the others I have seen. But it still has lots of junk like unnecessary nbsps all over the place.
posted by grouse at 4:29 PM on June 11, 2005


You don't want an editor, you want a scripting language. What you want to do is the sort of thing Perl was designed for.
posted by orthogonality at 8:31 PM on June 11, 2005
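
To make the scripting-language suggestion concrete, here is a small sketch in Python (Perl would look much the same). It bundles several search-and-replace rules under one name per "situation type", so a whole batch runs as one command. The rule sets "A" and "B" and the function apply_rules are placeholders based on the two examples in the question, not a real tool.

```python
import re

# Hypothetical rule sets: each situation type maps to an ordered list of
# (pattern, replacement) pairs. Add as many pairs per type as needed.
RULES = {
    # Type A: '%a' before href, but only when the URL has no '?'.
    "A": [(r'href="([^?"]+)"', r'%ahref="\1"')],
    # Type B: '%g' before every href.
    "B": [(r'href', r'%ghref')],
}


def apply_rules(html, situation):
    """Run every rule for the given situation type, in order."""
    for pattern, replacement in RULES[situation]:
        html = re.sub(pattern, replacement, html)
    return html


print(apply_rules('<a href="x.php?a=1">x</a>', "B"))
# <a %ghref="x.php?a=1">x</a>
```

Because the rules run in order, a later rule sees the output of an earlier one, which matters once a situation type has dozens of entries.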


Another possibility: if the HTML you're getting is valid, just incomplete, you might consider loading the HTML into Firefox, adjusting the HTML in the DOM using javascript, and then saving the resulting HTML.

The advantage to this is that you don't need to explicitly deal with end tags or parsing out attributes, e.g., this HTML:
<i><a href="http://foo" target="new" onclick="alert('Thanks for clicking me!');">Click Here!</a></i>
becomes an italics node with a child anchor node, and the anchor's attributes are already parsed out and available on the anchor node: anchor.target is "new" and anchor.getAttribute("onclick") is "alert('Thanks for clicking me!');"

If I were doing this, I'd code up each of the various transformations you use as a separate GreaseMonkey script, and apply whichever transformations I needed, using document.getElementsByTagName.

Example: make sure all links open in the window named "foo":
var a = document.getElementsByTagName( "a" ) ;  // anchors use the tag name "a"
for( var i = 0 ; i < a.length; ++i )
    a[ i ].target = "foo" ;

// now get the HTML back (the document object itself has no innerHTML):
var html = document.documentElement.innerHTML ;

(Note that the standard GreaseMonkey won't, for security reasons, let you save to a file. But I have a hacked copy that dispenses with security and includes functionality to save to a file; email me if you want it.)
posted by orthogonality at 8:47 PM on June 11, 2005
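
The "let a parser split out the attributes" idea also works outside the browser. The sketch below swaps the Firefox DOM for Python's standard-library html.parser, and it only lists the links a "no question mark" rule would touch rather than rewriting anything, which makes it useful as a dry run. LinkCollector and links_without_query are names invented for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag; the parser separates attributes."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of pairs.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def links_without_query(html):
    """Return the hrefs that contain no '?'."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [h for h in collector.hrefs if "?" not in h]


sample = ('<i><a href="http://foo" target="new">Click Here!</a></i>'
          '<a href="b.php?id=2">b</a>')
print(links_without_query(sample))
# ['http://foo']
```

Unlike a regex, this does not care about quote style, attribute order, or upper-case tags.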


Another vote for Dreamweaver - its Find and Replace function not only supports regular expressions, but it also allows you to save queries, so when you have multiple different queries you need, you don't have to type them more than once.
posted by robhuddles at 9:23 PM on June 11, 2005


I'll second Notepad 2.
posted by yoga at 4:55 AM on June 12, 2005


Dreamweaver - its Find and Replace function not only supports regular expressions, but it also allows you to save queries

Same goes for BBEdit and Search and Replace.
posted by bricoleur at 5:22 AM on June 12, 2005


TextPad's search and replace is fantastic, and if you know a geek who knows regular expressions (or if you find a good regex recipe on one of the sites online that specialize in that sort of thing) you can make *very* quick work of this kind of project. I do this stuff at least once a week in TextPad and it never takes more than a minute.
posted by anildash at 2:38 AM on June 13, 2005


You may also want to check out bkreplace - it's been a lifesaver for me.
posted by korej at 4:43 PM on June 13, 2005

