Thanks all for the thoughts. I have to do this, sadly. Trying to move an archaic HTML-page-based web zine/journal (content pasted in from MS Word; my God!) to MT and need to parse out entries from HTML.
What in the world has that got to do with using a parser instead of a regular expression? Of course you have to strip tags, nobody is doubting that. Using REs to do it is what is so bad.
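For anyone wondering what "use a parser" could look like in practice, here is a minimal sketch using Python's standard-library `html.parser`. The language choice and the names `TextExtractor` and `strip_tags` are assumptions made for illustration; nothing in the thread says which tools are actually being used for the migration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text content of a document and ignores every tag."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text found between tags.
        self.parts.append(data)


def strip_tags(html):
    """Return the text of `html` with all markup removed (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


if __name__ == "__main__":
    sample = '<p class="MsoNormal">Hello <b>world</b> &amp; all</p>'
    print(strip_tags(sample))
```

The parser takes care of attributes, quoting, and entities, so there is no tag-matching pattern to get wrong. Word-generated markup may still need extra cleanup that this sketch does not attempt.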
< [bi](\s.*?)?>><i class="foo">. <[bi](\s.*?)?> with the [bi] bit straight after the bracket.
I think /< ([^bi])+>/ is what you need.
I'm not sure what you are trying to do with the "*?" but it usually does not make sense to combine them since "?" (0-1 instances) is contained in "*" (0-infinite number of instances).
posted by snownoid at 11:58 AM on December 13, 2005
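On the "*?" question: in Perl-style regex flavors, a "?" placed directly after "*" is not a second "0 or 1" quantifier. It makes the star lazy, so it matches as few characters as possible. The sketch below uses Python's `re` module as a stand-in (an assumption, since the thread does not say which engine is in play) and a made-up sample string to show the difference.

```python
import re

# Hypothetical sample markup, not taken from the zine in question.
html = '<b>bold</b> and <i class="foo">italic</i>'

# Greedy: .* runs to the LAST '>' in the string, so one match swallows everything.
greedy = re.findall(r"<.*>", html)

# Lazy: .*? stops at the FIRST '>' it can, so each tag is matched separately.
lazy = re.findall(r"<.*?>", html)

# The pattern quoted earlier in the thread: an opening <b> or <i> tag,
# optionally followed by whitespace and attributes.
opening = [m.group(0) for m in re.finditer(r"<[bi](\s.*?)?>", html)]

if __name__ == "__main__":
    print(greedy)
    print(lazy)
    print(opening)
```

With this input the greedy pattern returns a single match covering the whole string, the lazy pattern returns the four tags individually, and the `[bi]` pattern returns only the two opening tags.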