Recursive Regex Fun?
January 8, 2006 4:31 PM

Regexfilter: I'm using PHP and I want to match HTML span elements and their classes recursively, preserving the nesting hierarchy. Help or pointers would be much appreciated.

Example string:
<span class="heading">This is</span><span class="bodytext">not a heading.</span>
I can match a string like this just fine by passing this regex to preg_match_all: <span class=\"(.*?)\">(.*?)</span>. What I want is to return a multi-dimensional array when spans are nested; nested spans currently confuse the hell out of my code (and me!). For example:
<span class="heading">This <span class="shiny"> is a</span> lovely heading</span><span class="bodytext">bla bla.</span>
Thanks a million and one to all those that can help. :-)
posted by PuGZ to Computers & Internet (5 answers total)
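To see why the nested case goes wrong, here is a minimal sketch that runs the poster's pattern through preg_match_all on the nested example (with the `&span` typo corrected to `<span`). The `#` delimiters and the PREG_SET_ORDER flag are choices made for this illustration only. Because the lazy `(.*?)` before `</span>` stops at the first closing tag it meets, the inner span's opening tag ends up inside the heading's captured text, and the "shiny" class never gets a match of its own.

```php
<?php
// The nested example from the question, with the entity typo corrected.
$html = '<span class="heading">This <span class="shiny"> is a</span> lovely heading</span>'
      . '<span class="bodytext">bla bla.</span>';

// The poster's pattern, wrapped in # delimiters so the slash in </span> needs no escaping.
$pattern = '#<span class="(.*?)">(.*?)</span>#';

// PREG_SET_ORDER groups each match's captures together: $m[1] = class, $m[2] = inner text.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    printf("%s => %s\n", $m[1], $m[2]);
}
```

The result is a flat list of two matches, and the heading's text is cut off at the inner span's closing tag, which is why a single regex like this cannot produce the nested array the question asks for.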
Response by poster: Typing out those HTML entities was not easy, and now I see they've not wrapped! Today is not my day. ;-)
posted by PuGZ at 4:32 PM on January 8, 2006

Best answer: I would use an HTML parser.
posted by Firas at 4:36 PM on January 8, 2006

Best answer: Or use the built-in PHP DOM functions. They should work fine with the samples you've given. More in the PHP manual.
posted by matthewr at 4:42 PM on January 8, 2006
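Following the DOM suggestion, here is a minimal sketch of building the multi-dimensional array from the question with PHP's DOM extension. The helper name `spanTree` and the `['class' => ..., 'children' => [...]]` shape are hypothetical choices for illustration, not part of PHP's API; DOMDocument::loadHTML, childNodes and getAttribute are standard DOM calls. It assumes PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helper: turn each <span> into ['class' => ..., 'children' => [...]],
// where children holds text strings and nested span arrays in document order.
function spanTree(DOMNode $node): array
{
    $out = [];
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMElement && $child->tagName === 'span') {
            $out[] = [
                'class'    => $child->getAttribute('class'),
                'children' => spanTree($child), // recurse into the nested span
            ];
        } elseif ($child instanceof DOMText) {
            $out[] = $child->nodeValue;         // keep plain text in place
        } elseif ($child->hasChildNodes()) {
            // Descend through any other wrapper elements (e.g. <p>) the parser adds.
            $out = array_merge($out, spanTree($child));
        }
    }
    return $out;
}

$html = '<span class="heading">This <span class="shiny"> is a</span> lovely heading</span>'
      . '<span class="bodytext">bla bla.</span>';

$doc = new DOMDocument();
// loadHTML wraps the fragment in <html><body>; LIBXML_NOERROR suppresses parser complaints.
$doc->loadHTML($html, LIBXML_NOERROR);
$tree = spanTree($doc->getElementsByTagName('body')->item(0));
print_r($tree);
```

Recursion follows the document tree directly, so arbitrarily deep nesting needs no extra handling, and keeping the text between spans as strings preserves the order of the content.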

Response by poster: I'm not sure whether I should be relieved or depressed that the solution is so easy: relieved because I don't have to go nuts learning the intricacies of regular expressions, or depressed because I didn't even consider this before embarking on the route I had.

Thanks guys!
posted by PuGZ at 4:46 PM on January 8, 2006

So long as you're not doing it on the fly, the most reliable method I've found is:

Malformed HTML String → HTML Tidy to XHTML → XML parser.
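A minimal sketch of that pipeline in PHP, assuming the optional tidy extension is installed. tidy_repair_string does the cleanup; the config keys shown are standard Tidy options, but which ones you actually need will depend on your input. The `<root>` wrapper is an assumption added here because show-body-only can return several top-level elements, while loadXML needs exactly one document element.

```php
<?php
// Deliberately malformed: the outer "heading" span is never closed.
$malformed = '<span class="heading">This <span class="shiny"> is a</span> lovely heading';

// Step 1: let Tidy repair the markup and emit XHTML (requires ext/tidy).
$clean = tidy_repair_string($malformed, [
    'output-xhtml'     => true, // emit well-formed XHTML
    'show-body-only'   => true, // skip the <html>/<head>/<body> wrapper
    'numeric-entities' => true, // &#160; instead of &nbsp;, which plain XML does not define
    'wrap'             => 0,    // no line wrapping
], 'utf8');

// Step 2: hand the now well-formed markup to an XML parser.
// The <root> element (an assumption of this sketch) gives it a single document element.
$doc = new DOMDocument();
$ok  = $doc->loadXML('<root>' . $clean . '</root>');

echo $doc->saveXML($doc->documentElement), "\n";
```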

The XML stage could use PHP DOM, XSLT, or XPath.
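As an illustration of the XPath option, a minimal sketch that loads already well-formed markup with DOMDocument::loadXML and queries it through DOMXPath. The `<div>` wrapper and the particular expressions are assumptions chosen for this example; counting `ancestor::span` is just one way to recover nesting depth.

```php
<?php
// Assumes the markup is already well-formed (e.g. after a Tidy pass), so loadXML accepts it.
$xhtml = '<div><span class="heading">This <span class="shiny"> is a</span> lovely heading</span>'
       . '<span class="bodytext">bla bla.</span></div>';

$doc = new DOMDocument();
$doc->loadXML($xhtml);
$xpath = new DOMXPath($doc);

// Every span in document order, indented by how many span ancestors it has.
foreach ($xpath->query('//span') as $span) {
    $depth = (int) $xpath->evaluate('count(ancestor::span)', $span);
    printf("%s%s\n", str_repeat('  ', $depth), $span->getAttribute('class'));
}

// Only the spans nested somewhere inside the "heading" span.
$inner = $xpath->query('//span[@class="heading"]//span');
```

Because XPath works on the parsed tree, a question like "which spans sit inside the heading?" becomes a one-line query rather than a regex.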

Although no specific examples come to mind, I'm assuming that HTML Tidy can deal with more variations of quirky HTML than other HTML parsers. Is that assumption correct, though? Has anyone run any tests on this?
posted by holloway at 8:11 PM on January 8, 2006
