Recursive Regex Fun?
January 8, 2006 4:31 PM
Regexfilter: I'm using PHP and I want to match HTML span classes recursively/in a hierarchy. Help/pointers would be much appreciated.
Example string:
<span class="heading">This is</span><span class="bodytext">not a heading.</span>
I can match a string like this just fine using this regex: <span class=\"(.*?)\">(.*?)</span> in preg_match_all. What I want to do is return a multi-dimensional array in case of nested spans, which currently confuse the hell out of my code (and me!). For example:
<span class="heading">This <span class="shiny"> is a</span> lovely heading</span><span class="bodytext">bla bla.</span>
Thanks a million and one to all those that can help. :-)
Best answer: Or use the inbuilt PHP DOM functions. They should work fine with the samples you've given. More at the PHP manual.
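For illustration, here is a minimal sketch of how the DOM route might look for the nested sample in the question. The helper name spansToArray and the array keys (class, text, children) are made up for this example, and it assumes the standard dom extension is enabled.

```php
<?php
// Hypothetical helper: walk a DOM node and collect <span> elements
// into a nested (multi-dimensional) array.
function spansToArray(DOMNode $node): array
{
    $result = [];
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMElement && strtolower($child->tagName) === 'span') {
            $result[] = [
                'class'    => $child->getAttribute('class'),
                'text'     => $child->textContent,       // includes text of nested spans
                'children' => spansToArray($child),      // recurse into nested spans
            ];
        } elseif ($child->hasChildNodes()) {
            // Not a span (e.g. the html/body wrappers the parser adds):
            // look through it for spans further down.
            $result = array_merge($result, spansToArray($child));
        }
    }
    return $result;
}

// The nested sample string from the question.
$html = '<span class="heading">This <span class="shiny"> is a</span> lovely heading</span>'
      . '<span class="bodytext">bla bla.</span>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

$tree = spansToArray($doc);
print_r($tree);
```

With that sample, the result should have two top-level entries (heading and bodytext), with the shiny span nested under heading's children key.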
posted by matthewr at 4:42 PM on January 8, 2006
Response by poster: I'm not sure if I should be relieved or depressed that the solution is so easy. Relieved because I don't have to go nuts learning the intricacies of regular expressions or depressed because I didn't even consider this before embarking on the route I had.
Thanks guys!
posted by PuGZ at 4:46 PM on January 8, 2006
So long as you're not doing it on the fly, the most reliable method I've found is:
Malformed HTML String → HTML Tidy to XHTML → XML parser.
The XML parser could be PHP DOM, XSLT, XPath.
Although no specific examples come to mind, I'm assuming that HTML Tidy can deal with more variations of quirky HTML than other HTML parsers. Is this a correct assumption though - are there any tests that people have done for this?
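A rough sketch of that pipeline, assuming the optional tidy extension is installed alongside dom. The function name spanClassesViaTidy and the broken sample string (the question's example with its last closing tag dropped) are invented for illustration, and the Tidy options shown are just one plausible configuration.

```php
<?php
// Hypothetical helper: malformed HTML -> Tidy (XHTML) -> XML parser -> XPath.
function spanClassesViaTidy(string $html): array
{
    if (!extension_loaded('tidy')) {
        throw new RuntimeException('The tidy extension is required for this sketch.');
    }

    // Step 1: let Tidy repair the markup and emit well-formed XHTML.
    $xhtml = tidy_repair_string($html, [
        'output-xhtml'     => true,
        'numeric-entities' => true, // avoid named entities the XML parser may not know
    ], 'utf8');
    if ($xhtml === false) {
        throw new RuntimeException('Tidy could not repair the input.');
    }

    // Step 2: load the XHTML with the strict XML parser (no network access).
    $doc = new DOMDocument();
    if (!$doc->loadXML($xhtml, LIBXML_NONET)) {
        throw new RuntimeException('Tidy output was not well-formed XML.');
    }

    // Step 3: query with XPath. local-name() is used so the query works
    // whether or not the XHTML namespace is declared on the root element.
    $xpath   = new DOMXPath($doc);
    $classes = [];
    foreach ($xpath->query('//*[local-name()="span"][@class]') as $span) {
        $classes[] = $span->getAttribute('class');
    }
    return $classes;
}

// Assumed sample: the final </span> is missing on purpose.
$messy = '<span class="heading">This <span class="shiny"> is a</span> lovely heading</span>'
       . '<span class="bodytext">bla bla.';

if (extension_loaded('tidy')) {
    print_r(spanClassesViaTidy($messy));
}
```

This only lists the class attributes in document order; the same DOMDocument could be walked recursively if a nested array is wanted instead.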
posted by holloway at 8:11 PM on January 8, 2006
posted by PuGZ at 4:32 PM on January 8, 2006