Regex Madness... filter. How do I pull the text out of an HTML document without looking at the tag attributes?
I'm trying to pull certain things out of an HTML document. Let's say, for simplicity's sake, it looks like this, except with real HTML angle brackets. (I had to change them to square brackets to display here.)
[!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"]
[meta http-equiv="content-type" content="text/html; charset=windows-1250"]
[meta name="generator" content="PSPad editor, www.pspad.com"]
Some text is [a href="fjkj.html"]here[/a]
All I want out of that thing is:
Some text is
Is that possible? I thought I had something working... but I was so wrong.
I also tried to spider down through the DOM, but I never could get that right either.
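To make the goal concrete, here is a minimal sketch of the parser-based route rather than a regex. It assumes Python and its standard-library `html.parser`, since the post names no language. `TextExtractor`, `extract_text`, `skip_tags`, and `SAMPLE` are hypothetical names made up for illustration, and `SAMPLE` is the snippet above with the square brackets turned back into angle brackets. Because the parser hands tag attributes and text to separate callbacks, attribute values such as `content="text/html; ..."` never reach the output.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes only; attribute values never reach handle_data."""

    def __init__(self, skip_tags=()):
        super().__init__()
        # Assumption: skip_tags names elements that have a closing tag
        # (e.g. "a", "script"); void elements like "br" would not work here.
        self.skip_tags = set(skip_tags)
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # attrs is deliberately ignored: only the tag name matters.
        if tag in self.skip_tags:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.skip_tags and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self.skip_depth == 0:
            self.chunks.append(data)


def extract_text(html, skip_tags=()):
    """Return the document's text with runs of whitespace collapsed."""
    parser = TextExtractor(skip_tags)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


# The sample from the post, with real angle brackets restored.
SAMPLE = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<meta http-equiv="content-type" content="text/html; charset=windows-1250">
<meta name="generator" content="PSPad editor, www.pspad.com">
Some text is <a href="fjkj.html">here</a>
"""

print(extract_text(SAMPLE))                    # Some text is here
print(extract_text(SAMPLE, skip_tags=("a",)))  # Some text is
```

Whether the link text should be kept is a guess either way, so the sketch supports both: with no `skip_tags` the link text is included, and passing `skip_tags=("a",)` drops it to give exactly the "Some text is" shown above.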
As a bonus... is there a particular book/tutorial folks recommend for understanding the mighty regex?