How to parse unencoded ampersands in URLs in PHP?
December 11, 2003 3:55 PM
How to parse unencoded ampersands in URLs in PHP with preg_replace() while excluding already-encoded ampersands? (more inside)
Some of my blog content comes from a remote collaborative blog, and I just want to change &'s to &amp; when they occur inside an <a href> tag, for validation purposes.
It's simple enough to do a preg_replace('/&/','&amp;',$string); but what about when a conscientious submitter has already encoded the & as &amp;? Any regex experts know how to exclude that?
Sorry, I should have added that %26 is the hex code for &, and using that instead of & in a URI works a-ok, and you also don't have problems with XHTML/XML validation.
posted by riffola at 4:15 PM on December 11, 2003
preg_replace('/&(?!amp;)/','&amp;',$string);
perhaps? *prays that looks right on post*
posted by boaz at 4:22 PM on December 11, 2003
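For anyone wanting to try the lookahead idea, here is a minimal sketch, assuming PHP 7 or later and links written as double-quoted href attributes. The helper names encode_bare_ampersands() and encode_href_ampersands() are made up for illustration and are not from the thread. The first applies the negative lookahead, the second limits it to href values so ampersands in ordinary text are left alone.

```php
<?php
// Hypothetical helper: encode only ampersands not already written as "&amp;".
function encode_bare_ampersands(string $text): string
{
    // (?!amp;) is a negative lookahead: match "&" only when "amp;" does not follow.
    return preg_replace('/&(?!amp;)/', '&amp;', $text);
}

// Hypothetical helper: apply the fix only inside double-quoted href values.
// Assumption: attributes look like href="..."; single-quoted or unquoted
// attributes are not handled by this sketch.
function encode_href_ampersands(string $html): string
{
    return preg_replace_callback(
        '/(href=")([^"]*)(")/i',
        function (array $m): string {
            // $m[1] is 'href="', $m[2] is the URL, $m[3] is the closing quote.
            return $m[1] . encode_bare_ampersands($m[2]) . $m[3];
        },
        $html
    );
}

$html = '<a href="page.php?a=1&b=2&amp;c=3">Q&A</a>';
echo encode_href_ampersands($html), "\n";
// prints: <a href="page.php?a=1&amp;b=2&amp;c=3">Q&A</a>
```

The already-encoded ampersand before c=3 is left alone, the bare one before b=2 is encoded, and the "Q&A" link text is untouched because it sits outside the href value.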
Probably superfluous, but a word of warning: anything you do will be a heuristic. There is no way to distinguish between someone who writes &amp; and means &, having conscientiously encoded it for you, and someone who writes &amp; and means & followed by a, followed by m, followed by p, followed by ;.
Granted, the latter case is rather unlikely, but it becomes more likely as you start allowing more character entities.
posted by fvw at 4:30 PM on December 11, 2003
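To make the heuristic point concrete, here is a rough sketch, assuming PHP 7 or later, that widens the lookahead so anything shaped like a named or numeric entity is left alone. The function name encode_non_entity_ampersands() and the pattern are illustrative guesses at what "allowing more character entities" could look like, not a definitive rule.

```php
<?php
// Hypothetical helper: encode "&" unless it already starts something that
// looks like an entity (&name; or &#123; or &#x1F;).
function encode_non_entity_ampersands(string $text): string
{
    $pattern = '/&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);)/';
    return preg_replace($pattern, '&amp;', $text);
}

// Bare ampersands are encoded; "&amp;" and "&#38;" are left as they are.
echo encode_non_entity_ampersands('a=1&b=2&amp;c=3&#38;d=4&copy=5'), "\n";
// prints: a=1&amp;b=2&amp;c=3&#38;d=4&amp;copy=5

// The ambiguity: a literal "&" followed by "lt;" cannot be told apart
// from the entity "&lt;", so it is left unchanged.
echo encode_non_entity_ampersands('x=1&lt;=2'), "\n";
// prints: x=1&lt;=2
```

The second call shows the trade-off: the more entity shapes the pattern accepts, the more inputs it will guess were already encoded.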
Why do it the confusing yet cute way?
Use html_entity_decode() to convert the encoded ampersands back to regular ampersands then do your replace. And I'd suggest using str_replace() since you won't need any fancy matching rules. Much less overhead.
posted by y6y6y6 at 5:02 PM on December 11, 2003
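A minimal sketch of the decode-then-encode idea, assuming PHP 7 or later. Note one deliberate swap: the decode step here uses str_replace() on "&amp;" only rather than html_entity_decode(), because html_entity_decode() also decodes entities other than &amp;, which may change more of the string than intended. The function name normalize_ampersands() is made up for illustration.

```php
<?php
// Hypothetical helper: normalize first, then encode every ampersand.
function normalize_ampersands(string $text): string
{
    // Step 1: turn already-encoded ampersands back into bare ones.
    $decoded = str_replace('&amp;', '&', $text);
    // Step 2: encode all ampersands uniformly.
    return str_replace('&', '&amp;', $decoded);
}

echo normalize_ampersands('page.php?a=1&b=2&amp;c=3'), "\n";
// prints: page.php?a=1&amp;b=2&amp;c=3
```

Because step 2 encodes every ampersand, other entities in the input (for example &#38; or &lt;) would come out double-encoded, so this version only suits strings where &amp; is the only entity expected.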
Synchronicity moment; I just spent a good chunk of today trying to solve the same problem in multiply-parsed XSL. (Turns out you just blindly wrap all childless text() nodes which contain the character '&' in CDATA tags on the first pass, is the trick to that one. Easy.)
posted by ook at 5:30 PM on December 11, 2003
This thread is closed to new comments.
posted by brownpau at 3:59 PM on December 11, 2003