Not-So-SimpleXML
December 8, 2008 4:11 PM
How can I search an XML file with PHP 5?
I have an XML file that looks like this:
[redirects]
[story]
[news_id]200810243[/news_id]
[new_url]new_url_one[/new_url]
[/story]
[story]
[news_id]200809024[/news_id]
[new_url]new_url_two[/new_url]
[/story]
[/redirects]
I need to find the new_url for a given news_id. For example, if the news_id is 200809024, I need new_url_two.
I can use SimpleXML to load the file and use PHP 5 to loop through each of the items. There are about 1,300 items, though, so I thought it'd be more efficient to find the new_url that matches the news_id I want directly. I've looked at tutorials for SimpleXML, XPath, and XQuery, but can't find a solution.
(Replaced angled brackets with square ones.)
I think your XPath expression would be
redirects/story[news_id=$id]/new_url
Where $id is the ID you're searching for.
posted by sbutler at 4:37 PM on December 8, 2008
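For anyone who wants to try the expression above without a PHP setup, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree, which supports a limited subset of XPath including the [child='text'] predicate. The helper name find_new_url and the digits-only check are assumptions added for illustration, not part of the original answer; the sample data is copied from the question.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the question (with real angle brackets).
XML = """<redirects>
  <story>
    <news_id>200810243</news_id>
    <new_url>new_url_one</new_url>
  </story>
  <story>
    <news_id>200809024</news_id>
    <new_url>new_url_two</new_url>
  </story>
</redirects>"""


def find_new_url(root, news_id):
    """Return the new_url text for news_id, or None if nothing matches.

    Hypothetical helper: the name and the validation are illustrative.
    """
    # Accept digits only, so the ID cannot break out of the quoted predicate.
    if not str(news_id).isdigit():
        raise ValueError("news_id must be numeric")
    # root is already the <redirects> element, so the path starts at its children.
    node = root.find("./story[news_id='%s']/new_url" % news_id)
    return node.text if node is not None else None


root = ET.fromstring(XML)
print(find_new_url(root, "200809024"))
```

Quoting the ID inside the predicate compares it as a string; the unquoted form in the answer above compares it as a number, which gives the same match for IDs like these.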
The XPath expression I came up with is substantially similar to the others (tested in Python, but hopefully works for any proper XPath implementation):
/redirects/story[news_id/text()="200810243"]/new_url/text()
Better than null terminated's answer because it doesn't assume the later-sibling relationship. sbutler's example didn't work for me.
posted by jepler at 5:43 PM on December 8, 2008
Mine worked with Perl's XML::XPath, but admittedly I'm not very good at XPath expressions. Here's the code I used to test (note, it returns the node instead of just the content):
#!/usr/bin/perl -w
use strict;
use XML::XPath;

my $xp = XML::XPath->new( xml => <<EOXML
<redirects>
  <story>
    <news_id>200810243</news_id>
    <new_url>new_url_one</new_url>
  </story>
  <story>
    <news_id>200809024</news_id>
    <new_url>new_url_two</new_url>
  </story>
</redirects>
EOXML
);

my $nodeset = $xp->find( "redirects/story[news_id=${ARGV[0]}]/new_url" );
foreach my $node ($nodeset->get_nodelist) {
    print "FOUND\n\n", XML::XPath::XMLParser::as_string( $node ), "\n\n";
}
posted by sbutler at 5:52 PM on December 8, 2008
Actually, works for me in Python too (although I had to dust off my Python book). Not sure what's up with your setup, jepler:
#!/usr/bin/env python
import libxml2
import sys

doc = libxml2.parseDoc( """<redirects>
  <story>
    <news_id>200810243</news_id>
    <new_url>new_url_one</new_url>
  </story>
  <story>
    <news_id>200809024</news_id>
    <new_url>new_url_two</new_url>
  </story>
</redirects>""" )

for node in doc.xpathEval( "redirects/story[news_id=%s]/new_url" % sys.argv[ 1 ] ):
    node.saveTo( sys.stdout, format=True )
posted by sbutler at 6:06 PM on December 8, 2008
I had mistyped sbutler's example. It does work in PHP:
$xml = new SimpleXMLElement($xmlStr);
$res = $xml->xpath('/redirects/story[news_id="200809024"]/new_url');
posted by null terminated at 6:08 PM on December 8, 2008
Did you say you were concerned about efficiency?
I suppose you have some kind of database; you'll prefer to hit that on page loads and parse the XML just once.
posted by Tobu at 1:03 AM on December 9, 2008
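A rough sketch of that "parse once, look up many times" idea, in Python with the standard library's sqlite3 and xml.etree.ElementTree. The in-memory database, the table name redirects, and the function names load_redirects and lookup are all assumptions made for illustration; a real site would load the rows into whatever database it already uses.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Sample data copied from the question (with real angle brackets).
XML = """<redirects>
  <story>
    <news_id>200810243</news_id>
    <new_url>new_url_one</new_url>
  </story>
  <story>
    <news_id>200809024</news_id>
    <new_url>new_url_two</new_url>
  </story>
</redirects>"""


def load_redirects(conn, xml_text):
    """One-time import: parse the XML and store each story as a table row."""
    root = ET.fromstring(xml_text)
    rows = [
        (story.findtext("news_id"), story.findtext("new_url"))
        for story in root.iter("story")
    ]
    conn.execute(
        "CREATE TABLE redirects (news_id TEXT PRIMARY KEY, new_url TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO redirects VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def lookup(conn, news_id):
    """Per-request lookup: an indexed query, no XML parsing involved."""
    row = conn.execute(
        "SELECT new_url FROM redirects WHERE news_id = ?", (news_id,)
    ).fetchone()
    return row[0] if row else None


# Assumption: an in-memory database stands in for the site's real one.
conn = sqlite3.connect(":memory:")
load_redirects(conn, XML)
print(lookup(conn, "200809024"))
```

With only about 1,300 items either approach is fast; the difference is that the XML file no longer has to be read and parsed on every page load.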
preg_match( '#<news_id>' . preg_quote( $news_id, '#' ) . '</news_id>\s*<new_url>([^<>]*)</new_url>#', $str, $matches ); // the URL ends up in $matches[1]
yes, yes, and now you have two problems
posted by rjt at 4:35 PM on December 8, 2008