Not-So-SimpleXML
December 8, 2008 4:11 PM

How can I search an XML file with PHP 5?

I have an XML file that looks like this:

[redirects]
[story]
[news_id]200810243[/news_id]
[new_url]new_url_one[/new_url]
[/story]
[story]
[news_id]200809024[/news_id]
[new_url]new_url_two[/new_url]
[/story]
[/redirects]


I need to find the new_url for a given news_id. For example, if the news_id is 200809024, I need new_url_two.

I can use SimpleXML to load the file and use PHP 5 to loop through each of the items. There are about 1,300 items, though, so I thought it'd be more efficient to find the new_url that matches the news_id I want directly. I've looked at tutorials for SimpleXML, XPath, and XQuery, but can't find a solution.

(Replaced angled brackets with square ones.)
posted by kirkaracha to Computers & Internet (8 answers total) 1 user marked this as a favorite
 
regex?

preg_match( '#<news_id>' . preg_quote( $news_id, '#' ) . '</news_id>\s*<new_url>([^<>]*)</new_url>#', $str, $matches );
// on a match, the URL is in $matches[1]

yes, yes, and now you have two problems
posted by rjt at 4:35 PM on December 8, 2008


I think your XPath expression would be

redirects/story[news_id=$id]/new_url

Where $id is the ID you're searching for.
posted by sbutler at 4:37 PM on December 8, 2008


Here ya go.
posted by null terminated at 4:45 PM on December 8, 2008


The XPath expression I came up with is substantially similar to the others:
/redirects/story[news_id/text()="200810243"]/new_url/text()
Tested in Python, but it should hopefully work in any proper XPath implementation.

Better than null terminated's answer because it doesn't assume the later-sibling relationship. sbutler's example didn't work for me.
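
For anyone who wants to try this kind of predicate lookup, here is a minimal sketch in Python. It uses the standard library's xml.etree.ElementTree, which is an assumption on my part and not necessarily the library the expression above was tested with. ElementTree only supports a subset of XPath, so the path is written relative to the root element and the text is read from the matched element instead of via text(). The function name find_new_url is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample data from the question, with the angle brackets restored.
XML = """<redirects>
  <story>
    <news_id>200810243</news_id>
    <new_url>new_url_one</new_url>
  </story>
  <story>
    <news_id>200809024</news_id>
    <new_url>new_url_two</new_url>
  </story>
</redirects>"""


def find_new_url(xml_text, news_id):
    """Return the new_url for news_id, or None if no story matches."""
    root = ET.fromstring(xml_text)  # root is the <redirects> element
    # Select the <story> whose <news_id> child text equals news_id,
    # then take that story's <new_url> child.
    node = root.find("./story[news_id='%s']/new_url" % news_id)
    return node.text if node is not None else None


print(find_new_url(XML, "200809024"))
```

The string formatting is fine for numeric IDs like these, but an ID containing a quote character would break the expression.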
posted by jepler at 5:43 PM on December 8, 2008


Mine worked with Perl's XML::XPath, but admittedly I'm not very good at XPath expressions. Here's the code I used to test (note, it returns the node instead of just the content):

#!/usr/bin/perl -w
use strict;
use XML::XPath;

my $xp = XML::XPath->new( xml => <<EOXML
<redirects>
        <story>
                <news_id>200810243</news_id>
                <new_url>new_url_one</new_url>
        </story>
        <story>
                <news_id>200809024</news_id>
                <new_url>new_url_two</new_url>
        </story>
</redirects>
EOXML
);

my $nodeset = $xp->find( "redirects/story[news_id=${ARGV[0]}]/new_url" );
foreach my $node ($nodeset->get_nodelist) {
        print "FOUND\n\n", XML::XPath::XMLParser::as_string( $node ), "\n\n";
}

posted by sbutler at 5:52 PM on December 8, 2008


Actually, it works for me in Python too (although I had to dust off my Python book). Not sure what's up with your setup, jepler:

#!/usr/bin/env python
import libxml2
import sys

doc = libxml2.parseDoc( """<redirects>
        <story>
                <news_id>200810243</news_id>
                <new_url>new_url_one</new_url>
        </story>
        <story>
                <news_id>200809024</news_id>
                <new_url>new_url_two</new_url>
        </story>
</redirects>""" )

for node in doc.xpathEval( "redirects/story[news_id=%s]/new_url" % sys.argv[ 1 ] ):
        node.saveTo( sys.stdout, format=True )

posted by sbutler at 6:06 PM on December 8, 2008


I had mistyped sbutler's example. It does work in PHP:

$xml = new SimpleXMLElement($xmlStr);
$res = $xml->xpath('/redirects/story[news_id="200809024"]/new_url');
posted by null terminated at 6:08 PM on December 8, 2008


Did you say you were concerned about efficiency?

I suppose you have some kind of database; you'll prefer to hit that on page loads and parse the XML just once.
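
A rough sketch of the "parse once" idea, in Python to match the test scripts above. An in-memory dict stands in for the database here, which is an assumption since the question doesn't say what storage is available. The name build_redirect_map is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample data from the question, with the angle brackets restored.
XML = """<redirects>
  <story>
    <news_id>200810243</news_id>
    <new_url>new_url_one</new_url>
  </story>
  <story>
    <news_id>200809024</news_id>
    <new_url>new_url_two</new_url>
  </story>
</redirects>"""


def build_redirect_map(xml_text):
    """Parse the XML a single time and return {news_id: new_url}."""
    root = ET.fromstring(xml_text)
    return {
        story.findtext("news_id"): story.findtext("new_url")
        for story in root.iter("story")
    }


# Built once (for example at import time, or by a one-off import script).
redirects = build_redirect_map(XML)

# Every later lookup is a plain key lookup with no XML parsing involved.
print(redirects.get("200809024"))
```

With a real database, the same loop would insert one row per story, and page loads would query by news_id.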
posted by Tobu at 1:03 AM on December 9, 2008

