XML vs. Serialization
January 18, 2008 3:35 PM

For database usage, is it better to store data as XML or in a serialized format? In particular, PHP's serialization format.

PHP's serialization format has a length attribute on every string, which makes typing out an entry by hand a pain, but maybe that makes it quicker to parse than XML? Is there anything else down the road that either choice might affect?
posted by destro to Computers & Internet (10 answers total)
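To make the length attribute concrete, here is a minimal PHP sketch (the array contents are hypothetical) of what serialize() produces, and of what happens when a hand-typed entry's length prefix no longer matches its string. unserialize() returns false on malformed input and emits a notice or warning, which the @ suppresses here.

```php
<?php
// Hypothetical settings array.
$settings = ['name' => 'destro', 'count' => 3];

// serialize() writes each string as s:<length in bytes>:"<string>";
$stored = serialize($settings);
echo $stored, "\n"; // a:2:{s:4:"name";s:6:"destro";s:5:"count";i:3;}

// Unserializing the untouched string gives back an identical array.
$restored = unserialize($stored);

// A hand-typed edit that forgets to update the length prefix:
// "destroyer" is 9 bytes, but the prefix still says 6.
$handEdited = 'a:1:{s:4:"name";s:6:"destroyer";}';
$broken = @unserialize($handEdited); // returns false; @ hides the notice/warning

var_dump($restored === $settings, $broken);
```

The length prefixes are what let unserialize() read each string without scanning for a closing delimiter, and also what make hand edits fragile.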
 
I'd really need more info on what you're doing to answer this. Why use XML or a serialized format instead of SQL tables?
posted by bitdamaged at 3:44 PM on January 18, 2008


Response by poster: It's for small amounts of data that don't justify creating a separate database table.
posted by destro at 3:50 PM on January 18, 2008


I'm no expert, but my first thought is that serialized data might be a little faster, since an XML parser is probably a lot more heavyweight than unserialize(). However, XML will be much more portable. XML is also probably a much less fluid standard than PHP's serialization format, which can change from version to version and possibly make old data incompatible.

Also, XML is much more readable than serialized data, in case you need to snoop into your data files to debug.

Personally, I'd go with XML, though I'd choose SQL before either. Or what about SQLite if you don't have a database server?
posted by JRGould at 3:53 PM on January 18, 2008


Best answer: Well, there are a few different variables you need to consider here, and prematurely optimizing for one might hurt you elsewhere. One is the speed of serializing and deserializing the data. Another is the stability of the serialized format. A third is how easy the serialized form is to work with (how easy it is to create, parse, update, and so on).

In a sense, XML is itself a serialized form, one with a lot of language-neutral documentation and tools. Its advantage is that it is very stable: support for parsing and creating XML 1.0 documents will probably not disappear for a long time. It's also probably more work, since you are the one writing the code to create the documents and then parse them back into PHP data structures.

Using a platform's built-in serialization, such as PHP's, Java's, or .NET's, means you are much more reliant on the platform keeping the format stable over time. It would suck a lot to upgrade your platform and find out that the serialization format had changed.

As for which is faster, only good benchmarks with your own data can tell you for sure. XML parsing has come a long way since its inception.
posted by mmascolino at 4:00 PM on January 18, 2008
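As a rough starting point for that kind of benchmark, here is a minimal sketch that times unserialize() against parsing an equivalent XML document back into a PHP array with SimpleXML. The sample record, the element names, the iteration count, and the helper names xmlToArray() and timeIt() are all assumptions made for illustration, and the code assumes PHP 7.4 or later (arrow functions, hrtime()). Timings from a toy record like this say little about real data.

```php
<?php
// Hypothetical sample record; substitute a representative record of your own.
$data = ['theme' => 'dark', 'lang' => 'en', 'rows' => '25'];

// The same record in both formats. The XML element names are assumptions.
$serialized = serialize($data);
$xmlString  = '<settings><theme>dark</theme><lang>en</lang><rows>25</rows></settings>';

// Hypothetical helper: parse the XML back into a flat PHP array with SimpleXML.
function xmlToArray(string $xml): array
{
    $out = [];
    foreach (simplexml_load_string($xml)->children() as $name => $node) {
        $out[$name] = (string) $node;
    }
    return $out;
}

// Hypothetical helper: seconds taken to run $fn $iterations times,
// measured with the monotonic hrtime() clock.
function timeIt(callable $fn, int $iterations): float
{
    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e9;
}

$n    = 10000;
$tSer = timeIt(fn() => unserialize($serialized), $n);
$tXml = timeIt(fn() => xmlToArray($xmlString), $n);

printf("unserialize(): %.4f s, SimpleXML: %.4f s (%d iterations)\n", $tSer, $tXml, $n);
```

Both paths return the same array here, so the comparison covers the full trip from stored string back to PHP data, which is the part mmascolino notes you write yourself for XML.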


Best answer: If you use the DOM library (http://us3.php.net/dom) to build your XML document before storing it, it will be perfectly conformant and can be used without much effort in other processing pipelines.

If the data will only be read by other PHP applications running in the same environment, there's no reason to incur the overhead of XML generation. The native serialized format will work fine. It is likely to offer better performance because there's less parsing, but there are a number of variables in that equation and YMMV.

I have learned to avoid the halfway solution: generating XML by appending strings (e.g. $string = '' . $value . '';). Whenever I have done this, no matter how strict I was about making sure everything was properly encoded and sanitized, I still got stung by odd character-encoding edge cases. Very often they involve "invisible" characters that are difficult to track down. Fixing those bugs is a bad way to spend your time.
posted by ftrain at 4:01 PM on January 18, 2008
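A minimal sketch of the difference ftrain describes, using PHP's DOM extension (DOMDocument with createElement(), createTextNode(), saveXML(), and loadXML()). The element name item and the sample value are made up for illustration. The string-appended version stops being well-formed as soon as the value contains & or <, while the DOM-built version escapes it and parses back to the original value.

```php
<?php
// Hypothetical value containing characters that are special in XML.
$value = 'Fish & Chips <special>';

// Halfway solution: build the XML by appending strings. The & and < go in raw.
$naive = '<item>' . $value . '</item>';

libxml_use_internal_errors(true);                  // collect parse errors instead of printing warnings
$naiveOk = (new DOMDocument())->loadXML($naive);   // false: not well-formed
libxml_clear_errors();

// DOM approach: the value goes into a text node, and saveXML() escapes it.
$doc  = new DOMDocument('1.0', 'UTF-8');
$item = $doc->createElement('item');
$item->appendChild($doc->createTextNode($value));
$doc->appendChild($item);
$xml = $doc->saveXML($doc->documentElement);       // & and < are written as &amp; and &lt;

// Parsing the DOM-built string back yields the original value unchanged.
$check = new DOMDocument();
$check->loadXML($xml);
var_dump($naiveOk, $check->documentElement->textContent === $value);
```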


There are so many answers to your question... it really depends on your needs. Sometimes you want fast development speed, sometimes fast performance, sometimes low memory overhead, sometimes scalability, sometimes flexibility, etc...

I'd say think about your needs and go with a persistence strategy that matches them.
posted by brandnew at 4:35 PM on January 18, 2008


You could do a lot of math to figure out which one is actually faster, but my take on it is this:

If you need some human-readable data, use SQL (be more creative with your data model). If you need speed and don't need to run queries on the data very often, use serialization.

I'm sure they exist, but I'm having a hard time coming up with a use case for XML that doesn't involve exporting the XML somewhere else.

Note that "serialization" could involve something far less complicated and more stable than PHP's internal method. For example, if you are trying to store a list of integers, you could just "serialize" them into a comma-separated list. It all depends on how complicated, varied, and predictable the data is going to be.
posted by toomuchpete at 4:42 PM on January 18, 2008
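The comma-separated idea is easy to sketch. encodeIds() and decodeIds() are hypothetical helper names, and the sketch assumes the list only ever holds integers; values that could contain a comma would need a different delimiter or some escaping.

```php
<?php
// Hypothetical helpers for storing a list of integers as "3,14,15,92".
function encodeIds(array $ids): string
{
    return implode(',', $ids);
}

function decodeIds(string $stored): array
{
    if ($stored === '') {
        return []; // explode(',', '') would return [''], i.e. one bogus entry
    }
    // Split on commas and turn each piece back into an int.
    return array_map('intval', explode(',', $stored));
}

$stored = encodeIds([3, 14, 15, 92]);
echo $stored, "\n"; // 3,14,15,92
var_dump(decodeIds($stored));
```

Unlike serialize() output, the stored string has no length prefixes, so it can be typed or edited by hand.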


XML. Databases such as Oracle and (soon) PostgreSQL have an XML column type. They also have built-in support for such cool things as using XPath inside an SQL statement. Try doing that with the PHP serialized format.
posted by sbutler at 7:13 PM on January 18, 2008
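Server-side XPath is database-specific (PostgreSQL, for example, exposes an xpath() function), so as a PHP-side analogue only, here is a sketch of evaluating XPath expressions against a stored XML value with DOMXPath. The document and its element names are hypothetical. With a serialize() string, you would instead have to unserialize it and walk the array yourself.

```php
<?php
// Hypothetical XML value, as it might be stored in a column.
$stored = '<settings><theme>dark</theme><lang>en</lang><rows>25</rows></settings>';

$doc = new DOMDocument();
$doc->loadXML($stored);
$xpath = new DOMXPath($doc);

// Pull out one value by path.
$theme = $xpath->evaluate('string(/settings/theme)');

// Filter by content: which setting has the value "en"?
$matches = $xpath->query('/settings/*[. = "en"]');

echo $theme, "\n";                        // dark
echo $matches->item(0)->nodeName, "\n";   // lang
```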


Response by poster: XML it is. Sounds like I just need to put in the extra effort to use the built-in PHP XML parsers and constructors to keep it uniform and bug-free.
posted by destro at 8:17 AM on January 19, 2008


JSON is another valid option; PHP 5.2 supports it natively through json_encode() and json_decode().
posted by syzygy at 10:33 AM on January 19, 2008
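For comparison, a minimal JSON round trip (the data is hypothetical). Passing true as json_decode()'s second argument returns associative arrays rather than stdClass objects. Like XML, JSON has no length prefixes to maintain by hand and can be read from languages other than PHP.

```php
<?php
// Hypothetical data.
$settings = ['name' => 'destro', 'tags' => ['xml', 'php']];

$stored = json_encode($settings);
echo $stored, "\n"; // {"name":"destro","tags":["xml","php"]}

// true => decode JSON objects as associative arrays rather than stdClass objects.
$restored = json_decode($stored, true);
var_dump($restored === $settings); // bool(true)
```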

