Probably nonexistent, but...
November 8, 2006 7:29 PM

Non-Java open source methods to convert flat files to XML?

I've looked and looked. Xerces is a possibility, but my CTO is bigoted against Java. I tried Jitterbit and MS Integration Services, each of which sucks in its own way. Ideally I'd like to be able to perform rather complex flat file transforms (multiple record types) to XML. Don't make me write a Perl script... any help is, well, helpful.
posted by nj_subgenius to Computers & Internet (6 answers total)
 
Xerces is also a perfectly usable C++ library, not just a Java one.
posted by cmonkey at 7:44 PM on November 8, 2006


What languages and platforms are acceptable?
posted by mmascolino at 9:06 PM on November 8, 2006


You don't give us a lot of information to work with...

It sounds like Java is the more obvious solution path to you, but the CTO stands in the way.

Solution: do it in Python on the JVM, which is called Jython. You can still use Xerces.
posted by jouke at 9:17 PM on November 8, 2006


And by flat file to XML conversions, do you mean that you want to use an XML library to build up and then serialize the XML tree, so that you don't have to worry about illegal markup and escaping bad characters? If so, then almost any modern programming language/environment is going to offer that, although that generally means working with the DOM, which is hideous. If you are going to go the Jython route, then let me suggest using it with XOM, which is a far more sensible API for handling XML documents.
posted by mmascolino at 10:13 PM on November 8, 2006
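
For illustration, the build-then-serialize approach described above might look like the following in Python, using only the standard library's ElementTree module rather than the DOM. This is a minimal sketch, not a definitive implementation: the pipe-delimited layout, the record type codes (H, D, T), and all element and field names are assumptions made up for the example, since the thread never says what the actual flat file looks like.

```python
import xml.etree.ElementTree as ET

# Hypothetical pipe-delimited input: the first field names the record type.
SAMPLE = """\
H|1001|2006-11-08
D|widget|2|Fish & Chips <large>
D|gadget|1|plain
T|2
"""

# Hypothetical layouts: record type -> (element name, field names).
LAYOUTS = {
    "H": ("header", ["order_id", "date"]),
    "D": ("detail", ["sku", "qty", "note"]),
    "T": ("trailer", ["count"]),
}


def flat_to_xml(text):
    """Build an XML tree with one child element per flat-file record."""
    root = ET.Element("records")
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        rec_type, *values = line.split("|")
        if rec_type not in LAYOUTS:
            raise ValueError(f"line {line_no}: unknown record type {rec_type!r}")
        tag, fields = LAYOUTS[rec_type]
        if len(values) != len(fields):
            raise ValueError(f"line {line_no}: expected {len(fields)} fields")
        record = ET.SubElement(root, tag)
        for name, value in zip(fields, values):
            # Assigning .text leaves the escaping of & and < to the serializer.
            ET.SubElement(record, name).text = value
    return root


xml_text = ET.tostring(flat_to_xml(SAMPLE), encoding="unicode")
print(xml_text)
```

Because the values are assigned as text on tree nodes and never pasted into a string of markup, the serializer takes care of characters such as `&` and `<` in the data.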


Builder (Ruby's XML file.. er.. builder)? Or perhaps Python's PyXML (check out the "Creating New Nodes" part of the HOWTO)?

Between the two, I prefer Builder but I've been on a Ruby kick for a while now.
posted by jmhodges at 12:40 AM on November 9, 2006


If you're going to use Python, use lxml, not PyXML. PyXML is no longer being actively developed. lxml is an implementation of Python's new ElementTree API for XML that uses the kickass libxml2 bindings as its backend, which means you get XPath, XSLT, XInclude, etc.

Regardless of what language and library you use, if your record types really are quite varied and complex, it sounds like you might benefit from using XSLT. In other words, use Python or C or whatever to do a very simple conversion into XML, then use XSLT's template matching functionality to convert that into your actual desired XML format. XSLT is designed specifically for document conversions, and matching each record to its type can often be done much more naturally in XPath than by using a bunch of if tests, especially if your data structures are deeply nested or recursive. Plus, XSLT transformations tend to be easy to extend to handle new record types.

Of course, this assumes that you know XSLT. If you don't, it's probably not worth the effort to learn it.
posted by gsteff at 4:42 AM on November 9, 2006
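
As a rough sketch of the second step of the two-step approach described above, here is how lxml can apply an XSLT stylesheet with one template per record type. It assumes lxml is installed and that the first step has already produced a deliberately simple intermediate document; the `rec`/`f` element names, the `type` attribute, and the target `order`/`header`/`item` format are all hypothetical, invented only for this example.

```python
from lxml import etree

# Hypothetical output of step 1: one <rec> per flat-file line, fields as <f>.
RAW = b"""<records>
  <rec type="H"><f>1001</f><f>2006-11-08</f></rec>
  <rec type="D"><f>widget</f><f>2</f></rec>
  <rec type="D"><f>gadget</f><f>1</f></rec>
</records>"""

# One template per record type; XPath predicates replace a chain of if tests.
STYLESHEET = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/records">
    <order><xsl:apply-templates select="rec"/></order>
  </xsl:template>
  <xsl:template match="rec[@type='H']">
    <header id="{f[1]}" date="{f[2]}"/>
  </xsl:template>
  <xsl:template match="rec[@type='D']">
    <item sku="{f[1]}" qty="{f[2]}"/>
  </xsl:template>
</xsl:stylesheet>"""

# Compile the stylesheet once, then apply it to the intermediate document.
transform = etree.XSLT(etree.fromstring(STYLESHEET))
result = transform(etree.fromstring(RAW))
print(str(result))
```

Handling a new record type in this sketch would mean adding one more `xsl:template` with a matching predicate, which is the extensibility point made above.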

