Inserting XML into an HTML document
May 15, 2008 5:44 PM

I wish to insert a namespaced XML document into an (X)HTML web page. The "XML Data Island" trick is verboten. There appear to be two ways of accomplishing this. Both continue to elude me.

The first method is to use JS to fetch the XML document and then run a traversal on the document, re-creating its nodes as nodes appended to the insertion holder element. The second method is to use XSLT to not-transform it, but merely replicate it directly into the holder element.

The source XML document is Docbook with namespaces, so the TITLE element doesn't collide with XHTML's self-same element (otherwise I get a validation error). This XML will then be styled using CSS.

I've been exhaustively researching and toying with this for the past week. It seems to me to be a perfectly natural thing to want to do, yet for the life of me I have not found a pre-made solution.

I spent most of today working on the JS solution, and I think I've nearly grokked it, save a few stumbling details. But as I was driving home it struck me that (a) this is going to be unbearably slow for any large document; and (b) a simple not-really-transforming XSLT might do the trick.

Anyhoo, I'd like to hear some ideas for ways of solving all this. I'm a rank newbie at the whole JS/DOM thing, but familiar enough that I'm more or less successfully muddling through it through judicious use of the internets. Feel free to post code, etc.; at this point any info I can get that'll help me struggle through this is welcome.
posted by five fresh fish to Technology (16 answers total)
 
What are you trying to do that requires a nested xml doc? I wonder if there is an alternate solution to the same problem that doesn't require relying on the browser to handle a namespaced xml correctly.
posted by cschneid at 6:20 PM on May 15, 2008


Best answer: 1. If you're going to be using XSLT, why not just transform the Docbook into XHTML and have it be all nice and backward-compatible?

2. Might you consider displaying the Docbook document in an IFrame? This would certainly be the quickest way to do it.

3. The code you want for the JS method is this:
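// Recursively copy a node from document A into document B.
// Note: createElement/setAttribute do not carry over namespace information.
function tree2tree(docB, elFromA) {
    // Text nodes (nodeType 3) are copied as text.
    if (elFromA.nodeType == 3) return docB.createTextNode(elFromA.nodeValue);
    // Skip comments, processing instructions, etc.; they have no tagName.
    if (elFromA.nodeType != 1) return null;
    var elInB = docB.createElement(elFromA.tagName);
    for (var i = 0; i < elFromA.attributes.length; i++) {
        elInB.setAttribute(elFromA.attributes[i].name, elFromA.attributes[i].value);
    }
    for (var j = 0; j < elFromA.childNodes.length; j++) {
        var child = tree2tree(docB, elFromA.childNodes[j]);
        if (child) elInB.appendChild(child);
    }
    return elInB;
}
// myXMLdoc is the fetched XML document; the returned tree still has to be appended to the page.
var copy = tree2tree(document, myXMLdoc.documentElement);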
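
The function above rebuilds elements by tag name only, so the copies lose their namespace. A minimal sketch of a namespace-aware variant follows, assuming the browser supports the DOM Level 2 calls createElementNS and setAttributeNS. The name copyNodeNS is made up for illustration and does not come from this thread.

```javascript
// Namespace-aware variant of tree2tree (sketch; copyNodeNS is a hypothetical name).
function copyNodeNS(docB, nodeFromA) {
  // Text nodes are copied as text.
  if (nodeFromA.nodeType === 3) {
    return docB.createTextNode(nodeFromA.nodeValue);
  }
  // Comments, processing instructions, etc. are skipped.
  if (nodeFromA.nodeType !== 1) {
    return null;
  }
  // createElementNS keeps the namespace URI and the prefixed name.
  var elInB = docB.createElementNS(nodeFromA.namespaceURI, nodeFromA.nodeName);
  for (var i = 0; i < nodeFromA.attributes.length; i++) {
    var attr = nodeFromA.attributes[i];
    elInB.setAttributeNS(attr.namespaceURI, attr.name, attr.value);
  }
  for (var j = 0; j < nodeFromA.childNodes.length; j++) {
    var child = copyNodeNS(docB, nodeFromA.childNodes[j]);
    if (child) elInB.appendChild(child);
  }
  return elInB;
}
```

The built-in docB.importNode(node, true) performs the same kind of deep, namespace-preserving copy in one call, so the hand-written loop is mostly useful when you want to filter or rewrite nodes along the way.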

posted by goingonit at 6:25 PM on May 15, 2008 [1 favorite]


Also, note that doing this is "illegal" under XHTML (if you turned the DOM you got back into text and tried to validate it, it wouldn't validate).
posted by goingonit at 6:27 PM on May 15, 2008


Response by poster: I do not want to transform to XHTML because I wish to send the DocBook document back to the server with user edits.

Please elucidate re: IFrame. I thought frames were considered a no-no.

I've successfully argued that we need not support non-compliant browsers: it's too costly. Ergo, if it works with Webkit and Presto, and Mozilla if it's not a handicap, it's all good. MSIE is dead to me.
posted by five fresh fish at 7:34 PM on May 15, 2008


So frame-based layouts are a no-no for a number of reasons: primarily, because they break the "URL -> content" mapping of the Internet (when you go to a frame-based page, you see multiple documents at a time, and the URL at the top of your browser doesn't have much relation to the content).

But IFrames have actually gotten a lot of love since this whole AJAX thing got started, since they're a very flexible way of including different documents on the same page, and they let you do all manner of neat things (even when hidden) with respect to server-side communication from JavaScript.

But that's all sort of beside the point for you. What you want to do is load a document inside another document, and that's exactly what IFrames were originally designed for. Plus, if this is an issue for you, you can make them border-less, to "seamlessly" integrate the content with the rest of the page.

And, as long as the XML document is served from the same domain as the containing XHTML document, JavaScript running on the outer page will have access to the XML document inside the IFrame, in case you decide you do need to read it.
posted by goingonit at 7:52 PM on May 15, 2008


Best answer: And just to clarify, you'd include the document thus. At the place in the page you want the XML document to show up:
<iframe src="path/to/xmldoc"></iframe>
and that's it.
posted by goingonit at 7:57 PM on May 15, 2008


Response by poster: AFAIK, the XML loading thing I'm using already tosses the document into an IFrame (a hidden one, I guess).

I'll play around with unhiding it and all that. Thanks!

How about the idea of a non-transforming XSLT to insert the document directly into the XHTML? Any validity to that?
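
For reference, a "non-transforming" stylesheet is usually called the identity transform. The sketch below is not anyone's posted solution. It holds the stylesheet in a string and assumes the browser provides XSLTProcessor (with importStylesheet and transformToFragment) and DOMParser. The helper name insertCopyViaXslt and its parameter order are made up for illustration.

```javascript
// Identity stylesheet: copies every node and attribute unchanged.
var IDENTITY_XSLT =
  '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">' +
  '<xsl:template match="@*|node()">' +
  '<xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>' +
  '</xsl:template>' +
  '</xsl:stylesheet>';

// Hypothetical helper: run xmlDoc through the stylesheet and append the
// resulting fragment to holder (an element in the target page).
function insertCopyViaXslt(xmlDoc, holder, processor, stylesheetDoc) {
  processor.importStylesheet(stylesheetDoc);
  // The fragment is created in the holder's own document.
  var fragment = processor.transformToFragment(xmlDoc, holder.ownerDocument);
  holder.appendChild(fragment);
  return holder;
}

// In a browser, a call might look like this (myXMLdoc and holder are assumed to exist):
// insertCopyViaXslt(myXMLdoc, holder, new XSLTProcessor(),
//     new DOMParser().parseFromString(IDENTITY_XSLT, "application/xml"));
```

The result is still a copy of the DocBook tree sitting inside the page, so it is equivalent to copying the nodes with script; the XSLT route only changes who does the copying.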

And why on earth hasn't this become de rigueur? I am astounded that no one's serving DocBook directly to the browser, let alone allowing the end user to click on a paragraph to edit it. Seems like a perfectly natural hook-up to me.
posted by five fresh fish at 8:25 PM on May 15, 2008


Can you show us a page, even a mockup, of what you're trying to do?

I'm confused -- you want to serve XML directly to a web browser, and you want users to be able to edit it? And you wonder why this isn't happening all the time?

Well, because browsers don't do XML, that's why. They do HTML. And because data-editing tasks in browsers are done using forms.

Maybe there's something I'm missing about this.

You have some data on the back end, and you want to present it to the user for editing, then send it back. That's trivial, but your idea that it should be done by dumping XML into the DOM is pretty ... unusual.
posted by AmbroseChapel at 8:53 PM on May 15, 2008


Response by poster: Modern browsers do XML. And XSLT. And sophisticated Javascript DOM. Everything I want is right there for the plucking, as evidenced by the fact that I — a lowly retard when it comes to this sort of thing — am doing it.

I've successfully had click-to-edit on individual block elements, i.e. paragraphs, said edits being sent back to the server to be integrated into the source text based on the ID of the selected block. This will be much the same, only instead of having to screw around with inserting ID attributes on all the block elements, I can just send back the entire DocBook text, trusting SVN or GIT or whatever to do a proper integration of the modified text back into the source.

At this point I'm figuring I'm either some sort of genius, or am batshitinsane. Either way, I expect it to work in the end. :-)
posted by five fresh fish at 10:38 PM on May 15, 2008


Considering that a browser won't display anything more than what HTML does, it seems to me that putting DocBook XML in the page and styling it with CSS is an unnecessary task. It would be easier to transform the DocBook and bind the resulting HTML to the page.

You could transform DocBook XML nodes into div/span elements with @class names (perhaps prefixed with "db:"; class attributes in HTML of course aren't just for CSS, they're a generic way of marking up areas of the page). That would let you tell which part of the page was changed, keep old-browser compatibility, etc. Deviating from HTML prevents you from using WYSIWYG editors (or, considering you're using DocBook, WYSIWYM editors).
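
A rough sketch of that div/span mapping, written as client-side script for illustration (the same mapping could be done server-side in XSLT). The helper name docbookToHtml and the short list of inline elements are assumptions, not a complete DocBook mapping.

```javascript
// Assumption: a small, illustrative set of inline DocBook elements.
// Everything not listed here becomes a div.
var INLINE_DOCBOOK = { emphasis: true, link: true, literal: true };

// Hypothetical helper: rebuild a DocBook node as div/span elements in htmlDoc.
function docbookToHtml(htmlDoc, node) {
  if (node.nodeType === 3) return htmlDoc.createTextNode(node.nodeValue);
  if (node.nodeType !== 1) return null; // skip comments etc.
  var name = node.localName || node.nodeName;
  var isInline = Object.prototype.hasOwnProperty.call(INLINE_DOCBOOK, name);
  var el = htmlDoc.createElement(isInline ? "span" : "div");
  // The original element name survives as a class, e.g. class="db:para".
  el.className = "db:" + name;
  for (var i = 0; i < node.childNodes.length; i++) {
    var child = docbookToHtml(htmlDoc, node.childNodes[i]);
    if (child) el.appendChild(child);
  }
  return el;
}
```

One practical wrinkle with the "db:" prefix: the colon has to be escaped in CSS selectors (.db\:para), so a hyphen may be easier to live with.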

Although you could do XSLT in the browser you could also do it on the server and just include that as part of the page generation. You could transform it on the server in order to get a wider range of browsers.
Some XSLT features, like 'text escaping', are disabled in Firefox, and I'm told that XSLT 2.0 features aren't in browsers yet. So why involve JavaScript in the XSLT?

The only part that seems to warrant JavaScript is the clicking and sending data back and forth. This project sounds like it could be a lot simpler whilst still maintaining the elegance of DocBook (just keep it server-side).

Perhaps one reason why DocBook isn't popular in the browser is that there's more to it than CSS. Search engines and accessibility tools can parse HTML, but not DocBook. You can only emulate browser functionality -- eg, you can't actually click on a DocBook hyperlink because a browser doesn't understand that element, so instead you are (presumably) attaching onclick events to the node.

Sorry if I sound a bit down on your project, but I've been trying lots of XML architectures over the years and I've attempted things like this before. Putting the transformation and DocBook in the browser is an unnecessary challenge, IMO.
posted by holloway at 2:13 AM on May 16, 2008


I've successfully argued that we need not support non-compliant browsers: it's too costly.

Are they hiring where you are?

I would ignore the people saying this is a terrible idea. This is what XML is for. I've done this with SVG and XHTML but I'm no longer at that job and don't have the code handy; I do remember certain tricky namespace bits (see Scripting in namespaced XML from Mozilla).

It would help if you could post more details as to the code you're using, and the problems you're having. I don't see why this should be unbearably slow (but I don't know what "large" means in this context).
posted by enn at 6:12 AM on May 16, 2008


(Also, I do agree with holloway that XSLT-in-the-browser is kind of a pain in the ass — though it's been a few years since I've tried it — and that XSLT is best done server-side, but I don't think you need XSLT here at all. If the point is to let the user edit the DocBook nodes, why transform those nodes into something else? I don't really understand what your reason is for considering a "not-really-transforming XSLT.")
posted by enn at 6:25 AM on May 16, 2008


Response by poster: Are they hiring where you are?

Continuously, but not for web programming. This is an in-house documentation demo project. I'm quite lucky to have been encouraged to try out my ideas.

What with Safari, Opera, and Firefox being readily available for all sorts of platforms, and being almost fully compliant with all the standards, it's not at all difficult to argue that MSIE should not be supported. Coping with its ridiculously lame idiosyncrasies and brokenness would result in tripling the costs. Way more cost effective and easier to just mandate that users download a better browser.
posted by five fresh fish at 4:40 PM on May 16, 2008


Response by poster: goingonit, thank-you! Both your suggestions kicked ass!

I implemented your JS first, replacing my badly broken attempt at doing the same. Took me a while to figure out how to insert the resulting tree into the page, but once I got that figured things were looking up.

And then I clued in that this was another variation of putting an XML data island into the page. Hence, not able to be referenced by CSS namespaces. Poop!

I really do not understand why the W3 gurus rejected the notion of XML data islands. They're such a natural fit with the browser. It'd make so many things so much easier. Oy vey.

Anyhoo, I gave your iframe suggestion a try. Damn, but did that work instantly. A little farting around with CSS and it all started looking so pretty!

So thank you, I owe you beers! (Alas, they'll probably have to be virtual beers.)
posted by five fresh fish at 4:44 PM on May 16, 2008


Response by poster: My remaining problem is unhiding the comment form that should be displayed when a <para> element is clicked (rather than allowing direct editing, I've decided to allow end users to annotate).

document.getElementById("theIframe").contentWindow.document.getElementById("commentform") should be the reference to it, but so far no loving.
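
Two possible causes, neither confirmed in this thread: the lookup may run before the iframe has finished loading, or the framed document may be plain XML, where an attribute named "id" is only treated as an ID if a DTD or schema declares it, so getElementById can return null. The sketch below works around both. The helper names onFrameLoad and findByIdAttribute are made up for illustration.

```javascript
// Hypothetical helper: find an element by its "id" attribute in any document.
function findByIdAttribute(doc, id) {
  var hit = doc.getElementById(id);
  if (hit) return hit;
  // Fallback for XML documents: scan every element and compare the raw attribute.
  var all = doc.getElementsByTagName("*");
  for (var i = 0; i < all.length; i++) {
    if (all[i].getAttribute("id") === id) return all[i];
  }
  return null;
}

// Hypothetical helper: call back with the iframe's document once it has loaded.
// The listener must be attached before the load event fires.
function onFrameLoad(iframe, callback) {
  iframe.addEventListener("load", function () {
    callback(iframe.contentDocument || iframe.contentWindow.document);
  }, false);
}

// Possible use in the outer page (element ids taken from the post above):
// onFrameLoad(document.getElementById("theIframe"), function (frameDoc) {
//   var form = findByIdAttribute(frameDoc, "commentform");
// });
```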

I'm pretty damn pleased with how things are going so far. Thanks so much, y'all!
posted by five fresh fish at 4:53 PM on May 16, 2008


I really do not understand why the W3 gurus rejected the notion of XML data islands. They're such a natural fit with the browser. It'd make so many things so much easier. Oy vey.
Probably because they're an awful idea for the web. They lack semantics entirely, because CSSing a block at font-size:20px doesn't make it a heading. Images can't have ALT text because they're not <img>s, they're blocks with background images. Forms aren't <form>s, they're onclick events attached to blocks that assemble an XMLHttpRequest object. It's a complex, brittle and inelegant mess that puts unnecessary demands on browsers.

There's more to the web than what can be replicated in CSS and JavaScript. HTML has semantics.
posted by holloway at 5:10 PM on May 27, 2008


This thread is closed to new comments.