Representing HTML within an XML Schema
May 11, 2011 8:39 AM

How do I indicate the potential presence of HTML within an XML element in that xml document's Schema?

Putting together a schema for some XML files I produce, I've realised that I don't know how to represent the potential presence of html within an element.

So, for example if the xml is (square brackets obviously in there for tags):

[article]
[name]I am a story[/name]
[text]TEXT OF STORY WITH ASSORTED PARAGRAPH TAGS, STRONG TAGS ETC.[/text]
[/article]


Then how do I represent that in the schema? I had originally assumed it would be something like:

[xs:complexType name="article"]
[xs:sequence]
[xs:element name="name" type="xs:string"/]
[xs:element name="text" type="xs:string"/]
[/xs:sequence]
[/xs:complexType]

But this doesn't appear to be the case...
posted by garius to Computers & Internet (5 answers total)
 
You have to represent the text containing HTML in a CDATA section.
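
For illustration, a CDATA-wrapped version of the example document might look like the sketch below. The paragraph and strong tags are made-up sample content. The parser treats everything inside the CDATA section as plain character data, so the text element can keep the type xs:string from the schema in the question.

```xml
<article>
  <name>I am a story</name>
  <!-- Everything between <![CDATA[ and ]]> is plain text to the XML parser -->
  <text><![CDATA[<p>Text of story with <strong>assorted</strong> tags.</p>]]></text>
</article>
```

The trade-off is that the schema cannot check the HTML at all, and the sequence ]]> must not appear inside the wrapped text.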
posted by Rhomboid at 9:00 AM on May 11, 2011


One option would be to wrap the text in a CDATA section.

Otherwise, you may be able to import the XHTML schema and model the elements using that (however I am not sure that would work completely).
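
A rough sketch of what importing an XHTML schema could look like is below. It assumes the W3C's XHTML 1.0 Strict schema at the schemaLocation shown and assumes that schema declares a global div element; both should be checked against whichever XHTML schema is actually used. Requiring a single div wrapper is a choice made for this example, not something from the question.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:xhtml="http://www.w3.org/1999/xhtml">

  <!-- Pull in declarations for the XHTML namespace.
       The schemaLocation is an assumption; point it at the XHTML schema you use. -->
  <xs:import namespace="http://www.w3.org/1999/xhtml"
             schemaLocation="http://www.w3.org/2002/08/xhtml/xhtml1-strict.xsd"/>

  <xs:complexType name="article">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="text">
        <xs:complexType>
          <xs:sequence>
            <!-- Require one XHTML div as the wrapper for the story markup -->
            <xs:element ref="xhtml:div"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```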
posted by dyno04 at 9:04 AM on May 11, 2011


CDATA is probably the simplest way. There is a decent rundown on CDATA vs encoding the HTML over on this Stack Overflow thread.
posted by tommccabe at 9:05 AM on May 11, 2011


Response by poster: Hmmm. Was trying to avoid wrapping it in CDATA if I could. Mainly because it always feels like a bit of a cop out, and I was hoping there was some way to say within the schema:

"Okay, everything in this element is going to be valid xhtml of some kind."
posted by garius at 9:09 AM on May 11, 2011


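    <xs:element name="text">
        <!-- mixed="true" allows plain text between the XHTML elements -->
        <xs:complexType mixed="true">
            <xs:choice>
                <!-- any number of elements from the XHTML namespace -->
                <xs:any namespace="http://www.w3.org/1999/xhtml" processContents="lax"
                        minOccurs="0" maxOccurs="unbounded" />
            </xs:choice>
        </xs:complexType>
    </xs:element>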
This will validate it using the XHTML schema if it is available, and skip validation apart from checking the namespace if not.
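
Note that the wildcard only matches elements that are actually in the XHTML namespace, so the instance document has to declare it. A minimal sketch of such a document, with made-up story markup, might be:

```xml
<article>
  <name>I am a story</name>
  <text>
    <!-- The xmlns declaration puts p and its descendants in the XHTML namespace -->
    <p xmlns="http://www.w3.org/1999/xhtml">Text of story with <strong>assorted</strong> tags.</p>
  </text>
</article>
```

A bare p element without the namespace declaration would be rejected by the wildcard.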

Alternatively, if you're not tied to using W3C XML Schema, you could also do this using Relax NG and NVDL - the Namespace-based Validation Dispatching Language, which is basically designed for validating mixed content like that.
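
As a rough idea of the Relax NG side, the sketch below (XML syntax) accepts text mixed with any elements from the XHTML namespace inside the text element. The pattern name xhtmlContent is made up for this example, and it only checks namespaces; validating against the real XHTML grammar would be the job of an NVDL script, which is not shown here.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="article">
      <element name="name"><text/></element>
      <element name="text"><ref name="xhtmlContent"/></element>
    </element>
  </start>

  <!-- Text interleaved with any element from the XHTML namespace, recursively -->
  <define name="xhtmlContent">
    <zeroOrMore>
      <choice>
        <text/>
        <element>
          <nsName ns="http://www.w3.org/1999/xhtml"/>
          <!-- Allow any attributes on the XHTML elements -->
          <zeroOrMore><attribute><anyName/></attribute></zeroOrMore>
          <ref name="xhtmlContent"/>
        </element>
      </choice>
    </zeroOrMore>
  </define>
</grammar>
```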
posted by siskin at 11:50 AM on May 11, 2011

