I have continued to evolve and refine MNS. It's changed enough that I've
decided to give it a new name. Attached is a tutorial/spec. I plan to
release this publicly in a few days along with an implementation (in Jing).
James
Author: James Clark <jjc@thaiopensource.com>
Date: 2003-06-10
Copyright © Thai Open Source Software Center Ltd
The XML Namespaces Recommendation allows an XML document to be composed of elements and attributes from multiple independent namespaces. Each of these namespaces may have its own schema; the schemas for different namespaces may be in different schema languages. The problem then arises of how the schemas can be composed in order to allow validation of the complete document. This document proposes the Namespace Routing Language (NRL) as a solution to this problem. NRL is an evolution of the author's earlier Modular Namespaces (MNS) language.
A sample implementation of NRL is included in Jing.
In its simplest form, an NRL schema consists of a mapping from namespace URIs to schema URIs. An NRL schema is written in XML. Here is an example:
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
<namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
<validate schema="soap-envelope.xsd"/>
</namespace>
<namespace ns="http://www.w3.org/1999/xhtml">
<validate schema="xhtml.rng"/>
</namespace>
</rules>
We will call a schema referenced by an NRL schema a
subschema. In the above example,
soap-envelope.xsd is the subschema for the namespace URI
http://schemas.xmlsoap.org/soap/envelope/ and
xhtml.rng is the subschema for the namespace URI
http://www.w3.org/1999/xhtml.
The absent namespace can be mapped to a schema by using
ns="".
NRL validation has two inputs: a document to be validated and an NRL schema. We will call the document to be validated the instance. NRL validation divides the instance into sections, each of which contains elements from a single namespace, and validates each section separately against the subschema for its namespace.
Thus, the following instance:
<env:Envelope xmlns="http://www.w3.org/1999/xhtml"
xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
<env:Body>
<xhtml>
<head>
<title>Document 1</title>
</head>
<body>
<p>...</p>
</body>
</xhtml>
<xhtml>
<head>
<title>Document 2</title>
</head>
<body>
<p>...</p>
</body>
</xhtml>
</env:Body>
</env:Envelope>
would be divided into three sections, one with the envelope namespace
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"> <env:Body/> </env:Envelope>
and two with the XHTML namespace:
<xhtml xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Document 1</title>
</head>
<body>
<p>...</p>
</body>
</xhtml>
<xhtml xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Document 2</title>
</head>
<body>
<p>...</p>
</body>
</xhtml>
Note that two elements belong to the same section only if they have a common ancestor and all elements on the path to that common ancestor have the same namespace. Thus, if one of the XHTML documents happened to contain an element from the envelope namespace, that element would not be part of the same section as the root element.
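The sectioning rule can be illustrated with a short Python sketch. It is not taken from NRL or Jing: the function names (namespace_of, split_sections) are hypothetical, only element sections are handled, and the standard xml.etree.ElementTree parser is applied to a shortened version of the instance above.

```python
import xml.etree.ElementTree as ET

SOAP = "http://schemas.xmlsoap.org/soap/envelope/"
XHTML = "http://www.w3.org/1999/xhtml"

INSTANCE = """
<env:Envelope xmlns="http://www.w3.org/1999/xhtml"
              xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
  <env:Body>
    <xhtml><head><title>Document 1</title></head><body><p>...</p></body></xhtml>
    <xhtml><head><title>Document 2</title></head><body><p>...</p></body></xhtml>
  </env:Body>
</env:Envelope>
""".strip()


def namespace_of(elem):
    """Return the namespace URI of an element ('' for the absent namespace)."""
    tag = elem.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def split_sections(root):
    """Return one (namespace, [local names]) pair per element section."""
    sections = []

    def visit(elem, current):
        ns = namespace_of(elem)
        if current is None or current[0] != ns:
            # The namespace differs from the parent's: start a new section.
            current = (ns, [])
            sections.append(current)
        current[1].append(elem.tag.split("}", 1)[-1])
        for child in elem:
            # Children see the section of this element, not of a sibling.
            visit(child, current)

    visit(root, None)
    return sections


sections = split_sections(ET.fromstring(INSTANCE))
for ns, names in sections:
    print(ns, names)
```

Because a new section is started whenever an element's namespace differs from its parent's, an envelope element nested inside one of the XHTML documents would form a further section of its own rather than rejoining the outer envelope section.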
This validation process can be refined in several ways, which are described in the following sections.
In most cases the schema will be in some namespaced XML vocabulary,
and the type of schema can be automatically detected from the
namespace URI of the root element. In cases where the schema is not
in XML and there is no MIME type information available to determine
the type, a schemaType attribute can be used to specify the
type. The value of this should be a MIME media type. For RELAX NG
Compact Syntax, a value of application/x-rnc should be
used.
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
<namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
<validate schema="soap-envelope.xsd"/>
</namespace>
<namespace ns="http://www.w3.org/1999/xhtml">
<validate schema="xhtml.rnc"
schemaType="application/x-rnc"/>
</namespace>
</rules>
With many schema languages, there can be different ways to use a
particular schema to validate an instance. For example, Schematron has the notion of a phase; an
instance that is valid with respect to a Schematron schema using one
phase may not be valid with respect to the same schema in another
phase. NRL allows validation to be controlled by specifying a number
of options. For example, to specify that validation with respect to
xhtml.sch should use the phase named Full, an option
could be specified as follows:
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
<namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
<validate schema="soap-envelope.xsd"/>
</namespace>
<namespace ns="http://www.w3.org/1999/xhtml">
<validate schema="xhtml.sch">
<option name="http://www.thaiopensource.com/validate/phase"
arg="Full"/>
</validate>
</namespace>
</rules>
Options may have arguments. Some options do not need arguments. For
example, for Schematron there is a
http://www.thaiopensource.com/validate/diagnose option.
If this option is present, then errors will include Schematron
diagnostics; if it is not, then errors will not include diagnostics.
With this option, no arg attribute is necessary:
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
<namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
<validate schema="soap-envelope.xsd"/>
</namespace>
<namespace ns="http://www.w3.org/1999/xhtml">
<validate schema="xhtml.sch">
<option name="http://www.thaiopensource.com/validate/diagnose"/>
</validate>
</namespace>
</rules>
Options are named by URIs. A number of standard options are defined
which all start with the URI
http://www.thaiopensource.com/validate/:
http://www.thaiopensource.com/validate/phase
http://www.thaiopensource.com/validate/diagnose
http://www.thaiopensource.com/validate/check-id-idref
http://www.thaiopensource.com/validate/feasible
... data, list,
element and attribute element in an
optional element and then validating against the
transformed schema. This option is useful while a document is still
under construction.
http://www.thaiopensource.com/validate/schema
For convenience, the URI specified by the name
attribute may be relative; if it is, it will be resolved relative to
the NRL namespace URI. The result is that the standard options above
can be specified without the
http://www.thaiopensource.com/validate/ prefix. For
example,
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
<namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
<validate schema="soap-envelope.xsd"/>
</namespace>
<namespace ns="http://www.w3.org/1999/xhtml">
<validate schema="xhtml.sch">
<option name="phase"
arg="Full"/>
</validate>
</namespace>
</rules>
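The resolution of a relative option name can be checked with Python's standard urllib.parse.urljoin, which implements ordinary relative-reference resolution. This is only an illustration; the helper name resolve_option_name is hypothetical.

```python
from urllib.parse import urljoin

# The NRL namespace URI serves as the base for relative option names.
NRL_NS = "http://www.thaiopensource.com/validate/nrl"


def resolve_option_name(name):
    """Resolve an option name against the NRL namespace URI."""
    return urljoin(NRL_NS, name)


print(resolve_option_name("phase"))
print(resolve_option_name("http://www.example.org/my-option"))
```

The last path segment of the base (nrl) is replaced by the relative name, so phase resolves to the standard phase option URI, while an absolute URI is left unchanged.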
Normally, an NRL implementation will make a best-effort attempt to
support the specified option and will simply ignore options that it
does not understand or cannot support. If it is essential that a
particular option is supported, then a mustSupport
attribute may be added to the option element:
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
<namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
<validate schema="soap-envelope.xsd"/>
</namespace>
<namespace ns="http://www.w3.org/1999/xhtml">
<validate schema="xhtml.sch">
<option name="phase"
arg="Full"
mustSupport="true"/>
</validate>
</namespace>
</rules>
If there is a mustSupport attribute and the NRL
implementation cannot support the option, it must report an error.
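The following Python sketch shows one way an implementation might apply this policy. The set of supported options and the function name process_options are assumptions made for the example, not part of any real implementation.

```python
PHASE = "http://www.thaiopensource.com/validate/phase"
DIAGNOSE = "http://www.thaiopensource.com/validate/diagnose"

# Hypothetical: this imaginary implementation supports only the phase option.
SUPPORTED = {PHASE}


def process_options(options, supported=SUPPORTED):
    """options is a list of (name, arg, must_support) triples.

    Returns the accepted options and a list of error messages.
    """
    accepted, errors = {}, []
    for name, arg, must_support in options:
        if name in supported:
            accepted[name] = arg
        elif must_support:
            # An unsupported option marked mustSupport is an error.
            errors.append("unsupported option: " + name)
        # Otherwise the unsupported option is silently ignored.
    return accepted, errors


print(process_options([(PHASE, "Full", True), (DIAGNOSE, None, False)]))
print(process_options([(DIAGNOSE, None, True)]))
```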
Multiple validate elements can be specified for a
single namespace. The effect is to validate against all of the
specified schemas.
For example, we might have a Schematron schema for XHTML, which makes various checks that cannot be expressed in a grammar. We want to validate against both the Schematron schema and the RELAX NG schema. The NRL schema would be like this:
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
<namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
<validate schema="soap-envelope.xsd"/>
</namespace>
<namespace ns="http://www.w3.org/1999/xhtml">
<validate schema="xhtml.rng"/>
<validate schema="xhtml.sch"/>
</namespace>
</rules>
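In effect, the errors reported for a section are the combined errors of all its subschemas. A minimal Python sketch, in which the two checks are hypothetical stand-ins for a grammar-based and a rule-based subschema and a section is reduced to a list of element names:

```python
def validate_all(section, validators):
    """Apply every validator to the section and concatenate their errors."""
    errors = []
    for validator in validators:
        errors.extend(validator(section))
    return errors


# Hypothetical stand-in for a grammar-based subschema such as xhtml.rng.
def grammar_check(names):
    return [] if names and names[0] == "html" else ["root must be html"]


# Hypothetical stand-in for a rule-based subschema such as xhtml.sch.
def rule_check(names):
    return [] if "title" in names else ["missing title"]


print(validate_all(["html", "head", "body"], [grammar_check, rule_check]))
```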
Instead of a validate element, you can use an
allow element or a reject element. These
are equivalent respectively to validating with a schema that allows
anything or with a schema that allows nothing.
For example, the following would allow SVG without attempting to validate it:
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
<namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
<validate schema="soap-envelope.xsd"/>
</namespace>
<namespace ns="http://www.w3.org/1999/xhtml">
<validate schema="xhtml.rng"/>
</namespace>
<namespace ns="http://www.w3.org/2000/svg">
<allow/>
</namespace>
</rules>
Note that, just as with validate, allow
and reject apply to a section not to a whole subtree.
Thus, in the above example, if the SVG contained an embedded XHTML
section, then that XHTML section would be validated against
xhtml.rng.
You can use an anyNamespace element instead of a
namespace element. This specifies a rule to be used for
an element for which there is no applicable namespace
rule.
Namespace wildcards are particularly useful in conjunction
with allow and reject. The following
will validate strictly, rejecting any namespace for
which no subschema is specified:
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
<namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
<validate schema="soap-envelope.xsd"/>
</namespace>
<namespace ns="http://www.w3.org/1999/xhtml">
<validate schema="xhtml.rng"/>
</namespace>
<anyNamespace>
<reject/>
</anyNamespace>
</rules>
In contrast, the following will validate laxly, allowing any namespace for which no subschema is specified:
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
<namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
<validate schema="soap-envelope.xsd"/>
</namespace>
<namespace ns="http://www.w3.org/1999/xhtml">
<validate schema="xhtml.rng"/>
</namespace>
<anyNamespace>
<allow/>
</anyNamespace>
</rules>
The default is to validate strictly. Thus, if there is no
anyNamespace rule, then the following rule will be
implied:
<anyNamespace> <reject/> </anyNamespace>
You can apply different rules in different contexts by using modes. For example, you might want to restrict the namespaces allowed for the root element.
The rules element for an NRL schema that uses multiple
modes does not contain namespace and
anyNamespace elements directly. Rather, it contains
mode elements that in turn contain namespace
and anyNamespace elements. The validate
elements can specify a useMode attribute to change the
mode in which their child sections are processed. The
rules element must have a startMode
attribute specifying which mode to use for the root element.
For example, suppose we want to require that the root element come from the
http://schemas.xmlsoap.org/soap/envelope/ namespace:
<rules startMode="soap"
xmlns="http://www.thaiopensource.com/validate/nrl">
<mode name="soap">
<namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
<validate schema="soap-envelope.xsd"
useMode="body"/>
</namespace>
</mode>
<mode name="body">
<namespace ns="http://www.w3.org/1999/xhtml">
<validate schema="xhtml.rng"/>
</namespace>
</mode>
</rules>
If a validate element does not specify a
useMode attribute, then the mode remains unchanged. Thus,
in the above example, child sections inside an XHTML section will be
processed in mode body, which does not allow the SOAP
namespace; so if the XHTML were to contain a SOAP
env:Envelope element, it would be rejected.
The reject and allow elements can have a
useMode attribute as well.
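The way the current mode is carried down through nested sections can be sketched in Python. The dictionary below is a hand-written stand-in for the two-mode schema above; the representation and the function name actions_along are assumptions made for the example, and attribute sections are ignored.

```python
SOAP = "http://schemas.xmlsoap.org/soap/envelope/"
XHTML = "http://www.w3.org/1999/xhtml"

# mode name -> {namespace URI: (action, useMode or None)}
MODES = {
    "soap": {SOAP: ("validate soap-envelope.xsd", "body")},
    "body": {XHTML: ("validate xhtml.rng", None)},
}


def actions_along(path, start_mode, modes=MODES):
    """Return the action chosen for each section on a root-to-leaf path.

    path lists the namespace URIs of nested sections, outermost first.
    """
    mode = start_mode
    chosen = []
    for ns in path:
        # A namespace with no rule falls back to the implied reject rule.
        action, use_mode = modes[mode].get(ns, ("reject", None))
        chosen.append(action)
        if use_mode is not None:
            mode = use_mode  # child sections are processed in the new mode
        # With no useMode, the mode remains unchanged.
    return chosen


print(actions_along([SOAP, XHTML, SOAP], "soap"))
```

For a SOAP envelope containing XHTML that in turn contains a SOAP element, the innermost section is looked up in mode body, finds no rule, and is rejected.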
A single subschema may not handle just a single namespace; it may handle two or more related namespaces. To deal with this possibility, NRL allows the rule for a namespace to specify that elements from that namespace are to be attached to a parent section and be validated together with that parent section.
Suppose we have RELAX NG schemas for XHTML and for SVG. We could
use these directly as subschemas in NRL. But we might prefer instead
to use RELAX NG mechanisms to combine these into a single RELAX NG
schema. This would allow us conveniently to allow SVG elements only to
occur in places where XHTML block and inline elements are allowed and
to disallow them in places that make no sense (for example, as
children of a ul element). If we have such a combined
schema, we could use it as follows:
<rules startMode="soap"
xmlns="http://www.thaiopensource.com/validate/nrl">
<mode name="soap">
<namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
<validate schema="soap-envelope.xsd"
useMode="xhtml"/>
</namespace>
</mode>
<mode name="xhtml">
<namespace ns="http://www.w3.org/1999/xhtml">
<validate schema="xhtml+svg.rng"
useMode="svg"/>
</namespace>
</mode>
<mode name="svg">
<namespace ns="http://www.w3.org/2000/svg">
<attach/>
</namespace>
</mode>
</rules>
This will cause SVG sections occurring within XHTML to be attached to the parent XHTML section and be validated as part of it.
RDF is another example where attach is necessary.
RDF can contain elements from arbitrary namespaces.
<rules startMode="root"
xmlns="http://www.thaiopensource.com/validate/nrl">
<mode name="root">
<namespace ns="http://www.w3.org/1999/xhtml">
<validate schema="xhtml.rng"
useMode="body"/>
</namespace>
</mode>
<mode name="body">
<namespace ns="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<validate schema="rdfxml.rng"
useMode="rdf"/>
</namespace>
</mode>
<mode name="rdf">
<anyNamespace>
<attach/>
</anyNamespace>
</mode>
</rules>
We could use the approach of attaching all namespaces as an
alternative solution to the XHTML+SVG example. Instead of relying on NRL
to reject namespaces other than XHTML and SVG, we can attach
sections from all namespaces to the XHTML section, and allow the
xhtml+svg.rng schema to reject namespaces other than
XHTML and SVG.
<rules startMode="soap"
xmlns="http://www.thaiopensource.com/validate/nrl">
<mode name="soap">
<namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
<validate schema="soap-envelope.xsd"
useMode="xhtml"/>
</namespace>
</mode>
<mode name="xhtml">
<namespace ns="http://www.w3.org/1999/xhtml">
<validate schema="xhtml+svg.rng"
useMode="attach"/>
</namespace>
</mode>
<mode name="attach">
<anyNamespace>
<attach/>
</anyNamespace>
</mode>
</rules>
There is a built-in mode named #attach, which contains
just the rule:
<anyNamespace> <attach/> </anyNamespace>
Thus, the last example in the previous section can be simplified to:
<rules startMode="soap"
xmlns="http://www.thaiopensource.com/validate/nrl">
<mode name="soap">
<namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
<validate schema="soap-envelope.xsd"
useMode="xhtml"/>
</namespace>
</mode>
<mode name="xhtml">
<namespace ns="http://www.w3.org/1999/xhtml">
<validate schema="xhtml+svg.rng"
useMode="#attach"/>
</namespace>
</mode>
</rules>
Suppose you are not interested in the namespace-sectioning capabilities of NRL, but you just want to validate a document concurrently against two schemas. The simplest way is like this:
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
<anyNamespace>
<validate schema="xhtml.rng"
useMode="#attach"/>
<validate schema="xhtml.sch"
useMode="#attach"/>
</anyNamespace>
</rules>
The useMode="#attach" ensures that the document will
be validated as is, rather than divided into sections.
Similarly, there is a built-in mode named #reject,
which contains just the rule:
<anyNamespace> <reject/> </anyNamespace>
and a built-in mode named #allow, which contains just
the rule:
<anyNamespace> <allow/> </anyNamespace>
Up to now, sections validated by one subschema have not
participated in the validation of parent sections. Modern schema
languages, such as W3C XML Schema and RELAX NG, can use wildcards to
allow elements and attributes from any namespace in particular
contexts. It is useful to take advantage of this in order to allow
one subschema to constrain the contexts in which sections validated by
other subschemas can occur. For example, the official schema for
http://schemas.xmlsoap.org/soap/envelope/ uses wildcards
to specify precisely where elements from other namespaces are allowed:
they are allowed as children of the env:Body and
env:Header elements but not as children of the
env:Envelope element. Our NRL schema bypasses these
constraints because the XHTML sections are not seen by the SOAP
validation. We can use attach to solve this problem:
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
<namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
<validate schema="soap-envelope.xsd"/>
</namespace>
<namespace ns="http://www.w3.org/1999/xhtml">
<validate schema="xhtml.rng"/>
<attach/>
</namespace>
</rules>
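When an XHTML section occurs inside a SOAP section, the XHTML section will participate in two validations: it will be validated on its own against xhtml.rng, and it will be attached to the parent SOAP section and validated as part of that section against soap-envelope.xsd.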
So far we have seen how to make the processing of an element depend
on the namespace URIs of its ancestors. NRL also allows the
processing to depend on the element names of its ancestors. For
example, suppose we wish to allow RDF to occur only as a child of the
head element of XHTML. We can do this as follows:
<rules startMode="root"
xmlns="http://www.thaiopensource.com/validate/nrl">
<mode name="root">
<namespace ns="http://www.w3.org/1999/xhtml">
<validate schema="xhtml.rng">
<context path="head"
useMode="rdf"/>
</validate>
</namespace>
</mode>
<mode name="rdf">
<namespace ns="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<validate schema="rdfxml.rng"
useMode="#attach"/>
</namespace>
</mode>
</rules>
Any element that takes a useMode attribute can also
have one or more context children that override the
useMode attribute in specific contexts. The
path attribute specifies a test to be applied to the
parent element of the element or attribute section to be processed.
The path attribute allows a restricted form of XPath: a
list of one or more choices separated by |, where each
choice is a list of one or more unqualified names separated by
/, optionally preceded by /. It is
interpreted like a pattern in XSLT, except that the names are
implicitly qualified with the namespace URI of the containing
namespace element. When more than one path matches, the
most specific is chosen. It is an error to have two or more equally
specific paths. The path is tested against a single section not the
entire document: a path of /foo means a foo
element that is the root of a section; it does not mean a
foo element that is the root of the document.
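As a rough illustration of how such a path might be tested, the following Python sketch parses the restricted syntax and matches it against the names of the ancestors within a single section. The function names and the representation of the ancestors as a list of local names are assumptions made for this example; choosing the most specific path among several matching context elements is not shown.

```python
def parse_path(path):
    """Split a path into choices; each choice is (anchored, [names])."""
    choices = []
    for choice in path.split("|"):
        choice = "".join(choice.split())  # drop optional whitespace
        anchored = choice.startswith("/")
        names = choice.lstrip("/").split("/")
        choices.append((anchored, names))
    return choices


def path_matches(path, ancestors):
    """Test a path against the parent element of the section being processed.

    ancestors holds the local names from the root of the parent's section
    down to the parent element itself.
    """
    for anchored, names in parse_path(path):
        if anchored:
            # A leading / ties the first name to the root of the section.
            if ancestors == names:
                return True
        elif ancestors[-len(names):] == names:
            # Otherwise the names need only match the innermost ancestors.
            return True
    return False


print(path_matches("head", ["html", "head"]))
print(path_matches("/head", ["html", "head"]))
```

Here /head fails to match because head is not the root of its section, whereas head and /html/head both match.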
Up to now, we have considered attributes to be inseparably attached to their parent elements. Although the default behaviour is to attach attributes to their parent elements, attributes are in fact considered to be separate sections and can be processed separately. Attributes with the same namespace URI and same parent element are grouped in a single section.
A namespace or anyNamespace element can
have a match attribute, whose value must be a list of one
or two of the tokens attributes and
elements. If the value includes the token
attributes, the rule matches attribute sections.
The default behaviour of attaching attributes to their parent
elements occurs because the default value of the match
attribute is elements and because all of the built-in
modes include a rule:
<anyNamespace match="attributes"> <attach/> </anyNamespace>
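The interpretation of the match attribute can be sketched in Python as follows. The function name parse_match is hypothetical; the accepted values follow the description above, with an absent attribute treated as elements.

```python
ALLOWED = {"elements", "attributes"}


def parse_match(value=None):
    """Return the set of section kinds that a rule matches."""
    if value is None:
        return {"elements"}  # default value of the match attribute
    tokens = value.split()
    # One or two distinct tokens, each either elements or attributes.
    if not tokens or len(set(tokens)) != len(tokens) or not set(tokens) <= ALLOWED:
        raise ValueError("invalid match value: " + repr(value))
    return set(tokens)


print(parse_match())
print(parse_match("attributes"))
print(sorted(parse_match("elements attributes")))
```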
Most, if not all, XML schema languages do not have any notion of
validating a set of attributes; they know only how to validate an XML
element. Therefore, before validating an attribute section, NRL
transforms it into an XML element by creating a dummy element to hold
the attributes. NRL also performs a corresponding transformation on
the schema. This is schema-language dependent. For example, in the
case of RELAX NG, a schema s is transformed to
<element><anyName/> s </element>.
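For RELAX NG, the two transformations might be sketched in Python as below. This only illustrates the wrapping described above; the function names and the name of the dummy element are assumptions, and a real implementation would not need to build these trees explicitly.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def wrap_attribute_schema(pattern):
    """Wrap a RELAX NG pattern s as <element><anyName/> s </element>."""
    element = ET.Element("{%s}element" % RNG)
    ET.SubElement(element, "{%s}anyName" % RNG)
    element.append(pattern)
    return element


def wrap_attributes(attributes):
    """Build a dummy instance element holding an attribute section.

    The element name "dummy" is arbitrary, since the transformed schema
    accepts any name.
    """
    return ET.Element("dummy", dict(attributes))


schema = ET.fromstring(
    '<attribute name="xml:lang" xmlns="%s"><text/></attribute>' % RNG
)
wrapped = wrap_attribute_schema(schema)
instance = wrap_attributes({"{%s}lang" % XML_NS: "en"})
print([child.tag for child in wrapped])
print(instance.attrib)
```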
For example, suppose xmlatts.rng contains a schema for
the attributes in the xml namespace written in RELAX
NG:
<group datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
xmlns="http://relaxng.org/ns/structure/1.0">
<optional>
<attribute name="xml:lang">
<choice>
<data type="language"/>
<value/>
</choice>
</attribute>
</optional>
<optional>
<attribute name="xml:base">
<data type="anyURI"/>
</attribute>
</optional>
<optional>
<attribute name="xml:space">
<choice>
<value>preserve</value>
<value>default</value>
</choice>
</attribute>
</optional>
</group>
An NRL schema could use this as follows:
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
<namespace ns="http://www.w3.org/1999/xhtml">
<validate schema="xhtml.rng"/>
</namespace>
<namespace ns="http://www.w3.org/XML/1998/namespace"
match="attributes">
<validate schema="xmlatts.rng"/>
</namespace>
</rules>
One mode can extend another mode. Suppose in our SOAP+XHTML example, we want to allow both SOAP elements and XHTML elements to contain RDF. By putting the rule for RDF in its own mode and extending that mode, we can avoid having to specify the rule for RDF twice:
<rules startMode="soap"
xmlns="http://www.thaiopensource.com/validate/nrl">
<mode name="common">
<namespace ns="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<validate schema="rdfxml.rng"
useMode="#attach"/>
</namespace>
</mode>
<mode name="soap"
extends="common">
<namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
<validate schema="soap-envelope.xsd"
useMode="body"/>
</namespace>
</mode>
<mode name="body"
extends="common">
<namespace ns="http://www.w3.org/1999/xhtml">
<validate schema="xhtml.rng"/>
</namespace>
</mode>
</rules>
It is possible to extend a built-in mode. Thus, a mode that
validates laxly can be specified simply by extending
#allow. This works because of how wildcards and
inheritance interact. Suppose mode x extends mode
y; then when using mode x, the following order
will be used to search for a matching rule:
1. namespace rules in x
2. namespace rules in y
3. anyNamespace rules in x
4. anyNamespace rules in y
The requirement that there is an implicit rule of
<anyNamespace> <reject/> </anyNamespace>
can be restated as a requirement that the default value of the
extends attribute is #reject.
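The interaction between wildcards and inheritance can be sketched in Python. The sketch assumes that namespace rules are searched along the whole chain of extended modes before any anyNamespace rule is considered, and that extends defaults to #reject. The mode table, the mode name lax, and the function names are hypothetical, and only element rules are modelled.

```python
XHTML = "http://www.w3.org/1999/xhtml"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

MODES = {
    "#reject": {"extends": None, "namespace": {}, "any": "reject"},
    "#allow": {"extends": None, "namespace": {}, "any": "allow"},
    "common": {"extends": "#reject",
               "namespace": {RDF: "validate rdfxml.rng"}, "any": None},
    "body": {"extends": "common",
             "namespace": {XHTML: "validate xhtml.rng"}, "any": None},
    "lax": {"extends": "#allow",
            "namespace": {XHTML: "validate xhtml.rng"}, "any": None},
}


def chain(modes, name):
    """Yield the named mode followed by the modes it extends."""
    while name is not None:
        yield modes[name]
        name = modes[name]["extends"]


def find_action(modes, name, ns):
    """Find the action that the given mode applies to a namespace URI."""
    # First pass: namespace rules, nearest mode first.
    for mode in chain(modes, name):
        if ns in mode["namespace"]:
            return mode["namespace"][ns]
    # Second pass: anyNamespace rules, nearest mode first.
    for mode in chain(modes, name):
        if mode["any"] is not None:
            return mode["any"]
    raise LookupError("no rule found; every chain should end in a built-in mode")


print(find_action(MODES, "body", RDF))
print(find_action(MODES, "lax", "http://www.w3.org/2000/svg"))
```

Under this order, a mode that extends #allow still applies its own namespace rules, and the inherited wildcard is reached only when no namespace rule matches.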
Many schema languages can deal with the kind of extensibility that
involves adding child elements or attributes from different
namespaces. A more difficult kind of extensibility is where we need
to be able to wrap an extension element around an existing
non-extension element. This can arise with namespaces describing
templating and versioning. Imagine XHTML inside an XSLT stylesheet:
in such a document we might have a ul element containing
an xsl:for-each element containing an li
element, although the schema for XHTML requires li
elements to occur as direct children of ul elements. In
such a situation, we need to make the XHTML schema
unwrap the xsl:for-each element, ignoring its
start-tag and end-tag, but not ignoring its content.
Suppose we have a namespace
http://www.example.org/edit containing elements
inserted and deleted, which describe edits
that have been made to a document, and suppose we want to use these
elements inside an XHTML document. The following NRL schema would
allow us still to validate the XHTML document.
<rules startMode="root"
xmlns="http://www.thaiopensource.com/validate/nrl">
<mode name="root">
<namespace ns="http://www.w3.org/1999/xhtml">
<validate schema="xhtml.rng"
useMode="xhtml"/>
</namespace>
</mode>
<mode name="xhtml">
<namespace ns="http://www.example.org/edit">
<unwrap/>
</namespace>
<namespace ns="http://www.w3.org/1999/xhtml">
<attach/>
</namespace>
</mode>
</rules>
When unwrap is applied to an element section
e, it ignores the elements in e and their
attributes and just processes the child element sections of
e; if processing the child element sections causes a
section to try to attach to e, it will instead attach to
the parent of e. Thus, in the above schema the section
from the edit namespace will be ignored, but child sections will be
processed according to rules applicable in the xhtml
mode. When an edit section has an XHTML child section, then that XHTML
child section will be attached to the parent of the edit section
(which can only be another XHTML section).
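The effect of unwrap on what the XHTML validator sees can be approximated by a tree transformation. The Python sketch below assumes a document that contains only XHTML and edit elements; the function names are hypothetical, and text directly inside an edit element is dropped because it belongs to the ignored edit section.

```python
import xml.etree.ElementTree as ET

EDIT = "{http://www.example.org/edit}"
XHTML = "{http://www.w3.org/1999/xhtml}"

DOC = (
    '<ul xmlns="http://www.w3.org/1999/xhtml"'
    ' xmlns:e="http://www.example.org/edit">'
    "<li>a</li><e:inserted><li>b</li></e:inserted>"
    "</ul>"
)


def spliced_children(elem):
    """Yield elem's children, replacing edit elements by their content."""
    for child in elem:
        if child.tag.startswith(EDIT):
            # Ignore the wrapper itself but keep processing its children.
            yield from spliced_children(child)
        else:
            yield child


def unwrap(elem):
    """Return a copy of elem with every edit-namespace wrapper removed."""
    copy = ET.Element(elem.tag, elem.attrib)
    copy.text = elem.text
    for child in spliced_children(elem):
        new_child = unwrap(child)
        new_child.tail = child.tail
        copy.append(new_child)
    return copy


result = unwrap(ET.fromstring(DOC))
print([child.tag for child in result])
```

After the transformation both li elements are direct children of ul, which is the structure the XHTML schema expects.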
The above schema does not deal with validating the edit
namespace. Let us suppose that inserted and
deleted elements cannot nest. Our schema
edit.rnc for the edit namespace is just two lines:
default namespace = "http://www.example.org/edit"
element inserted|deleted { empty }
The following NRL schema would allow validation of the edit namespace:
<rules startMode="root"
xmlns="http://www.thaiopensource.com/validate/nrl">
<mode name="root">
<namespace ns="http://www.w3.org/1999/xhtml">
<validate schema="xhtml.rng"
useMode="xhtml"/>
</namespace>
</mode>
<mode name="xhtml"
extends="noEdit">
<namespace ns="http://www.example.org/edit">
<validate schema="edit.rnc"
schemaType="application/x-rnc"
useMode="#allow"/>
<unwrap useMode="noEdit"/>
</namespace>
</mode>
<mode name="noEdit">
<namespace ns="http://www.w3.org/1999/xhtml">
<attach/>
</namespace>
</mode>
</rules>
The above schema is still not quite right. Suppose a
title element was both inserted and deleted. With the
above NRL schema, XHTML validation would see two title
elements, which would get an error. We should instead do XHTML
validation twice, once including the content of the
inserted elements and ignoring the content of the
deleted elements and once doing the opposite. We only
need to validate the edit elements once. The following NRL schema
accomplishes this:
<rules startMode="root"
xmlns="http://www.thaiopensource.com/validate/nrl">
<mode name="root">
<namespace ns="http://www.w3.org/1999/xhtml">
<validate schema="xhtml.rng"
useMode="new"/>
<validate schema="xhtml.rng"
useMode="old"/>
</namespace>
</mode>
<mode name="new"
extends="noEdit">
<namespace ns="http://www.example.org/edit">
<validate schema="edit.rnc"
schemaType="application/x-rnc"
useMode="#allow"/>
<unwrap useMode="noEdit">
<context path="deleted"
useMode="#allow"/>
</unwrap>
</namespace>
</mode>
<mode name="old"
extends="noEdit">
<namespace ns="http://www.example.org/edit">
<unwrap useMode="noEdit">
<context path="inserted"
useMode="#allow"/>
</unwrap>
</namespace>
</mode>
<mode name="noEdit">
<namespace ns="http://www.w3.org/1999/xhtml">
<attach/>
</namespace>
</mode>
</rules>
The fundamental idea of dividing into sections, each of which contains elements from a single namespace, and then validating each section separately against the schema for its namespace originated in Murata Makoto's RELAX Namespace, which formed the basis for the recently published DSDL Part 4 CD.
RELAX Namespace was designed to work well with RELAX Core. RELAX Core cannot deal with documents that use multiple namespaces, nor does it provide any namespace-based wildcards. These limitations of RELAX Core are reflected in the design of RELAX Namespace. NRL is designed to be able to take advantage of more recent schema languages, such as RELAX NG, that are not limited in this way.
In response to MNS, Rick Jelliffe produced the Namespace Switchboard, which inspired much of the evolution of NRL from MNS.
Thanks to Murata Makoto and Rick Jelliffe for helpful comments.
Committee Draft of Document Schema Definition Languages (DSDL) -- Part 4: Selection of Validation Candidates, http://www.y12.doe.gov/sgml/sc34/document/0363.htm
Jing, http://www.thaiopensource.com/relaxng/jing.html
Modular Namespaces (MNS), http://www.thaiopensource.com/relaxng/mns.html
Namespace Switchboard, http://www.topologi.com/resources/NamespaceSwitchboard.html
RELAX Core, http://www.xml.gr.jp/relax/
RELAX Namespace, http://www.y-adagio.com/public/standards/tr_relax_ns/toc.htm
RELAX NG Compact Syntax, http://www.oasis-open.org/committees/relax-ng/compact-20021121.html
RELAX NG DTD Compatibility, http://www.oasis-open.org/committees/relax-ng/compatibility-20011203.html
RELAX NG, http://relaxng.org
Schematron, http://www.ascc.net/xml/resource/schematron/schematron.html
W3C XML Schema, http://www.w3.org/TR/xmlschema-1/
NRL elements can be extended with arbitrary attributes provided the attributes are namespace qualified and their namespace is not the NRL namespace; they can also be extended with arbitrary child elements with any namespace (including the absent namespace) other than the NRL namespace. We could provide a RELAX NG schema that fully described NRL, but the extensibility would make the schema harder to understand. So instead we provide a RELAX NG schema (in compact syntax) that does not allow extensibility, and provide an NRL schema to make it extensible.
Thus, NRL is described by the following NRL schema:
<rules startMode="root"
xmlns="http://www.thaiopensource.com/validate/nrl">
<mode name="root">
<namespace ns="http://www.thaiopensource.com/validate/nrl">
<validate schema="nrl.rnc"
schemaType="application/x-rnc"
useMode="extend"/>
</namespace>
</mode>
<mode name="extend">
<namespace ns="http://www.thaiopensource.com/validate/nrl"
match="attributes">
<reject/>
</namespace>
<namespace ns=""
match="attributes">
<attach/>
</namespace>
<anyNamespace match="elements attributes">
<allow useMode="#attach"/>
</anyNamespace>
</mode>
</rules>
where nrl.rnc is as follows:
default namespace = "http://www.thaiopensource.com/validate/nrl"
start =
element rules {
schemaType?,
(rule* | (attribute startMode { modeName }, mode+))
}
mode =
element mode {
attribute name { userModeName },
attribute extends { modeName }?,
rule*
}
rule =
element namespace {
attribute ns { xsd:anyURI },
ruleModel
}
| element anyNamespace { ruleModel }
ruleModel = attribute match { elementsOrAttributes }?, actions
elementsOrAttributes =
list {
("elements", "attributes")
| ("attributes", "elements")
| "elements"
| "attributes"
}
actions =
noResultAction*, (noResultAction|resultAction), noResultAction*
noResultAction =
element validate {
attribute schema { xsd:anyURI },
schemaType?,
option*,
modeUsage
}
| element allow|reject { modeUsage }
resultAction =
element attach|unwrap { modeUsage }
option =
element option {
attribute name { xsd:anyURI },
attribute arg { text }?,
attribute mustSupport { xsd:boolean }?
}
modeUsage =
attribute useMode { modeName }?,
element context {
attribute path { path },
attribute useMode { modeName }?
}*
modeName = userModeName | builtinModeName
userModeName = xsd:NCName
builtinModeName = "#attach" | "#allow" | "#reject" | "#unwrap"
schemaType = attribute schemaType { mediaType }
mediaType = xsd:string # should do better than this
path =
xsd:string {
pattern = "\s*(/\s*)?\i\c*(\s*/\s*\i\c*)*\s*"
~ "(|\s*(/\s*)?\i\c*(\s*/\s*\i\c*)*\s*)*"
}
In order to describe the semantics of NRL, it is convenient to construct a new section-based data model. This data model is constructed from the RELAX NG data model. An implementation wouldn't actually have to construct this, but the semantics are simpler to describe in terms of this data model rather than in terms of the RELAX NG data model. Note that the information content is exactly equivalent to the RELAX NG data model.
There are two kinds of section: attribute sections and element sections. Two attributes belong to the same section iff they have the same parent and the same namespace URI. An element belongs to the same section as its parent iff it has the same namespace URI as its parent. An attribute section is simply a non-empty unordered set of attributes (as in RELAX NG), where each member of the set has the same namespace URI.
An element section is a little more complicated. First we need the concept of a node. There are three kinds of node: an element node, a text node and a slot node. An element node has a name, a context (as in RELAX NG), and a list of child nodes. A text node has a string value. A slot node has no additional information; it is merely a placeholder for an element section. A list of child nodes never has two adjacent text nodes and never has two adjacent slot nodes.
An element section is a triple <nd, lsa, lle>, where nd is an element node, lsa is a list of unordered sets of attribute sections, and lle is a list of lists of element sections. lsa has one member for each element node in nd. The unordered set of attribute sections that is the n-th member of lsa gives the attributes for the n-th element node in nd (iterating in document order). lle has one member for each slot node in nd. The list of element sections that is the n-th member of lle corresponds to the n-th slot node in nd (iterating in document order).
An NRL schema consists of a set of modes. A mode consists of a set of rules. A mode maps a section to an action based on the section's namespace URI and on whether the section is an attribute section or an element section. An action can be applied to element sections and attribute sections. An action returns two values, one of which is always error information. When an action is applied to an element section, it returns error information and a (possibly empty) list of element sections. When an action is applied to an attribute section, it returns error information and either an attribute section or an empty list.
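To make the idea of a mode concrete, here is a small sketch of the two-namespace example schema from the start of this document written as a Haskell function from section kind and namespace URI to action. The names exampleMode and describe are hypothetical. The fallback choices (reject for other element sections, attach for attribute sections) and the reuse of the same mode for child sections are assumptions made for illustration only.

```haskell
type Uri = String

data ElementsOrAttributes = Elements | Attributes

-- A mode maps a section's kind and namespace URI to an action.
type Mode = ElementsOrAttributes -> Uri -> Action

data Action = Attach Mode
            | Reject Mode
            | Unwrap Mode
            | Allow Mode
            | Validate Uri Mode
            | Sequence Action Action

-- Mode for the SOAP/XHTML example; child sections are processed in the
-- same mode (an assumption of this sketch).
exampleMode :: Mode
exampleMode Elements "http://schemas.xmlsoap.org/soap/envelope/" =
  Validate "soap-envelope.xsd" exampleMode
exampleMode Elements "http://www.w3.org/1999/xhtml" =
  Validate "xhtml.rng" exampleMode
exampleMode Elements _   = Reject exampleMode  -- assumed fallback
exampleMode Attributes _ = Attach exampleMode  -- assumed fallback

-- Render an action as text so that the mapping can be inspected.
describe :: Action -> String
describe (Attach _)     = "attach"
describe (Reject _)     = "reject"
describe (Unwrap _)     = "unwrap"
describe (Allow _)      = "allow"
describe (Validate s _) = "validate " ++ s
describe (Sequence a b) = describe a ++ "; " ++ describe b
```

For instance, describe (exampleMode Elements "http://www.w3.org/1999/xhtml") yields the string "validate xhtml.rng".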
In the NRL syntax, a rule can contain multiple actions. This is represented in the formalization using a Sequence action. The Sequence action discards the results (other than error information) of its first action. Only two actions can produce results other than error information: attach and unwrap. The NRL syntax allows at most one such action in a rule. When constructing a sequence representing the set of actions in a rule, the result-producing action, if any, must be the last action in the sequence.
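The ordering constraint can be sketched as follows. This is only an illustration under simplifying assumptions: the Action type here drops the Mode parameters, and ruleAction and producesResult are hypothetical helpers, not part of the formalization. Returning Nothing for an empty rule is likewise a choice made for this sketch.

```haskell
import Data.List (partition)

-- Simplified actions without Mode parameters (an assumption of this sketch).
data Action = Attach
            | Unwrap
            | Reject
            | Allow
            | Validate String
            | Sequence Action Action
  deriving (Eq, Show)

-- Only attach and unwrap produce results other than error information.
producesResult :: Action -> Bool
producesResult Attach = True
producesResult Unwrap = True
producesResult _      = False

-- Combine the actions of one rule into a single action. The
-- result-producing action, if any, is moved to the end. Nothing is
-- returned for an empty rule or for more than one such action.
ruleAction :: [Action] -> Maybe Action
ruleAction actions
  | length producers > 1 = Nothing
  | null ordered         = Nothing
  | otherwise            = Just (foldr1 Sequence ordered)
  where
    (producers, others) = partition producesResult actions
    ordered             = others ++ producers
```

Because foldr1 nests to the right, the last action in the list sits in the innermost second position, which is the only position whose non-error results survive a chain of Sequence actions.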
Here is a formalization in Haskell:
type Uri = String
type LocalName = String
type QName = (Uri, LocalName)
type Prefix = String
type Context = (Uri, [(Prefix, Uri)])
data Node = ElementNode QName Context [Node]
          | TextNode String
          | SlotNode
type AttributeSection = [(QName, String)]
data ElementSection = ElementSection Node [[AttributeSection]] [[ElementSection]]
data ElementsOrAttributes = Elements | Attributes
type Mode = ElementsOrAttributes -> Uri -> Action
data Action = Attach Mode
            | Reject Mode
            | Unwrap Mode
            | Allow Mode
            | Validate Uri Mode
            | Sequence Action Action
data ErrorReport = AttributeError AttributeSection String
                 | ElementError ElementSection String
type ErrorInfo = [ErrorReport]
data Validated a = Validated ErrorInfo a
applyElementAction :: Action -> ElementSection -> Validated [ElementSection]
applyElementAction (Reject m) e@(ElementSection nd lsa lle) =
  Validated ([ElementError e "namespace rejected"]
             ++ (errors (plsa m lsa))
             ++ (errors (plle m lle)))
            []
applyElementAction (Attach m) (ElementSection nd lsa lle)
  = listV (elementSectionV nd (plsa m lsa) (plle m lle))
applyElementAction (Unwrap m) (ElementSection _ _ lle) = ple m (concat lle)
applyElementAction (Allow m) (ElementSection nd lsa lle)
  = valid2 (\x y -> []) (plsa m lsa) (plle m lle)
applyElementAction (Validate s m) (ElementSection nd lsa lle)
  = Validated (validate s (elementSectionV nd (plsa m lsa) (plle m lle)))
              []
applyElementAction (Sequence a1 a2) e
  = actionSequence (applyElementAction a1 e) (applyElementAction a2 e)
validate :: Uri -> Validated ElementSection -> ErrorInfo
validate uri (Validated errs e) = errs ++ (validateElement uri e)
elementSectionV :: Node -> Validated [[AttributeSection]] -> Validated [[ElementSection]] -> Validated ElementSection
elementSectionV nd lsa lle = valid2 (ElementSection nd) lsa lle
applyAttributeAction :: Action -> AttributeSection -> Validated (Maybe AttributeSection)
applyAttributeAction (Allow m) a = Validated [] Nothing
applyAttributeAction (Reject m) a = Validated [AttributeError a "namespace rejected"] Nothing
applyAttributeAction (Attach m) a = Validated [] (Just a)
applyAttributeAction (Unwrap _) _ = Validated [] Nothing
applyAttributeAction (Validate s m) a
  = Validated (validateAttribute s a) Nothing
applyAttributeAction (Sequence a1 a2) a
  = actionSequence (applyAttributeAction a1 a) (applyAttributeAction a2 a)
actionSequence :: Validated a -> Validated a -> Validated a
actionSequence (Validated errs1 _) (Validated errs2 x) = Validated (errs1 ++ errs2) x
-- these are provided by an external validation library
validateElement :: Uri -> ElementSection -> ErrorInfo
validateElement _ _ = []
validateAttribute :: Uri -> AttributeSection -> ErrorInfo
validateAttribute _ _ = []
-- processing functions
pe :: Mode -> ElementSection -> Validated [ElementSection]
pe m e = applyElementAction (m Elements (elementSectionNs e)) e
ple :: Mode -> [ElementSection] -> Validated [ElementSection]
ple m le = concatMapV (pe m) le
plle :: Mode -> [[ElementSection]] -> Validated [[ElementSection]]
plle m lle = mapV (ple m) lle
pa :: Mode -> AttributeSection -> Validated (Maybe AttributeSection)
pa m a = applyAttributeAction (m Attributes (attributeSectionNs a)) a
psa :: Mode -> [AttributeSection] -> Validated [AttributeSection]
psa m sa = dropMapV (pa m) sa
plsa :: Mode -> [[AttributeSection]] -> Validated [[AttributeSection]]
plsa m lsa = mapV (psa m) lsa
elementSectionNs :: ElementSection -> Uri
elementSectionNs (ElementSection (ElementNode (ns, _) _ _) _ _) = ns
attributeSectionNs :: AttributeSection -> Uri
attributeSectionNs (((ns, _),_):_) = ns
-- functions for the Validated type
errors :: Validated a -> ErrorInfo
errors (Validated e _) = e
valid1 :: (a -> b) -> Validated a -> Validated b
valid1 f (Validated e x) = Validated e (f x)
valid2 :: (a -> b -> c) -> Validated a -> Validated b -> Validated c
valid2 f (Validated e x) (Validated e' y) = Validated (e ++ e') (f x y)
listV :: Validated a -> Validated [a]
listV x = valid1 (\y -> [y]) x
mapV :: (a -> Validated b) -> [a] -> Validated [b]
mapV f [] = Validated [] []
mapV f (x:xs) = valid2 (\ x xs -> (x:xs)) (f x) (mapV f xs)
concatMapV :: (a -> Validated [b]) -> [a] -> Validated [b]
concatMapV f xs = valid1 concat (mapV f xs)
dropMapV :: (a -> Validated (Maybe b)) -> [a] -> Validated [b]
dropMapV f [] = Validated [] []
dropMapV f (x:xs) = valid2 maybeCons (f x) (dropMapV f xs)
maybeCons :: (Maybe a) -> [a] -> [a]
maybeCons Nothing x = x
maybeCons (Just x) xs = (x:xs)
This formalization does not yet deal with element-name context. To deal with it, we would need to change each of the Actions that has a Mode parameter to take a more complex structure.
Received on Tue Jun 10 05:38:33 2003