[dsdl-discuss] Semantic framework for namespace processing and MNS2

From: James Clark <jjc@jclark.com>
Date: Wed May 14 2003 - 05:22:40 UTC

I've developed a (moderately) formal framework for describing the
semantics of namespace processing. I've found this helpful in thinking
about how to evolve MNS. Attached is a document describing this
framework and sketch of my current thinking on MNS2 using this framework
including some example MNS2 schemas. It incorporates several ideas from
Namespace Switchboard.

James

Semantic Framework for Namespace Processing

Semantic Framework for Namespace Processing

This document provides a formal framework for describing the semantics of namespace processing such as that done by VCSL, Namespace Switchboard and MNS. It is independent of the concrete syntax chosen for the language.

Data model

In order to describe the semantics of namespace processing it is convenient to construct a new data model tailored to namespace-based processing. This data model is constructed from the RELAX NG data model. [An implementation wouldn't actually have to construct this. But the semantics are simpler to describe in terms of this new data model than in terms of the RELAX NG data model.] Note that the information content is exactly equivalent to the RELAX NG data model.

This data model uses the idea of a section. [This is similar to the concept of island in RELAX Namespace. I have avoided the word island because the metaphor of islands mixes poorly with the metaphor of trees, and because it applies a degree of isolation that is not present: the concept of coverage implies that a single validation subject may be formed from multiple sections.] There are two kinds of section: attribute sections and element sections. Two attributes belong to the same section iff they have the same parent and the same namespace URI. An element belongs to the same section as its parent iff it has the same namespace URI as its parent. An attribute section is simply a non-empty unordered set of attributes (as in RELAX NG), where each member of the set has the same namespace URI. An element section is a little more complicated. First we need the concept of a node. There are three kinds of node: an element node, a text node and a slot node. An element node has a name, a context (as in RELAX NG), and a list of child nodes. A text node has a string value. A slot node has no additional information; it is merely a placeholder for a element section. A list of child nodes never has two adjacent text nodes and never has two adjacent slot nodes. An element section is a triple <nd, lsa, lle>, where nd is an element node, lsa is a list of unordered sets of attribute sections, and lle is a list of lists of element sections. lsa has one member for each element node in nd. The unordered set of attribute sections that is the n-th member of lsa gives the attributes for the n-th element node in nd (iterating in document order). lle has one member for each slot node in nd. The list of element sections that is the n-th member of lle corresponds to the n-th slot node in nd (iterating in document order).

An element section in this data model exactly corresponds to an element in the RELAX NG data model. An element section can be constructed from an element and vice-versa.

Processing model

An namespace processing schema consists of a set of rule-sets. [How rule-sets are identified is a separate issue: they might be handled as named modes or they might be handled via nesting.] A rule-set consists of a set of rules. A rule-set maps a section to an action based on the section's namespace URI and on whether the section is an attribute section or an element section. An action can be applied to element sections and attribute sections. When an action is applied to an element section, it returns either an error or a (possibly empty) list of element sections. When an action is applied to an attribute section, it returns either an error or an attribute section or an empty list. The fundamental operation, which I will call PE(r,e),

  1. takes a rule-set r and an element section e
  2. finds the action for e in r
  3. applies the action e
  4. returns either an error or a list of element sections

An auxiliary operation, which I will call PA(r, a):

  1. takes a rule-set r and an attribute section a
  2. finds the action for a in r
  3. applies the action to a
  4. returns either an error or an attribute section or an empty list

Using PE(r, e), we can define a function PLE(r, le) that:

  1. takes a rule-set r and a list of element sections le
  2. applies PE(r, e) to each member e of le
  3. concatentates the results
  4. returns either an error or a list of element sections

Using PA(r, a), we can define a function PSA(r, sa) that:

  1. takes a rule-set r and a set of attribute sections sa
  2. applies PA(r, a) to each member a of sa
  3. discards empty lists
  4. returns either an error or a set of attribute sections

Using PLE(r, le), we can define a function PLLE(r, lle) that:

  1. takes a rule-set r and a list of lists of element sections lle
  2. applies PLE(r, le) to each member le of lle
  3. returns either an error or a list of lists of element sections

Using PSA(r, sa), we can define a function PLSA(r, lsa) that:

  1. takes a rule-set r and a list of sets of attribute sections lsa
  2. applies PSA(r, sa) to each member sa of lsa
  3. returns either an error or a list of sets of attribute sections

Possible actions

Now we can discuss the semantics of actions. There are many kinds of action that can be given a coherent semantics using the above semantic framework. Here are some of the candidates.

Reject
The Reject action has no parameters. When applied to any section, it returns an error.
Empty
The Empty action has no parameters. When applied to section, it returns an empty list.
Sequence(act1, act2)
A Sequence action has two parameters, which are both of type action. When applied to any section, it applies act1 to the section; if this returns an error, then the Sequence action returns an error; otherwise, it applies act2 to the section and returns the result. Note that the semantics here are purely functional: there is no requirement not to evaluate act2 until after act2 has been evaluated. Thus Sequence can be implemented in a streaming processor.
Pass(r)
The Pass action has a parameter r of type rule-set. When Pass(r) is applied to an element section <nd, lsa, lle>, it returns a one-member list containing <nd, PLSA(r, lsa), PLLE(r, lle)>. If the invocation of PLAA or PLLE returns an error, then Pass(r) returns an error. When Pass(r) is applied to an attribute section a, it returns a.
Validate(s, r)
The Validate action has a parameter s of type schema and a parameter r or type rule-set. When Validate(s, r) is applied to an element section <nd, lsa, lle>, it creates an element section e = <nd, PLSA(r, lsa), PLLE(r, lle)>. If this returns an error, then the Validate action returns an error. It then validates e using the schema s. [An element section exactly corresponds to a single XML element so it's the right kind of thing to validate with a schema.] If it is invalid, it returns an error. If it is valid, it returns a one-member list containing e. When Validate(s, r) is applied to an attribute section a, it validates a. If it is invalid, it returns an error. If it is valid, it returns a.
Delve(r)
The Delve action has a parameter r of type rule-set. When Delve is applied to an element section <nd, lsa, lle>, it concatenates the members of lle returning a list of element sections le, and then returns PLE(r, le). Applied to an attribute section, it returns the empty list.
Dummy(nm1, nm2)
The Dummy action has two parameters, nm1, nm2 of type name. When Dummy is applied to an element section with namespace uri ns, it returns an element section representing the XML element <nm1 nm2="ns"/>. Dummy cannot be applied to an attribute section.
CheckName(nl, a)
The CheckName action has a paramater nl that is a list of local names and a parameter a that is an action. When CheckName is applied to an element section, it checks that the local name of root element node of that section is in the list. If it is not, it returns an error. Otherwise, it applies a to the element section. When CheckName is applied to an attribute section, it checks whether the local names of each attribute is in the list. If any are not, it returns an error. Otherwise it applies a to the attribute section. [This might be useful with XSD, which doesn't allow you to control the root element.]

Note that:

Formalization

Here is a formalization of the above in Haskell.

type Uri = String
type LocalName = String
type QName = (Uri, LocalName)
type Prefix = String
type Context = (Uri, [(Prefix, Uri)])

data Node = ElementNode QName Context [Node]
	  | TextNode String
	  | HoleNode

type AttributeSection = [(QName, String)]

data ElementSection = ElementSection Node [[AttributeSection]] [[ElementSection]]
data ElementsOrAttributes = Elements | Attributes

type RuleSet = ElementsOrAttributes -> Uri -> Action

data Action = Pass RuleSet
	    | Reject
	    | Delve RuleSet
	    | Empty
	    | Validate Uri RuleSet
	    | Sequence Action Action

data Validated a = Valid a | Invalid

applyElementAction :: Action -> ElementSection -> Validated [ElementSection]

applyElementAction Empty e = Valid []
applyElementAction Reject e = Invalid
applyElementAction (Pass r) (ElementSection nd lsa lle)
  = listV (elementSectionV nd (plsa r lsa) (plle r lle))
applyElementAction (Delve r) (ElementSection _ _ lle) = ple r (concat lle)
applyElementAction (Validate s r) (ElementSection nd lsa lle)
  = let e = (elementSectionV nd (plsa r lsa) (plle r lle))
    in if elementIsValidV s e then (listV e) else Invalid

elementSectionV :: Node -> Validated [[AttributeSection]] -> Validated [[ElementSection]] -> Validated ElementSection
elementSectionV nd lsa lle = valid2 (ElementSection nd) lsa lle

elementIsValidV :: Uri -> Validated ElementSection -> Bool
elementIsValidV uri e = valid False (elementIsValid uri) e

applyAttributeAction :: Action -> AttributeSection -> Validated (Maybe AttributeSection)
applyAttributeAction Empty e = Valid Nothing
applyAttributeAction Reject e = Invalid
applyAttributeAction (Pass r) a = Valid (Just a)
applyAttributeAction (Delve _) _ = Valid Nothing
applyAttributeAction (Validate s r) a
  = if attributeIsValid s a then Valid (Just a) else Invalid

-- these are provided by an external validation library

elementIsValid :: Uri -> ElementSection -> Bool
elementIsValid _ _ = True

attributeIsValid :: Uri -> AttributeSection -> Bool
attributeIsValid _ _ = True

-- processing functions

pe :: RuleSet -> ElementSection -> Validated [ElementSection]
pe r e = applyElementAction (r Elements (elementSectionNs e)) e

ple :: RuleSet -> [ElementSection] -> Validated [ElementSection]
ple r le = concatMapV (pe r) le

plle :: RuleSet -> [[ElementSection]] -> Validated [[ElementSection]]
plle r lle = mapV (ple r) lle

pa :: RuleSet -> AttributeSection -> Validated (Maybe AttributeSection)
pa r a = applyAttributeAction (r Attributes (attributeSectionNs a)) a

psa :: RuleSet -> [AttributeSection] -> Validated [AttributeSection]
psa r sa = dropMapV (pa r) sa

plsa :: RuleSet -> [[AttributeSection]] -> Validated [[AttributeSection]]
plsa r lsa = mapV (psa r) lsa

elementSectionNs :: ElementSection -> Uri
elementSectionNs (ElementSection (ElementNode (ns, _) _ _) _ _) = ns

attributeSectionNs :: AttributeSection -> Uri
attributeSectionNs (((ns, _),_):_) = ns

-- functions for the Validated type

valid :: b -> (a -> b) -> Validated a -> b
valid x f Invalid = x
valid x f (Valid y) = f y

valid1 :: (a -> b) -> Validated a -> Validated b
valid1 f x = valid Invalid (Valid . f) x

valid2 :: (a -> b -> c) -> Validated a -> Validated b -> Validated c
valid2 f x y = valid Invalid (\ z -> valid1 (f z) y) x

listV :: Validated a -> Validated [a]
listV x = valid1 (\y -> [y]) x

mapV :: (a -> Validated b) -> [a] -> Validated [b]

mapV f [] = Valid []
mapV f (x:xs) = valid2 (\ x xs -> (x:xs)) (f x) (mapV f xs)

concatMapV :: (a -> Validated [b]) -> [a] -> Validated [b]
concatMapV f xs = valid1 concat (mapV f xs)

dropMapV :: (a -> Validated (Maybe b)) -> [a] -> Validated [b]
dropMapV f [] = Valid []
dropMapV f (x:xs) = valid2 maybeCons (f x) (dropMapV f xs)

maybeCons :: (Maybe a) -> [a] -> [a]
maybeCons Nothing x = x
maybeCons (Just x) xs = (x:xs)

Possible additional features

Some additional features are possible.

Namespace transformation

This could be handled by adding an optional parameter ns of type string to Pass and Validate. Instead of constructing <nd, PLSA(r, lsa), PLLE(r, lle)>, they would construct <R(nd,ns), PLSA(r, lsa), PLLE(r, lle)>, where R(nd,ns) changes the namespace URI of each element node in nd to ns [What would this do to the context of element nodes?]. Similarily, it would change the namespace URI in attribute sections.

Context

Each of the Actions that has a rule-set parameter would instead have a more complex structure, which I will call a contextual rule map. A contextual rule map provides two operations, which I will call AR and ER:

PLLE becomes PLLE(lr, lle) where lr is a list of rule-sets. The n-th member of lr is used for the n-th member of lle. Similarly for PLSA. Instead of <nd, PLSA(r, lsa), PLLE(r, lle)>, we would have <nd, PLSA(AR(crm, nd), lsa), PLLE(ER(crm, nd), lle)>.

Note that the contextual rule map does not use information outside the section, thus it need deal only with local names [but what about anyNamespace?]. It might be as simple as "use this mode when the parent element has local name x".

James Clark
MNS2 sketch

MNS2 sketch

This document sketches a possible syntax for MNS2, having the same flavour as MNS, but using my proposed Semantic Framework for Namespace Processing.

Syntax

Here is one possible syntax:

default namespace = "http://www.thaiopensource.com/ns/mns2"

start =
  element rules {
    attribute startMode { mode }?,
    schemaType?,
    rule*
  }

rule =
  element namespace {
    attribute ns { xsd:anyURI },
    ruleModel
  }
  | element anyNamespace { ruleModel }

ruleModel = 
   attribute inModes { list { mode+ } }?,
   attribute match { elementsOrAttributes }?,
   action,
   attribute prune { "true" }?

elementsOrAttributes =
  list {
    ("elements", "attributes") 
     | ("attributes", "elements") 
     | "elements"
     | "attributes"
  }

action =
  element validate {
    attribute schema { xsd:anyURI },
    schemaType?,
    modeUsage
  }
  | element pass { modeUsage }
  | element reject { empty }
  | element delve { modeUsage }
  | element sequence { action+ }

modeUsage =
  attribute useMode { mode }?,
  element parent {
    attribute name { xsd:NCName },
    attribute useMode { mode }
  }*

mode = xsd:NCName | "#default"
schemaType = attribute schemaType { mediaType }
mediaType = xsd:string

Semantics

prune="true" is equivalent to putting the action in sequence followed by Empty. There is no separate action for Empty; Empty can be expressed using delve, so this does not lose functionality.

Rule-sets are described syntactically using modes as in MNS. Using nesting would also be possible as in Namespace Switchboard. However, when experimenting I found the nesting syntax to be hard to read and write for non-trivial examples.

A section only matches an anyNamespace rule if there is no matching namespace rule.

As in MNS, the default value for useMode and inModes is #default.

If there is no rule for an element section in a mode, then the default action is reject.

Attribute sections are an advanced feature that users shouldn't have to worry about if they don't need them. So we make the default value of the match attribute be elements, and if there is no rule for an attribute section in a mode, then we make the default action be pass. Thus, by default, attributes will be handled with their parent elements.

This does not implement the Dummy operation, as I am not yet convinced it is useful.

This does not implement the CheckNames operation, as I am not yet convinced it is useful.

This doesn't yet provide for inline schemas. These would definitely be useful, but it's not yet clear to me how much implementation complexity they add.

Context is slightly simpler than MNS. We might want to use a simple subset of XPath instead: <context path="html/head" useMode="head">.

Examples

XHTML+RDF

Here's the XHTML+RDF example from the original MNS spec in the above syntax.

<rules startMode="root" xmlns="http://www.thaiopensource.com/ns/mns2">
  <namespace ns="http://www.w3.org/1999/xhtml" inModes="root">
    <validate schema="xhtml.rng">
      <parent name="head" useMode="rdf"/>
    </validate>
  </namespace>
  <anyNamespace>
    <pass/>
  </anyNamespace>
  <namespace ns="http://www.w3.org/1999/02/22-rdf-syntax-ns#" inModes="rdf"
             prune="true">
    <validate schema="rdfxml.rng"/>
  </namespace>
</rules>

This uses a rather different strategy for dividing the document into validation subjects. The effect of

  <anyNamespace>
    <pass/>
  </anyNamespace>

is that, by default, element sections from different namespaces are combined into a single validation subject. We make a validation subject from the complete document, and we carve out a validation subject for each RDF element that is a child of head. We don't concern ourselves with what namespaces might occur in the body of the XHTML: we simply pass everything on to the XHTML schema. In this case, the XHTML schema is closed, so the XHTML schema will generate an error.

Hybrid XHTML

We can use this same technique to handle Ishikawa-san's hybrid example:

<!-- An experimental MNS schema for
     XHTML2+MathML+SVG+EGIX+ContactXML+HLink+RDF+XMLCharEnt 
     $Id: mns2.html,v 1.1 2003/05/14 05:15:36 jjc Exp $
     Hacked by jjc for MNS2
-->
<rules xmlns="http://www.thaiopensource.com/ns/mns2" startMode="xhtml">
  <!-- allow XHTML's 'link', 'script' and 'style' elements -->
  <namespace ns="http://www.w3.org/1999/xhtml"
             inModes="stuff-in-head"
             prune="true">
    <validate schema="xhtml-head.rng"/>
  </namespace>
  <namespace ns="http://www.w3.org/2002/06/hlink"
             inModes="stuff-in-head"
             prune="true">
    <validate schema="hlink.rng"/>
  </namespace>
  <namespace ns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             inModes="stuff-in-head"
             prune="true">
    <validate schema="rdfxml.rng"/>
  </namespace>
  <namespace ns="http://www.xmlns.org/2002/ContactXML"
             inModes="address"
             prune="true">
    <validate schema="ContactXML_v1.1.rng"/>
  </namespace>
  <namespace ns="http://www.w3.org/2002/06/xhtml2"
             inModes="xhtml">
    <validate schema="xhtml2-math-svg-egix-xmlcharent.rng">
      <parent name="head" useMode="stuff-in-head"/>
      <parent name="address" useMode="address"/>
    </validate>
  </namespace>
  <anyNamespace>
    <pass/>
  </anyNamespace>
</rules>

Delving

Suppose we are writing some documentation in XHTML and we want to have separate Linux and Windows versions of the documentation. Let's imagine that we have a v:linux element containing Linux-only stuff and a v:windows element containing Windows-only stuff. We will run some trivial XSLT on the XHTML to generate the Linux and Windows version. We want to ensure that both the Windows and Linux versions are valid XHTML. We also want to check that the v:* elements aren't nested, since that wouldn't make sense. We want to use the following one-liner as the schema for our v namespace:

element linux|windows { empty }

Here's an MNS2 schema that will combine this schema with XHTML appropriately:

<rules startMode="root" xmlns="http://www.thaiopensource.com/ns/mns2">
  <namespace ns="http://www.w3.org/1999/xhtml" inModes="root">
    <sequence>
      <validate schema="xhtml.rng" useMode="linux"/>
      <validate schema="xhtml.rng" useMode="windows"/>
    </sequence>
  </namespace>
  <namespace ns="http://www.example.com/version" inModes="linux">
    <sequence>
      <validate schema="version.rng" useMode="discard"/>
      <delve useMode="discard">
        <parent name="linux" useMode="keep"/>
      </delve>
    </sequence>
  </namespace>
  <namespace ns="http://www.example.com/version" inModes="windows">
    <sequence>
      <validate schema="version.rng" useMode="discard"/>
      <delve useMode="discard">
        <parent name="windows" useMode="keep"/>
      </delve>
    </sequence>
  </namespace>
  <anyNamespace inModes="keep">
    <pass useMode="keep"/>
  </anyNamespace>
  <anyNamespace inModes="discard">
    <delve useMode="discard"/>
  </anyNamespace>
</rules>
James Clark

--
DSDL members discussion list
To unsubscribe, please send a message with the
command  "unsubscribe" to dsdl-discuss-request@dsdl.org
(mailto:dsdl-discuss-request@dsdl.org?Subject=unsubscribe)
Received on Wed May 14 07:23:20 2003

This archive was generated by hypermail 2.1.8 : Fri Dec 03 2004 - 14:00:27 UTC