I've developed a (moderately) formal framework for describing the
semantics of namespace processing. I've found this helpful in thinking
about how to evolve MNS. Attached is a document describing this
framework and sketch of my current thinking on MNS2 using this framework
including some example MNS2 schemas. It incorporates several ideas from
Namespace Switchboard.
James
This document provides a formal framework for describing the semantics of namespace processing such as that done by VCSL, Namespace Switchboard and MNS. It is independent of the concrete syntax chosen for the language.
In order to describe the semantics of namespace processing it is convenient to construct a new data model tailored to namespace-based processing. This data model is constructed from the RELAX NG data model. [An implementation wouldn't actually have to construct this. But the semantics are simpler to describe in terms of this new data model than in terms of the RELAX NG data model.] Note that the information content is exactly equivalent to the RELAX NG data model.
This data model uses the idea of a section. [This is similar to the concept of island in RELAX Namespace. I have avoided the word island because the metaphor of islands mixes poorly with the metaphor of trees, and because it applies a degree of isolation that is not present: the concept of coverage implies that a single validation subject may be formed from multiple sections.] There are two kinds of section: attribute sections and element sections. Two attributes belong to the same section iff they have the same parent and the same namespace URI. An element belongs to the same section as its parent iff it has the same namespace URI as its parent. An attribute section is simply a non-empty unordered set of attributes (as in RELAX NG), where each member of the set has the same namespace URI. An element section is a little more complicated. First we need the concept of a node. There are three kinds of node: an element node, a text node and a slot node. An element node has a name, a context (as in RELAX NG), and a list of child nodes. A text node has a string value. A slot node has no additional information; it is merely a placeholder for a element section. A list of child nodes never has two adjacent text nodes and never has two adjacent slot nodes. An element section is a triple <nd, lsa, lle>, where nd is an element node, lsa is a list of unordered sets of attribute sections, and lle is a list of lists of element sections. lsa has one member for each element node in nd. The unordered set of attribute sections that is the n-th member of lsa gives the attributes for the n-th element node in nd (iterating in document order). lle has one member for each slot node in nd. The list of element sections that is the n-th member of lle corresponds to the n-th slot node in nd (iterating in document order).
An element section in this data model exactly corresponds to an element in the RELAX NG data model. An element section can be constructed from an element and vice-versa.
An namespace processing schema consists of a set of rule-sets. [How rule-sets are identified is a separate issue: they might be handled as named modes or they might be handled via nesting.] A rule-set consists of a set of rules. A rule-set maps a section to an action based on the section's namespace URI and on whether the section is an attribute section or an element section. An action can be applied to element sections and attribute sections. When an action is applied to an element section, it returns either an error or a (possibly empty) list of element sections. When an action is applied to an attribute section, it returns either an error or an attribute section or an empty list. The fundamental operation, which I will call PE(r,e),
An auxiliary operation, which I will call PA(r, a):
Using PE(r, e), we can define a function PLE(r, le) that:
Using PA(r, a), we can define a function PSA(r, sa) that:
Using PLE(r, le), we can define a function PLLE(r, lle) that:
Using PSA(r, sa), we can define a function PLSA(r, lsa) that:
Now we can discuss the semantics of actions. There are many kinds of action that can be given a coherent semantics using the above semantic framework. Here are some of the candidates.
Note that:
Here is a formalization of the above in Haskell.
type Uri = String
type LocalName = String
type QName = (Uri, LocalName)
type Prefix = String
type Context = (Uri, [(Prefix, Uri)])
data Node = ElementNode QName Context [Node]
| TextNode String
| HoleNode
type AttributeSection = [(QName, String)]
data ElementSection = ElementSection Node [[AttributeSection]] [[ElementSection]]
data ElementsOrAttributes = Elements | Attributes
type RuleSet = ElementsOrAttributes -> Uri -> Action
data Action = Pass RuleSet
| Reject
| Delve RuleSet
| Empty
| Validate Uri RuleSet
| Sequence Action Action
data Validated a = Valid a | Invalid
applyElementAction :: Action -> ElementSection -> Validated [ElementSection]
applyElementAction Empty e = Valid []
applyElementAction Reject e = Invalid
applyElementAction (Pass r) (ElementSection nd lsa lle)
= listV (elementSectionV nd (plsa r lsa) (plle r lle))
applyElementAction (Delve r) (ElementSection _ _ lle) = ple r (concat lle)
applyElementAction (Validate s r) (ElementSection nd lsa lle)
= let e = (elementSectionV nd (plsa r lsa) (plle r lle))
in if elementIsValidV s e then (listV e) else Invalid
elementSectionV :: Node -> Validated [[AttributeSection]] -> Validated [[ElementSection]] -> Validated ElementSection
elementSectionV nd lsa lle = valid2 (ElementSection nd) lsa lle
elementIsValidV :: Uri -> Validated ElementSection -> Bool
elementIsValidV uri e = valid False (elementIsValid uri) e
applyAttributeAction :: Action -> AttributeSection -> Validated (Maybe AttributeSection)
applyAttributeAction Empty e = Valid Nothing
applyAttributeAction Reject e = Invalid
applyAttributeAction (Pass r) a = Valid (Just a)
applyAttributeAction (Delve _) _ = Valid Nothing
applyAttributeAction (Validate s r) a
= if attributeIsValid s a then Valid (Just a) else Invalid
-- these are provided by an external validation library
elementIsValid :: Uri -> ElementSection -> Bool
elementIsValid _ _ = True
attributeIsValid :: Uri -> AttributeSection -> Bool
attributeIsValid _ _ = True
-- processing functions
pe :: RuleSet -> ElementSection -> Validated [ElementSection]
pe r e = applyElementAction (r Elements (elementSectionNs e)) e
ple :: RuleSet -> [ElementSection] -> Validated [ElementSection]
ple r le = concatMapV (pe r) le
plle :: RuleSet -> [[ElementSection]] -> Validated [[ElementSection]]
plle r lle = mapV (ple r) lle
pa :: RuleSet -> AttributeSection -> Validated (Maybe AttributeSection)
pa r a = applyAttributeAction (r Attributes (attributeSectionNs a)) a
psa :: RuleSet -> [AttributeSection] -> Validated [AttributeSection]
psa r sa = dropMapV (pa r) sa
plsa :: RuleSet -> [[AttributeSection]] -> Validated [[AttributeSection]]
plsa r lsa = mapV (psa r) lsa
elementSectionNs :: ElementSection -> Uri
elementSectionNs (ElementSection (ElementNode (ns, _) _ _) _ _) = ns
attributeSectionNs :: AttributeSection -> Uri
attributeSectionNs (((ns, _),_):_) = ns
-- functions for the Validated type
valid :: b -> (a -> b) -> Validated a -> b
valid x f Invalid = x
valid x f (Valid y) = f y
valid1 :: (a -> b) -> Validated a -> Validated b
valid1 f x = valid Invalid (Valid . f) x
valid2 :: (a -> b -> c) -> Validated a -> Validated b -> Validated c
valid2 f x y = valid Invalid (\ z -> valid1 (f z) y) x
listV :: Validated a -> Validated [a]
listV x = valid1 (\y -> [y]) x
mapV :: (a -> Validated b) -> [a] -> Validated [b]
mapV f [] = Valid []
mapV f (x:xs) = valid2 (\ x xs -> (x:xs)) (f x) (mapV f xs)
concatMapV :: (a -> Validated [b]) -> [a] -> Validated [b]
concatMapV f xs = valid1 concat (mapV f xs)
dropMapV :: (a -> Validated (Maybe b)) -> [a] -> Validated [b]
dropMapV f [] = Valid []
dropMapV f (x:xs) = valid2 maybeCons (f x) (dropMapV f xs)
maybeCons :: (Maybe a) -> [a] -> [a]
maybeCons Nothing x = x
maybeCons (Just x) xs = (x:xs)
Some additional features are possible.
This could be handled by adding an optional parameter ns of type string to Pass and Validate. Instead of constructing <nd, PLSA(r, lsa), PLLE(r, lle)>, they would construct <R(nd,ns), PLSA(r, lsa), PLLE(r, lle)>, where R(nd,ns) changes the namespace URI of each element node in nd to ns [What would this do to the context of element nodes?]. Similarily, it would change the namespace URI in attribute sections.
Each of the Actions that has a rule-set parameter would instead have a more complex structure, which I will call a contextual rule map. A contextual rule map provides two operations, which I will call AR and ER:
PLLE becomes PLLE(lr, lle) where lr is a list of rule-sets. The n-th member of lr is used for the n-th member of lle. Similarly for PLSA. Instead of <nd, PLSA(r, lsa), PLLE(r, lle)>, we would have <nd, PLSA(AR(crm, nd), lsa), PLLE(ER(crm, nd), lle)>.
Note that the contextual rule map does not use information outside the section, thus it need deal only with local names [but what about anyNamespace?]. It might be as simple as "use this mode when the parent element has local name x".
James ClarkThis document sketches a possible syntax for MNS2, having the same flavour as MNS, but using my proposed Semantic Framework for Namespace Processing.
Here is one possible syntax:
default namespace = "http://www.thaiopensource.com/ns/mns2"
start =
element rules {
attribute startMode { mode }?,
schemaType?,
rule*
}
rule =
element namespace {
attribute ns { xsd:anyURI },
ruleModel
}
| element anyNamespace { ruleModel }
ruleModel =
attribute inModes { list { mode+ } }?,
attribute match { elementsOrAttributes }?,
action,
attribute prune { "true" }?
elementsOrAttributes =
list {
("elements", "attributes")
| ("attributes", "elements")
| "elements"
| "attributes"
}
action =
element validate {
attribute schema { xsd:anyURI },
schemaType?,
modeUsage
}
| element pass { modeUsage }
| element reject { empty }
| element delve { modeUsage }
| element sequence { action+ }
modeUsage =
attribute useMode { mode }?,
element parent {
attribute name { xsd:NCName },
attribute useMode { mode }
}*
mode = xsd:NCName | "#default"
schemaType = attribute schemaType { mediaType }
mediaType = xsd:string
prune="true" is equivalent to putting the action in
sequence followed by Empty. There is no separate action for Empty;
Empty can be expressed using delve, so this does not lose
functionality.
Rule-sets are described syntactically using modes as in MNS. Using nesting would also be possible as in Namespace Switchboard. However, when experimenting I found the nesting syntax to be hard to read and write for non-trivial examples.
A section only matches an anyNamespace rule if there is no matching namespace rule.
As in MNS, the default value for useMode and inModes is #default.
If there is no rule for an element section in a mode, then the default action is reject.
Attribute sections are an advanced feature that users shouldn't
have to worry about if they don't need them. So we make the default
value of the match attribute be elements,
and if there is no rule for an attribute section in a mode, then we
make the default action be pass. Thus, by default, attributes will be
handled with their parent elements.
This does not implement the Dummy operation, as I am not yet convinced it is useful.
This does not implement the CheckNames operation, as I am not yet convinced it is useful.
This doesn't yet provide for inline schemas. These would definitely be useful, but it's not yet clear to me how much implementation complexity they add.
Context is slightly simpler than MNS. We might want to use a
simple subset of XPath instead: <context path="html/head"
useMode="head">.
Here's the XHTML+RDF example from the original MNS spec in the above syntax.
<rules startMode="root" xmlns="http://www.thaiopensource.com/ns/mns2">
<namespace ns="http://www.w3.org/1999/xhtml" inModes="root">
<validate schema="xhtml.rng">
<parent name="head" useMode="rdf"/>
</validate>
</namespace>
<anyNamespace>
<pass/>
</anyNamespace>
<namespace ns="http://www.w3.org/1999/02/22-rdf-syntax-ns#" inModes="rdf"
prune="true">
<validate schema="rdfxml.rng"/>
</namespace>
</rules>
This uses a rather different strategy for dividing the document into validation subjects. The effect of
<anyNamespace>
<pass/>
</anyNamespace>
is that, by default, element sections from different namespaces are
combined into a single validation subject. We make a validation
subject from the complete document, and we carve out a validation
subject for each RDF element that is a child of head. We
don't concern ourselves with what namespaces might occur in the body
of the XHTML: we simply pass everything on to the XHTML schema. In
this case, the XHTML schema is closed, so the XHTML schema will
generate an error.
We can use this same technique to handle Ishikawa-san's hybrid example:
<!-- An experimental MNS schema for
XHTML2+MathML+SVG+EGIX+ContactXML+HLink+RDF+XMLCharEnt
$Id: mns2.html,v 1.1 2003/05/14 05:15:36 jjc Exp $
Hacked by jjc for MNS2
-->
<rules xmlns="http://www.thaiopensource.com/ns/mns2" startMode="xhtml">
<!-- allow XHTML's 'link', 'script' and 'style' elements -->
<namespace ns="http://www.w3.org/1999/xhtml"
inModes="stuff-in-head"
prune="true">
<validate schema="xhtml-head.rng"/>
</namespace>
<namespace ns="http://www.w3.org/2002/06/hlink"
inModes="stuff-in-head"
prune="true">
<validate schema="hlink.rng"/>
</namespace>
<namespace ns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
inModes="stuff-in-head"
prune="true">
<validate schema="rdfxml.rng"/>
</namespace>
<namespace ns="http://www.xmlns.org/2002/ContactXML"
inModes="address"
prune="true">
<validate schema="ContactXML_v1.1.rng"/>
</namespace>
<namespace ns="http://www.w3.org/2002/06/xhtml2"
inModes="xhtml">
<validate schema="xhtml2-math-svg-egix-xmlcharent.rng">
<parent name="head" useMode="stuff-in-head"/>
<parent name="address" useMode="address"/>
</validate>
</namespace>
<anyNamespace>
<pass/>
</anyNamespace>
</rules>
Suppose we are writing some documentation in XHTML and we want to
have separate Linux and Windows versions of the documentation. Let's
imagine that we have a v:linux element containing
Linux-only stuff and a v:windows element containing
Windows-only stuff. We will run some trivial XSLT on the XHTML to
generate the Linux and Windows version. We want to ensure that both
the Windows and Linux versions are valid XHTML. We also want to check
that the v:* elements aren't nested, since that wouldn't make sense.
We want to use the following one-liner as the schema for our v
namespace:
element linux|windows { empty }
Here's an MNS2 schema that will combine this schema with XHTML appropriately:
<rules startMode="root" xmlns="http://www.thaiopensource.com/ns/mns2">
<namespace ns="http://www.w3.org/1999/xhtml" inModes="root">
<sequence>
<validate schema="xhtml.rng" useMode="linux"/>
<validate schema="xhtml.rng" useMode="windows"/>
</sequence>
</namespace>
<namespace ns="http://www.example.com/version" inModes="linux">
<sequence>
<validate schema="version.rng" useMode="discard"/>
<delve useMode="discard">
<parent name="linux" useMode="keep"/>
</delve>
</sequence>
</namespace>
<namespace ns="http://www.example.com/version" inModes="windows">
<sequence>
<validate schema="version.rng" useMode="discard"/>
<delve useMode="discard">
<parent name="windows" useMode="keep"/>
</delve>
</sequence>
</namespace>
<anyNamespace inModes="keep">
<pass useMode="keep"/>
</anyNamespace>
<anyNamespace inModes="discard">
<delve useMode="discard"/>
</anyNamespace>
</rules>
James Clark
-- DSDL members discussion list To unsubscribe, please send a message with the command "unsubscribe" to dsdl-discuss-request@dsdl.org (mailto:dsdl-discuss-request@dsdl.org?Subject=unsubscribe)Received on Wed May 14 07:23:20 2003
This archive was generated by hypermail 2.1.8 : Fri Dec 03 2004 - 14:00:27 UTC