[dsdl-discuss] Re: Namespace processing

From: Rick Jelliffe <ricko@topologi.com>
Date: Sun Apr 27 2003 - 15:33:17 UTC

Murata-san asked for some more info on Namespace Switchboard.

Alias
------

WXS provides complex type extension and restriction, plus <redefine>
and <import> and <include>. Also it provides equivalence groups.
All of these are attempts to "reconstruct" uses of parameter entities
in DTDs. However, the WXS suffixation rule is a database-ism that
is intrusive and excessive in XML. <redefine> breaks type safety
in any case; indeed, the lack of a strict 1-1 relation between a
namespace and schema makes type safety fragile anyway.

So alias attempts to provide a simple way of coping, which would
be generally useful to any validator.

Preprocess and Traverse
-----------------------------

Preprocess and traverse are attempts to separate the tangle that is
"lax", "strict", "skip" etc. (I see now that "halt" does not belong:
it should be in its own attribute on <rules>.) I think it is useful to see that
@preprocess is an XML->XML transformation, while @traverse
is an instruction to the validator.

(The HTML page used "delve" in the DTD but "enter" in the text,
for which I apologise.)

<foo:a>
  <bar:b>
    <bar:c/>
    <foo:d/>
    <bar:e>
      <foo:f>
      </foo:f>
    </bar:e>
  </bar:b>
  <bar:f/>
</foo:a>

Removing "halt" to another attribute, there are four cases,
with the schema:

<rules xmlns="http://www.topologi.com/ns/ns">
     <namespace ns="foo's namespace">
         <schema uri="eg.rng">
             <namespace ns="bar's namespace"
                 preprocess="???"
                 traverse="???" />
         </schema>
     </namespace>
</rules>

preprocess=keep traverse=enter
======================

<foo:a>
  <bar:b>
    <bar:c/>
    <foo:d/>
    <bar:e>
      <foo:f>
      </foo:f>
    </bar:e>
  </bar:b>
  <bar:f/>
</foo:a>

Attempts to validate everything.

preprocess=keep traverse=skip
======================

<foo:a>
  <bar:b>
    <bar:c/>
    <foo:d/>
    <bar:e>
      <foo:f>
      </foo:f>
    </bar:e>
  </bar:b>
  <bar:f/>
</foo:a>

The document is left intact, but the validator does not attempt to
validate under bar:b. So the foo:d and foo:f are never validated.

preprocess=prune traverse=enter
======================

<foo:a>
    <foo:d/>
      <foo:f>
      </foo:f>
</foo:a>

preprocess=prune traverse=skip
=========================

<foo:a>
</foo:a>
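
The four cases above can be sketched in a few lines of Python. This is only one reading of the examples, not a definition of Namespace Switchboard: it assumes that "prune" drops the bar elements themselves, that "skip" stops descent at a bar element, and that under keep/skip the bar element itself is still seen while its content is not. The namespace URIs and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; the original message does not give real ones.
FOO = "urn:example:foo"
BAR = "urn:example:bar"
PREFIX = {FOO: "foo", BAR: "bar"}

DOC = f"""
<foo:a xmlns:foo="{FOO}" xmlns:bar="{BAR}">
  <bar:b>
    <bar:c/>
    <foo:d/>
    <bar:e>
      <foo:f>
      </foo:f>
    </bar:e>
  </bar:b>
  <bar:f/>
</foo:a>
""".strip()

def qname(tag):
    """Turn ElementTree's '{uri}local' form back into 'prefix:local'."""
    uri, local = tag[1:].split("}")
    return f"{PREFIX[uri]}:{local}"

def seen_by_validator(elem, foreign_ns, preprocess, traverse):
    """List, in document order, the elements the validator would see."""
    foreign = elem.tag.startswith("{" + foreign_ns + "}")
    names = []
    # preprocess: a foreign element survives only under "keep".
    if not foreign or preprocess == "keep":
        names.append(qname(elem.tag))
    # traverse: the content of a foreign element is visited only under "enter".
    if not foreign or traverse == "enter":
        for child in elem:
            names.extend(seen_by_validator(child, foreign_ns, preprocess, traverse))
    return names

root = ET.fromstring(DOC)
for preprocess in ("keep", "prune"):
    for traverse in ("enter", "skip"):
        print(preprocess, traverse, seen_by_validator(root, BAR, preprocess, traverse))
```

Keeping the two attributes as two independent tests in the sketch mirrors the point made earlier: one decides what survives in the document, the other decides where the validator goes.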

From: "MURATA Makoto" <murata@hokkaido.email.ne.jp>
 
> DTDs, RDF Schema, and Topic Map Constraint Language can also take advantage
> of Part 4. Part 4 is not RELAX-specific.

I wonder whether Part 4 can really be general without some mechanism
to allow "feasible" validation of upper or middle islands. I see this in three
use cases:

1) XSLT generating XHTML. We want to validate the XHTML
elements. But these may include upper islands as well as branches,
and the use of <xslt:attribute> for example seems to require that
grammars can be validated against using a weaker strategy, such
as the "feasible" validation James put into Jing.

2) Any grammar that is written closed (i.e. for convenience) but
which we want to use anyway, but with our own elements
substituted. (i.e. a similar case to WXS' substitution groups
a.k.a. equivalence classes)

3) Any grammar that is written closed but we want to put in
some exception data. (I.e., similar to WXS' nilled elements)

> As for the ability to abort, I think that
> this is up to the error handler implementation.

I agree that abortion does not belong on a per-namespace basis.
If Schemamachine is followed for Part 1, then that allows one invalid
pass to stop validation before the next, which goes some way.

I guess this is a matter of expectation of what a schema language
does, as well. RELAX NG takes the approach that diagnostics
are an implementation issue; Schematron takes the approach that
messages and diagnostics are at the heart of the schema: how these
are presented is an implementation issue.

So a lot of the value of Schematron schemas is that they can
present diagnostics to the user in terms of the specific domain
rather than just in terms of the names of information items.
So issues such as "how do we prevent the user from being
swamped by spurious or redundant error messages" are
important to Schematron. Phases provide one mechanism
to help perform "progressive validation" (i.e. where you first
validate some basic constraints, then some contingent ones,
and so on).
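
As a rough illustration of progressive validation, here is a minimal Python sketch. It is not Schematron syntax or any Schematron API: the data model (a document as a dict, a rule as a message plus a check, a phase as a named list of rules) and every name in it are assumptions made for the example.

```python
# Hypothetical data model: a "document" is a dict, a rule is a
# (message, check) pair, and a phase is a named list of rules.
def validate_progressively(doc, phases):
    """Run phases in order; report only the first phase that fails."""
    for phase_name, rules in phases:
        messages = [message for message, check in rules if not check(doc)]
        if messages:
            # Later phases are not run, so the user is not swamped by
            # errors that merely follow from the basic ones.
            return phase_name, messages
    return None, []

phases = [
    ("basic", [
        ("An invoice must have a customer.", lambda d: "customer" in d),
        ("An invoice must have at least one line.", lambda d: bool(d.get("lines"))),
    ]),
    ("contingent", [
        ("The total must equal the sum of the lines.",
         lambda d: d.get("total") == sum(d.get("lines", []))),
    ]),
]

failed_phase, messages = validate_progressively({"lines": [], "total": 5}, phases)
print(failed_phase, messages)
```

The messages are written in terms of the domain (invoices, customers, totals) rather than element names, which is the kind of diagnostic the paragraph above has in mind.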

As I said before, because we seem to be approaching DSDL
bottom up (which I approve of) in defining discrete, simple,
excellent schema languages first, then the infrastructure
progressively, there is a strong likelihood that Part 4 will have
the things that RELAX NG needs in particular. The notion of
progressive validation, for example, is something that does
not make much sense in the grammar world.

> How about //foo:*[@id]? If you use Part 4, you can easily focus on
> the namespace for the prefix "foo". Then, Schematron becomes away
> more efficient.

If information items are stored by table, allowing a
 SELECT element WHERE namespace="foo" AND id IS NOT NULL
kind of lookup, then fragment size would not affect access, but point taken.
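
A minimal sketch of that kind of table lookup, using Python's sqlite3 module. The table layout, column names, index, and sample rows are all hypothetical; the message only hints at the query.

```python
import sqlite3

# Hypothetical table of information items: one row per element,
# with its namespace, local name, and ID attribute value (if any).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (namespace TEXT, name TEXT, id TEXT)")
conn.execute("CREATE INDEX items_by_ns ON items (namespace)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        ("foo", "a", "x1"),
        ("bar", "b", "x2"),
        ("foo", "d", None),
        ("foo", "f", "x3"),
    ],
)

# Equivalent in spirit to //foo:*[@id]: the lookup goes through the
# namespace index rather than walking the document fragment.
rows = conn.execute(
    "SELECT name, id FROM items "
    "WHERE namespace = ? AND id IS NOT NULL ORDER BY name",
    ("foo",),
).fetchall()
print(rows)
```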

However, on a more significant level, I don't believe that people do
//foo:*[@id]. I think it is a mistake to think of IDs as keys and IDREFs as
foreign keys: they are metadata not data; they are shortcuts for creating
inverted indexes. And "ID"s are rarely scoped by namespace: instead they
need to be scoped by their originating document: if two documents are
merged, our eventual ID mechanism needs to be able to qualify IDs
and IDREFs with their originating scope: their namespace is that they
come from the same document not that they come from the same schema.
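
To make the scoping point concrete, here is a small Python sketch in which IDs are qualified by their originating document when two documents are merged. The document names, ID values, and helper functions are hypothetical; no particular ID mechanism is implied.

```python
def qualify(doc_uri, ids):
    """Scope each ID by the document it came from."""
    return {(doc_uri, value) for value in ids}

def resolve(idref, origin, index):
    """Resolve an IDREF only within the scope of its originating document."""
    key = (origin, idref)
    return key if key in index else None

# Two documents that both use the ID "intro".
ids_a = {"intro", "fig1"}
ids_b = {"intro", "tab1"}

# Unqualified, the merged document has a clash; qualified, it does not.
clashes = ids_a & ids_b
index = qualify("a.xml", ids_a) | qualify("b.xml", ids_b)

print(sorted(clashes))
print(resolve("intro", "b.xml", index))
print(resolve("tab1", "a.xml", index))
```

The scope used here is the source document, not the namespace or schema of the element carrying the ID.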

Cheers
Rick Jelliffe

Received on Sun Apr 27 17:29:27 2003