[dsdl-discuss] Rationale for Rick's draft Part 7 Character Repertoire Validation

From: Rick Jelliffe <ricko@allette.com.au>
Date: Thu Apr 08 2004 - 09:26:20 UTC

Eric van der Vlist wrote:

>Hi Rick,
>
>On Wed, 2004-04-07 at 10:56, Rick Jelliffe wrote:
>
>
>>Please find attached a complete XML draft of Part 7 Character Repertoire
>>Validation.
>>
>>It is a kind of Schematron, except that assertion tests expect
>>"Character Class" as used in XML Schemas, Perl, Java, etc.
>>
>>
>
>Very clever, this sounds like a neat idea.
>
>I am confused about the way to apply this to mixed content models,
>though.
>
>In that case (mixed content models), how would you read "The assertion
>test is interpreted according to production 13 of XML Schemas Datatypes"
>knowing that XML Schemas Datatypes only applies to attribute and simple
>contents?
>
In section 4:

 "Text nodes which are children of the nodes that match the contexts
 shall conform to the repertoire."

For example, given the document,
    <x>aaaa<y>bbbb</y>c</x>
and the schema
  <rule context="x"><assert test="ac"/><assert test="^b"/></rule>
  <rule context="y"><assert test="b"/></rule>
  <rule context="*[ancestor-or-self::x]"><assert test="abc"/></rule>
then the document is valid against that schema.

In the first rule, the subject node is x;
    its child text nodes are "aaaa", which tests OK, and "c", which
    tests OK.
In the second rule, the subject node is y, whose text node "bbbb"
    tests OK.
In the third rule, the subject nodes are x and y; all the text nodes
    of each of them test OK.

I will put a note in to explicate this.
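
As a rough illustration only (not part of the draft), the example above
can be checked with a short Python sketch. It assumes that each test value
is the body of a bracketed character class, and it approximates the XML
Schema class syntax with Python's re module, which differs in places (for
instance, re has no \p{...} escapes or class subtraction). The helper names
child_text_nodes and conforms are hypothetical, and the three contexts are
selected by hand because ElementTree does not support the
ancestor-or-self axis.

```python
import re
import xml.etree.ElementTree as ET


def child_text_nodes(elem):
    """Yield the text nodes that are direct children of elem."""
    if elem.text:
        yield elem.text
    for child in elem:
        # In ElementTree, text following a child element is that child's tail.
        if child.tail:
            yield child.tail


def conforms(elem, char_class):
    """True if every child text node uses only characters in the class."""
    # Assumption: the test value is the body of a bracketed class,
    # so "^b" becomes [^b] and "ac" becomes [ac].
    pattern = re.compile("[" + char_class + "]*")
    return all(pattern.fullmatch(t) is not None for t in child_text_nodes(elem))


doc = ET.fromstring("<x>aaaa<y>bbbb</y>c</x>")

# Hand-written stand-ins for the three rule contexts in the schema.
rules = [
    (lambda root: [root], ["ac", "^b"]),         # context="x"
    (lambda root: root.findall("y"), ["b"]),     # context="y"
    (lambda root: list(root.iter()), ["abc"]),   # context="*[ancestor-or-self::x]"
]

valid = all(
    conforms(node, char_class)
    for select, classes in rules
    for node in select(doc)
    for char_class in classes
)
print(valid)
```

Note that the "bbbb" inside y is never tested against the x rule: only the
text nodes that are direct children of the subject node are examined.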

>The only instance of the word "mixed" in your proposal is in the use
>case "ensuring that a Dutch document contains characters only used in
>typical Dutch documents; the constraint applies to mixed content and
>element content;" and I think that the spec should clearly show how that
>can be done.
>
>
I put in the use-cases and examples specifically to show that mixed
content was intended to be coped with.

>In fact, I think that it's the semantic of test="pattern" when the
>context node isn't a simple type element that should be clarified.
>
Yes. When a node has text node children (element, attribute, PI,
comment), then those text nodes are tested. Otherwise, if the node is
convertible to a string (in particular, names of elements and
attributes, and PI targets), the string is tested. Anything else
(i.e. documents?) would be an error, though it is a matter of
negotiation.
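
One possible reading of that three-way rule, sketched in Python. The Node
class, the kind labels, and the function names are all hypothetical; they
are not taken from the draft and only illustrate the order of the checks.
As before, the test value is assumed to be the body of a bracketed
character class, approximated with Python's re module.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical node kinds, following the wording of the reply above.
TEXT_BEARING = {"element", "attribute", "pi", "comment"}
NAME_LIKE = {"element-name", "attribute-name", "pi-target"}


@dataclass
class Node:
    kind: str
    texts: List[str] = field(default_factory=list)  # text node children
    string: Optional[str] = None                    # string value, if any


def strings_under_test(node):
    """Return the strings that a repertoire assertion would examine."""
    if node.kind in TEXT_BEARING:
        return list(node.texts)
    if node.kind in NAME_LIKE and node.string is not None:
        return [node.string]
    # Anything else (for example, a document node) is treated as an error.
    raise ValueError("cannot test a %s node against a repertoire" % node.kind)


def conforms(node, char_class):
    """True if every examined string uses only characters in the class."""
    pattern = re.compile("[" + char_class + "]*")
    return all(pattern.fullmatch(s) is not None for s in strings_under_test(node))


print(conforms(Node("element", texts=["aaaa", "c"]), "ac"))
print(conforms(Node("pi-target", string="xml-stylesheet"), "a-z"))
```

Under this reading, an element with no text node children conforms
vacuously, and a document node raises an error rather than being skipped.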

>
>The other point is to know if we're happy to rely on XPath 1.0 (which
>among other things isn't streamable) to qualify the binding between
>nodes and character repertoires.
>
>
We rely on it formally, for the spec, but you will see in part 4 that I
also reserve the query language binding name "stx-charrep". The idea is
that implementations could support a streamable XPath with this; if
someone has defined a streamable XPath that we can refer to in an ISO
spec, we could formally define it too. But since there seems to be no
suitable streaming version of XPath (I don't think the XML Schema
subset is really useful), I think it is best to allow nature to take
its course: provide a mechanism (@language) and reserve and promote the
appropriate term, but don't formally define it.

In other words, I just want to provide the bare minimum that
implementers need, without restricting them from innovating. For
example, there is no requirement that a conforming implementation even
support "xslt-charrep": that is the starting point. Let's not
prematurely standardize: let's allow implementers and users to work out
what is convenient.

Cheers
Rick Jelliffe

Received on Thu Apr 8 11:26:31 2004
