[dsdl-discuss] Re: Draft of Part 7 Character Repertoire Validation

From: Rick Jelliffe <ricko@allette.com.au>
Date: Thu Apr 08 2004 - 10:35:44 UTC

MURATA Makoto wrote:

>On Wed, 07 Apr 2004 18:56:14 +1000
>Rick Jelliffe <ricko@allette.com.au> wrote:
>
>
>
>>It is a kind of Schematron, except that assertions tests expect
>>"Character Class" as used in XML Schemas, Perl, Java, etc.
>>
>>
>
>Many thanks for your contribution. I have three questions.
>
>First, have you compared your proposal and "Character Repertoire
>Validation for XML Documents"? Certainly, these proposals are similar.
>
>http://dret.net/netdret/docs/wilde-iuc24.pdf
>
>
Yes: I also use Erik's approach of using XML Schema's character classes
(and I stole the name "charrep"). But I have tried to simplify it:

1) Schematron is defined to provide frameworks, and the DSDL group asked me
   in Phillie to provide a solution to Part 7 using Schematron as the
   framework.
2) Character classes already provide the ^ operator, so there is no need to
   extend Schematron with a special element like <restrict>.
3) I think that facets like string length should be out of scope for this
   part, in particular because you would then probably need a mechanism to
   decide whether length applies recursively for mixed content.
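
Point 2 can be illustrated with a minimal sketch. It assumes Python's re
module as a stand-in for the XML Schema regex dialect; since re has no
\p{IsBasicLatin} escape, the Basic Latin block (U+0000 to U+007F) is written
as an explicit range. The function names are hypothetical. It shows that a
restriction to a repertoire is the same test as "no match for the negated
class":

```python
import re

# Assumption: the Basic Latin block spelled out as a range, because
# Python's re module does not support \p{IsBasicLatin}.
BASIC_LATIN = r"\x00-\x7F"

# Positive form: the whole string consists of characters in the class.
ALLOWED = re.compile(f"[{BASIC_LATIN}]*")
# Negated form, using ^: any single character outside the class.
FORBIDDEN = re.compile(f"[^{BASIC_LATIN}]")


def in_repertoire_positive(text: str) -> bool:
    """True when every character of text is in the class."""
    return ALLOWED.fullmatch(text) is not None


def in_repertoire_negated(text: str) -> bool:
    """True when no character of text falls outside the class."""
    return FORBIDDEN.search(text) is None
```

Both functions give the same answer for any string, which is why a separate
restriction element adds no expressive power over the ^ operator.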

So I don't believe that the Schematron-based approach is any less powerful.
His global declarations are nice, but they are only sugar, and I suspect that
Schematron's flat case-statement approach is better for simplifying complex
rules. Actually, I suspect that people would be confused by CRVX's rules:
they are sufficiently different from RELAX or Schematron.

>One difference is that CRVX allows constraints on PI targets, CDATA
>sections, PI contents, comments, element local names, attribute local
>names, namespace names, and namespace prefixes. I'm not saying CRVX
>is better here, but I just want to make sure if my understanding is
>correct. It appears that your proposal allows constraints on
>element names or attribute names, but I do not understand your second
>example in Annex B.
>
Oh, my wording needs to be better. My proposal also allows those. Indeed,
the first example tries to show that this is the intent:

 <sch:rule context="*/name()">
    <sch:assert test="\p{IsBasicLatin}">
       Generic Identifiers of elements should be ASCII repertoire
    </sch:assert>
 </sch:rule>
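
For readers who want to see what this rule checks, here is a rough Python
equivalent. It is only a sketch: it assumes xml.etree.ElementTree for
parsing, an explicit range in place of \p{IsBasicLatin} (which Python's re
does not support), and documents without namespaces. The function name is
hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Any character outside Basic Latin (U+0000 to U+007F).
NON_BASIC_LATIN = re.compile(r"[^\x00-\x7F]")


def non_ascii_element_names(xml_text: str) -> list[str]:
    """Return the names of elements that use characters outside Basic Latin.

    Assumes no namespaces: ElementTree would otherwise report tags as
    "{uri}local", and the URI would be checked along with the local name.
    """
    root = ET.fromstring(xml_text)
    # root.iter() visits the root element and all of its descendants.
    return [el.tag for el in root.iter() if NON_BASIC_LATIN.search(el.tag)]
```

An empty result corresponds to the assertion passing for every element in
the document.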

>Second, is it possible to create a list of kanji for elementary school
>students and reference it?
>
Yes, you could make an abstract rule for this. And the rule could be included
from an external URI using entities. If it is a requirement (sounds good), I
will check that Schematron's library element (our version of <include>) is
good enough for this. I suspect it may not be, in which case we would have to
adopt something like RELAX NG's <include>... hmm.

>Third, in the future, I would like to extend RELAX NG so that its <text>
>and <mixed> can reference descriptions of character repertoire constraints,
>which are described in Part 7. Are such applications in the scope of your draft?
>
>
You tell me!

I think the simplest thing would be to do it independently: just adopt the
same character classes, but figure out whether the constraint applies
recursively or not, whichever fits in best with RELAX NG. The trouble with
recursion is that you then have to unset it; the trouble with no recursion is
that you have to put it on multiple element declarations.

In Schematron, it is not difficult, because we can make the test
non-recursive and let the user decide by putting it in their XPath context
expressions.
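
The recursive versus non-recursive distinction can be sketched in Python.
This is an illustration only, assuming xml.etree.ElementTree and an explicit
range in place of a Basic Latin character class; the helper names are
hypothetical. The non-recursive test sees only the text directly inside an
element, while the recursive one also sees the text of all descendants:

```python
import re
import xml.etree.ElementTree as ET

# Any character outside Basic Latin (U+0000 to U+007F).
NON_BASIC_LATIN = re.compile(r"[^\x00-\x7F]")


def immediate_text(el: ET.Element) -> str:
    """Text directly inside el: its leading text plus its children's tails."""
    return (el.text or "") + "".join(child.tail or "" for child in el)


def all_text(el: ET.Element) -> str:
    """Text of el and of every descendant, in document order."""
    return "".join(el.itertext())


def passes(el: ET.Element, recursive: bool) -> bool:
    """True when the selected text stays within Basic Latin."""
    text = all_text(el) if recursive else immediate_text(el)
    return NON_BASIC_LATIN.search(text) is None


p = ET.fromstring("<p>plain <em>強調</em> text</p>")
print(passes(p, recursive=False))  # True: the kanji are inside <em>, not <p>
print(passes(p, recursive=True))   # False: <em>'s text is now included
```

The choice between the two is the one that the user would make through the
context expression.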

Thinking about it for RELAX NG, I guess that the use cases fall into two
categories:
 1) It is really a kind of data-typing, where there is some character
    restriction intrinsic to the semantics of the element.
 2) It is really production or editorial validation, where it is the
    particular set of documents that we are testing. (This area, "document
    corpus typing", is one that my company has been working on: we sample
    document sets and derive validators, editor templates, stylesheet stubs
    and complexity metrics from multiple documents of the same schema, rather
    than relying on the often-too-broad schema.)

One way for RELAX NG would be that the first type is handled by
a non-recursive datatyping mechanism, applying to all immediate content
but not grandchild content. Then the second type would be handled
by a recursive test, declared on the grammar itself: a global constraint
on all (element?) content.

(Actually, I suspect that the second requirement is better handled by
validation performed by the parser's entity manager, as a kind of
normalization check; yikes!)

Cheers
Rick Jelliffe

Received on Thu Apr 8 12:35:53 2004
