[dsdl-comment] Re: Schematron include issues

From: <rjelliffe@allette.com.au>
Date: Wed Dec 31 2008 - 11:08:55 UTC

>>
>>> 1. In the RELAX NG schema for Schematron in Annex A, in the schema
>>> pattern where it now says pattern+, I believe it ought to say
>>> (pattern|inclusion)+: a schema element containing just
>>>
>>> The full pattern is
>>
>> schema = element schema {
>> attribute id { xsd:ID }?, rich, attribute schemaVersion {
>> non-empty-string }?,
>> attribute defaultPhase { xsd:IDREF }?, attribute queryBinding {
>> non-empty-string }?,
>> (foreign
>> & inclusion*
>> & (title?,
>> ns*,
>> p*,
>> let*,
>> phase*,
>> pattern+,
>> p*,
>> diagnostics?
>>
>> Have I interpreted RELAX NG wrong: I thought this allowed multiple
>> inclusions anywhere?
>
>
> The problem is that:
>
> <schema><include/></schema>
>
> doesn't match
>
> element schema { inclusion* & pattern+ }
>
> In other words, <schema> cannot require a <pattern> child, because the
> <pattern> might be provided by the <include>. Changing it to
>
> element schema { inclusion* & (pattern|inclusion)+ }
>
> solves it, as would, perhaps more understandably
>
> element schema { inclusion* & (pattern+|inclusion) }

What about
  element schema { inclusion* & pattern* }
i.e.

schema = element schema {
   attribute id { xsd:ID }?, rich,
   attribute schemaVersion { non-empty-string }?,
   attribute defaultPhase { xsd:IDREF }?,
   attribute queryBinding { non-empty-string }?,
   (foreign
    & inclusion*
    & (title?,
      ns*,
      p*,
      let*,
      phase*,
      pattern*,
      p*,
      diagnostics?))
}

or in the proposed new one

schema = element schema {
   attribute id { xsd:ID }?, rich,
   attribute schemaVersion { non-empty-string }?,
   attribute defaultPhase { xsd:IDREF }?,
   attribute queryBinding { non-empty-string }?,
   (foreign
    & inclusion*
    & (title?,
      ns*,
      p*,
      let*,
      phase*,
      pattern*,
      p*,
      diagnostics?,
      properties?))
}

where inclusion allows includes and extends.

>> 2) Using the <sch:extends> element for a smarter inclusion, which
>> inserts
>> all the children of the named element.
>
>
> Do you mean <sch:include>? If so, this sounds like a good direction.

No, sch:extends. This is because it already has exactly the semantic (in
abstract patterns) that is needed.

> There's also the need sometimes for the including document to
> override/redefine things in the included document (like parameter entities
> in the internal DTD subset can override things in the external DTD).
> RELAX
> NG handles this by allowing the <include> element to contain definitions
> that will override definitions in the included document.

I have never had a use case for the overriding of actual definitions, but
I think Schematron already provides four features that reduce the need for
redefinition:

 * Many constraints use data values from the instance, rather than
hard-coding them. Similarly, with external codelists. So there can be
less reliance on static data values hardcoded into the schema, the way it
has to be with other systems (such as enumerations in XSD).

 * The top-level <sch:param> allows passing of command-line arguments,
which allows invocation-specific validation. These can be used in boolean
expressions, for example in tests.

 * The phase mechanism provides a way to subdivide and manage constraints,
so that a phase can select the patterns that are unchanged and add new
ones. So rather than redefining a pattern "tables" to cope with some new
structure, we can just make a new pattern "new-tables" and make that
active in the current phase. I think this is much superior to redefine,
because the phase can be given a name and documentation, without
requiring that the existing phase and superseded patterns be removed: so
the one schema supports all the variants explicitly. This removes the
need for schema management, where you have multiple schemas for different
versions, and they can get lost.

 * Finally, because Schematron patterns are typically each open, there is
frequently less need to alter the schema because of some unforeseen
occurrence: for example, by specifying child element order as a partial
order, new elements can be slipped in relatively easily.
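
A minimal sketch of the phase approach described in the third point. The
pattern ids "tables" and "new-tables" come from the text above; the phase
ids, the "common" pattern, the element names (table, row, tgroup) and the
assertion wording are assumptions made up for illustration only.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            defaultPhase="v1">
  <!-- Original variant: documented, and left in place -->
  <sch:phase id="v1">
    <sch:p>Original table model.</sch:p>
    <sch:active pattern="common"/>
    <sch:active pattern="tables"/>
  </sch:phase>
  <!-- New variant: reuses the unchanged pattern, swaps in the new one -->
  <sch:phase id="v2">
    <sch:p>Revised table model; nothing is redefined or removed.</sch:p>
    <sch:active pattern="common"/>
    <sch:active pattern="new-tables"/>
  </sch:phase>

  <sch:pattern id="common">
    <sch:rule context="/*">
      <sch:assert test="title">The document should have a title.</sch:assert>
    </sch:rule>
  </sch:pattern>
  <sch:pattern id="tables">
    <sch:rule context="table">
      <sch:assert test="row">A table should contain at least one row.</sch:assert>
    </sch:rule>
  </sch:pattern>
  <sch:pattern id="new-tables">
    <sch:rule context="table">
      <sch:assert test="row or tgroup/row">A table should contain rows,
        directly or inside a tgroup.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Selecting phase "v2" at invocation time validates against the new
structure, while "v1" remains available and documented in the same schema.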

The thing that the XSLT-based Schematron cannot do is dynamic evaluation:
the params cannot contain XPaths. A QLB using a dynamic language (i.e.
that supported eval()) could do that.

> Getting an inclusion mechanism right is very tricky.

Yes. I think we have a much clearer idea of what people need now, compared
to 5 years ago. Support for inclusion of compiled code lists (e.g. UBL)
and importing multiple patterns, in particular. I think extending extends
is pretty good for that.

> I would prefer to keep things simple. One of Schematron's biggest
> strengths
> is its simplicity: it's easy for simplicity to gradually slip away as new
> features are added.

Yep. I don't want to hinder the development of "proper" declarative
specialist schema languages by getting too grandiose. It is better to play
to your strengths rather than pander to your weaknesses, I think: meaning
that providing a really bountiful range of output information (such as the
properties) which does not affect the operational semantics at all, is
pretty good.

> At the moment the information that Jing reports is limited by the SAX
> ErrorHandler interface (to a string plus systemId/lineNumber etc).

The ElementHandler should be queryable to get the XPath too. Some users
wanted human readable XPaths, others wanted machine friendly.

> Eventually I would like to allow Jing to produce XML validation reports,
> not
> only for Schematron but also for NVDL and RELAX NG.

Yes, it has been surprisingly popular.

> Ideally I would like to see SVRL generalized and separated out into its
> own
> DSDL part. There would be a schema-language independent namespace for
> capturing information that is schema-language independent, and then
> additional namespaces for capturing information that is specific to
> particular schema languages. For example, with NVDL, it can sometimes be
> hard to understand error messages from the schema languages that NVDL
> dispatches to, so you might sometimes want reports for NVDL to include the
> sections into which NVDL has split the document up.

SVRL was made to allow an implementation developer to test that the same
result was happening when run with different engines. Hence the strange
flat design, where events are not as well nested or ordered as might be
optimal for a general purpose language.

I am very open to having a pan-DSDL validation record as part of DSDL:
people need metadata such as which instances, which schemas, which dates,
plus summaries of the validation results considered as binary
valid/invalid.

E.g.

result = element dsdl-validation-report {
  element validation-record ...,
  (element svrl:schematron-validation-report ... |
   element dsdl:line-oriented-validation-report ... |
   element dsdl:path-oriented-validation-report ...)
}

> If the document I'm including starts with <!DOCTYPE and I insert those
> bytes
> into the including document in place of the <include> element, I will get
> something that's not well-formed.

By WF I mean the result must be WF, that it is an error if it is not, and
that some magic must be done.

> Including bytes and including infosets don't always give the same results
> even when including bytes results in something well-formed: consider
> in-scope namespaces and base URIs.

How wise that Schematron doesn't use namespace declarations for the
attribute values, then :-) Is that what you are saying :-)

> I'm talking about the processing of the schema (as described in 6.2). I
> believe that the RELAX NG data model does the right thing here. In any
> case,
> something needs to be said: at the moment, there is not enough information
> in the standard for an implementer to know what they are supposed to
> implement.

The customer is always right!

> For the document being validated, it should be up to the QLB.

It is an issue that is worthwhile to periodically reconsider. It is a
chicken-and-egg problem: if we don't support xml:base etc, then people
cannot use them.

> For the schema, xml:id doesn't seem relevant. Schematron has chosen to
> roll
> its own include mechanism (like RELAX NG) rather than use xinclude (like
> NVDL), so I think xml:include shouldn't be included. In my view xml:base
> should be: it's in most XML specs (including the infoset and this RELAX
> NG).

I put in an extension to my implementation so that if an include has a #
fragment identifier, the include will match any //*/@id and //*/@xml:id.
It seems a reasonable hack, as far as helping users, e.g. with Schematron
schemas embedded in RELAX NG or XSDs, instead of using Eddie Robertsson's
extractor.
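
A sketch of what such an include could look like. The file name
invoice.rng, the id "invoice-rules" and the element names are hypothetical,
the fragment matching is the implementation-specific extension described
above rather than standard behaviour, and a schema holding only an include
assumes the relaxed pattern* content model proposed earlier in this message.

```xml
<!-- Including schema: the fragment selects the element whose id or
     xml:id is "invoice-rules" (extension behaviour, not standard) -->
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:include href="invoice.rng#invoice-rules"/>
</sch:schema>

<!-- invoice.rng (hypothetical): RELAX NG schema with an embedded pattern -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern id="invoice-rules">
    <sch:rule context="invoice">
      <sch:assert test="total">An invoice should have a total.</sch:assert>
    </sch:rule>
  </sch:pattern>
  <start>
    <element name="invoice">
      <element name="total"><text/></element>
    </element>
  </start>
</grammar>
```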

Cheers
Rick

Received on Wed Dec 31 12:09:53 2008
