Corrections and Additions to ISO DSDL Part 3: Schematron

Version: 2007-04-11 Rick Jelliffe

This document summarises the corrections and additions collated by the editor for ISO Schematron, for discussion by ISO/IEC JTC1 SC34 WG1 in 2007. If these are prepared and adopted quickly, I do not believe we need to adopt any version control features, such as a new namespace to distinguish version 1 schemas from version 2.

In the lists below, “implementation impact” refers primarily to the editor’s implementation, which is the most widely used and has had many open source contributions.

New Annexes

1.      Query Language Binding for XSLT2 (normative)
This would mirror the current default binding annex, with production and reference changes as necessary.

Justification: XSLT2 is proving popular as an implementation language and is already in use, running on the SAXON XSLT2 processor; users from the Schematron-love-in mailing list have requested this binding. XSLT2 provides a more powerful expression language than XSLT1, including support for a for-each iterator that may overcome some of Schematron’s limitations for parallel constraints (see Eric van der Vlist’s booklet for a description of the limitation). Most importantly, XSLT2 has a more complete range of functions. It seems likely that industry support for XSLT2 will extend beyond SAXON.
Issue for resolution: A user has asked to use the type-aware XPath 2 features of XSLT2. This goes against the general architecture and motivation of ISO DSDL, which was XML-in/XML-out rather than PSVI. However, it is clearly useful. Suggested solution: the query language binding for XSLT2 should allow two schema/@queryBinding values, “xslt2” and “xslt2-typed”, where the former does not require a PSVI whereas the latter does. Section 6.4 would have “xslt2-typed” added to the reserved list.
Implementation impact: None. A minimal implementation need not support this. A conforming implementation may have some extra work, but the issues seem largely resolved.
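
As a rough illustration of the suggested solution, the sketch below reads schema/@queryBinding and decides whether a PSVI would be needed. The function names and the lookup table are hypothetical, and the fallback to “xslt” when the attribute is absent is assumed from the existing default binding; nothing here is taken from an actual implementation.

```python
import xml.etree.ElementTree as ET

# The two values proposed above, mapped to whether each needs a PSVI.
# "xslt2-typed" is only a proposal in this document, not an adopted name.
PROPOSED_XSLT2_BINDINGS = {"xslt2": False, "xslt2-typed": True}


def query_binding(schema_xml: str) -> str:
    """Return the schema's queryBinding value.

    Falls back to "xslt" when the attribute is absent (assumed default).
    """
    root = ET.fromstring(schema_xml)
    return root.get("queryBinding", "xslt")


def requires_psvi(schema_xml: str) -> bool:
    """True only for the proposed type-aware XSLT2 binding."""
    return PROPOSED_XSLT2_BINDINGS.get(query_binding(schema_xml), False)
```

An implementation that cannot supply a PSVI could use a check like this to refuse “xslt2-typed” schemas up front while still accepting plain “xslt2” ones.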

2.      Using DTTL & CRDL with Schematron (informative)
This would explain how to use DTTL & CRDL in conjunction with the type annotation/property features suggested below.
Issue for resolution: The form of this would depend on which kind of type/property mechanism was adopted.

 

Additional Features

1.      Reject: Introduce a new element “reject” as an alias for “report”.
Justification: “Reject” is terminology used in the W3C Rule Interchange Format. More importantly, users have reported a desire to use schema rules both to validate for pass/fail and to generate warnings as part of the same session. These users are uncomfortable using the general-purpose @role attribute for this purpose (such as role=“fatal” or role=“warning”). The usefulness of such an element has been increased by the UBL implementation’s feature of terminating on the first hard error. The definitions would be adjusted so that validity includes rejects that succeed, and a stricter form of validity would be defined that does exclude reports.
Implementation impact: Minimal. A minimal implementation could just re-use its existing <report> code.  
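
One possible reading of the adjusted definitions is sketched below. It assumes that ordinary validity means no failed assert and no successful reject, and that the stricter form additionally means no successful report. That reading, the function name is_valid and the tuple representation are assumptions made for illustration, not wording from the standard.

```python
def is_valid(outcomes, strict=False):
    """Decide validity from (kind, test_true) pairs.

    kind is "assert", "reject" or "report"; test_true is the result of
    evaluating the test expression. Assumed reading of the proposal:
    a failed assert or a successful reject makes the document invalid,
    and under strict=True a successful report does so as well.
    """
    for kind, test_true in outcomes:
        if kind == "assert" and not test_true:
            return False  # failed assertion
        if kind == "reject" and test_true:
            return False  # reject that succeeds
        if strict and kind == "report" and test_true:
            return False  # stricter validity also excludes reports
    return True
```

Under this reading a document that only triggers reports is valid but not strictly valid, which is the warning-versus-error split the users asked for.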

2.      Richer Text: The rich text sections that currently allow p* should be changed to allow slightly richer text, following HTML: div, ul, ol, img and a. This does not affect the contents of assertions, phases or diagnostics.
Justification: A user is basing a requirements management system on Schematron schemas, and needs richer text.
Issue for resolution: Would it be better to just adopt ISO HTML body text? Should all ISO DSDL languages do the same?
Implementation impact: Medium. A minimal implementation could ignore these. A complete implementation would add extra calls.

3.      Includes: Improve the existing mechanism.
This may need more study: is the need for modularity or for including libraries?
Justification: Users complain that the current mechanism is inadequate, in particular because the included document can contain only a single fragment, which makes it difficult to validate and maintain, and because it does not import diagnostics or phases if they are needed. UBL wants better inclusion.
Issue for resolution: I see two possibilities:

a.      Improve the kind of IRI allowed. This could range from just allowing a fragment reference (#xxxx) to an ID attribute in the target, to full XPointer syntax.

b.      Add a RELAX NG style import statement, which would allow reference to a schema.
Two proposals exist. One is to just allow a fragment identifier on the URLs.

Implementation impact: For an XSLT implementation, simple fragment identifiers are trivial and I have prototyped a solution. XPointers depend on the availability of an open source library. An import solution is a little more complex, especially if there is an idea of injecting constraints into an existing pattern.
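
The simplest form of possibility (a), a plain fragment reference to an id attribute in the target, might look like the sketch below. The function name resolve_include and the in-memory documents mapping are hypothetical and only keep the example self-contained; this is not the editor's prototype, and full XPointer syntax is not attempted.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag


def resolve_include(href, documents):
    """Return the element that an include href points at.

    href      -- e.g. "rules.sch#tables"; the fragment is assumed to name
                 an id attribute in the target document
    documents -- mapping of document URL to XML text (stands in for I/O)
    """
    url, fragment = urldefrag(href)
    root = ET.fromstring(documents[url])
    if not fragment:
        return root  # no fragment: the whole document is the target
    for element in root.iter():
        if element.get("id") == fragment:
            return element
    raise LookupError(f"no element with id {fragment!r} in {url}")
```

With this shape the included file can be a complete schema that validates on its own, since only the referenced element is pulled in.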

4.      Visit text nodes: Allow patterns in the default query language binding to visit text nodes.
Justification: Several users and an implementer (Uche Ogbuji) have requested this, to reduce what they see as gratuitous differences with XSLT, and to express some constraints more naturally.
Issue for resolution: I am not yet convinced about this and have not seen any real use cases that justify it. It does not change the expressiveness of the language and potentially blows out the number of nodes that must be visited. In any case, XSLT2 provides a for-each in XPaths to loop. However, these are not compelling arguments. The underlying issue is whether schemas describe structures with explicit markup; in Schematron’s case, this is whether we are finding patterns in XML documents or patterns primarily in the markup of XML documents.
Implementation impact: Small, but in an error-prone statement.

5.      Rule titles. Rules should allow an optional title element.
Justification: A user has requested this.
Implementation impact: None. A minimal conforming implementation is not required to do anything with this element.

6.      Type Annotation and extra metadata: Allow extra arbitrary information to be declared on the rule subject, in particular to support 1) XSD datatypes, 2) DTTL datatypes, 3) CRDL repertoires, 4) extra metadata from content management systems. An application is not required to validate annotations; doing so requires framework help.
Justification: Type binding and annotation is an important and useful capability. DTTL and CRDL need to be invoked by something. XForms has pioneered the use of XPaths to bind to types, and there is an XForms-to-Schematron implementation.
Issue for resolution: I see four choices:

a.      Minimal. Allow @xsi:type on sch:rule. Pro: simple. Con: inexpressive.

b.      Datatype Library: Adopt the same mechanism as RELAX NG: a rule can have a subelement sch:data, which can in turn contain a list of sch:param. The sch:data element also identifies a datatype library. Pro: consistent. Could be added to sch:assert too. Con: type-centred.

c.       SAL: W3C spec in development which gives a type a name then links to up and down converters in XSLT. Pro: simple, RDF worked out, not tied to other DSDL. Con: not declarative, not tied to other DSDL, and duplicates some of DTTL.

d.      General: Allow sch:property on sch:rule (and schema, pattern, phase). Content model something like
  property = element property { @name, @scheme?, (@value | property+) }
to allow recursive properties at any nesting level. The scheme attribute follows html:meta and is an IRI with a datatypeLibrary or a units indicator, etc. Pro: powerful, useful for non-type information, fits into the XSD facet mechanism, not arbitrarily restricted, and allows more information to be passed on than is allowed by @flag and @role, so it is useful for reporting and grading systems. Con: different to RELAX NG, moves beyond typing.

There is also a running issue that some people want asserts to allow more structure. The fundamental design decision for Schematron was that the assertion would be a simple sentence, not multiple paragraphs with arbitrary metadata. Almost every other Schematron re-invention has more complex assertions (but no phase/pattern/rule framework), so there is a trade-off. However, as in the case of diagnostics, we could overcome the issue by having a properties section, properties = element properties { property* }, after patterns, and for asserts to have an attribute @properties with IDREFs. This keeps sch:assert flat. Con: content management systems and type systems can use foreign namespaces: why do we need to provide infrastructure for this? In any case, successful asserts don’t typically generate output for validation, so for SVRL they would be wasted. Pro: this is a classic “enabling” feature, and it takes annotation out of the product-implementation arena and into the Schema-to-SVRL arena.
Implementation impact: Minimal. A conforming application is not required to use or understand these. SVRL would be upgraded to cope, so that properties are provided as part of the report.
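
To make choice (d) and the properties section more concrete, here is a sketch of how an assert's @properties IDREFs could be resolved to nested property data. Everything in it is hypothetical: the sch:properties and sch:property elements are only proposed above, the id attribute on property is an assumption needed for IDREFs to have something to point at, and the example schema and function names are invented for illustration.

```python
import xml.etree.ElementTree as ET

SCH = "{http://purl.oclc.org/dsdl/schematron}"

# Hypothetical schema using the proposed properties section.
EXAMPLE_SCHEMA = """
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <rule context="price">
      <assert test="@currency" properties="p-type p-unit">A price has a currency.</assert>
    </rule>
  </pattern>
  <properties>
    <property id="p-type" name="datatype"
              scheme="http://www.w3.org/2001/XMLSchema-datatypes" value="decimal"/>
    <property id="p-unit" name="unit">
      <property name="system" value="ISO 4217"/>
    </property>
  </properties>
</schema>
"""


def property_to_dict(prop):
    """Convert one property element, following the sketched content model:
    a name, an optional scheme, and either a value or nested properties."""
    entry = {"name": prop.get("name")}
    if prop.get("scheme") is not None:
        entry["scheme"] = prop.get("scheme")
    children = prop.findall(SCH + "property")
    if children:
        entry["properties"] = [property_to_dict(child) for child in children]
    else:
        entry["value"] = prop.get("value")
    return entry


def properties_for(schema_root, assertion):
    """Resolve the IDREFs in an assert's @properties attribute."""
    declared = schema_root.findall(f"{SCH}properties/{SCH}property")
    by_id = {prop.get("id"): prop for prop in declared}
    refs = assertion.get("properties", "").split()
    return [property_to_dict(by_id[ref]) for ref in refs]
```

The assert itself stays a flat sentence; the structured data lives in the properties section and could be copied to SVRL output when the assertion fails.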

SVRL Enhancements

SVRL may need to be augmented based on other decisions above.

1.      Richer Text: SVRL revision for richer text
Justification: Users request more complete reflection of rich schema information to output.
Implementation impact: Medium. If richer text were introduced, this may also have some impact.

2.      Validity: SVRL revision to allow an attribute “valid | invalid” (based on assert/reject)
Justification: SVRL currently does not have this, leaving it up to marking systems. But it would be useful.
Implementation impact: Minimal. An XSLT 2 implementation would store all validation results in a variable, then test them. I am not sure whether this is the preferred solution for XSLT1 though.
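
The "store all results, then test them" approach can be sketched outside XSLT as follows. The attribute name validity and the svrl:successful-reject element are hypothetical, since the proposal names neither; svrl:failed-assert is the existing SVRL element.

```python
import xml.etree.ElementTree as ET

SVRL = "{http://purl.oclc.org/dsdl/svrl}"


def stamp_validity(svrl_xml: str) -> ET.Element:
    """Parse an SVRL report and record an overall verdict on its root.

    The whole report is collected first and inspected afterwards, which
    mirrors holding the validation results in a variable before testing.
    """
    root = ET.fromstring(svrl_xml)
    failed = list(root.iter(SVRL + "failed-assert"))
    # Hypothetical element for the proposed "reject"; not part of SVRL today.
    rejected = list(root.iter(SVRL + "successful-reject"))
    # "validity" is an assumed attribute name for the valid | invalid flag.
    root.set("validity", "invalid" if failed or rejected else "valid")
    return root
```

Successful reports are deliberately ignored here, so a report-only result is still stamped valid.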

3.      Properties: Type annotation or properties
Justification: Makes a “schema” into more than just a plan for what can go where: it also becomes an annotation framework. Provides more information for reports to pass on. If we consider a two-pass process where documents are first marked by Schematron to generate an SVRL report, and that report is then assessed by Schematron to generate another SVRL report, we see that this is a way of dividing up complex constraints into a pipeline. This overcomes to some extent the problem that assertions are not structured into subassertions (a problem that is also reduced by sch:let).
Implementation impact: Small. Copy all properties to output fired-rule.  

General Editorial

These corrections should have no effect on implementations.

1.      IRI: Clarify that URI means IRI.

2.      Section 6.4 Query Language Binding. Remove “xslt1.1” from the list.
Justification: Some users are disturbed that an item for a non-existent technology should be reserved.

3.      Revise text to more forcefully state the disjunction of rule contexts.
Justification: New readers persistently do not expect that rules within a pattern act as an if-then-else statement, and consequently resist reading the current wording that way, even though it is strictly adequate.
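
The if-then-else behaviour can be shown with a small simulation. As a loudly stated simplification, rule contexts are modelled here as Python predicates rather than XPath patterns, and the function and rule names are hypothetical; only the first-match-wins logic is the point.

```python
import xml.etree.ElementTree as ET


def fired_rules(pattern, document_root):
    """List which rule fires for each element under one pattern.

    pattern -- ordered list of (rule_name, matches) pairs, where matches
               is a predicate standing in for the rule's context
    Only the first matching rule fires for a node; later rules in the
    same pattern are never tried for that node.
    """
    fired = []
    for node in document_root.iter():
        for rule_name, matches in pattern:
            if matches(node):
                fired.append((node.tag, rule_name))
                break  # the "else" branches are skipped for this node
    return fired


document = ET.fromstring("<doc><title/><para/><note/></doc>")
pattern = [
    ("title-rule", lambda element: element.tag == "title"),
    ("catch-all-rule", lambda element: True),
]
result = fired_rules(pattern, document)
```

Here the title element fires only title-rule, even though catch-all-rule also matches it; every other element falls through to catch-all-rule.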

4.      Revise references section to add:

a.      XSLT2

b.      IRI

Justification: Required by the new annex and by the Japanese comments.