Version: 2007-04-11 Rick Jelliffe
This document summarises the corrections and additions collated by the editor for ISO Schematron for discussion by ISO/IEC JTC1 SC34 WG1 in 2007. If these are prepared and adopted quickly, I do not believe we need to adopt any version control features, such as a new namespace to distinguish version 1 schemas from version 2.
In the lists below, “implementation impact” refers primarily to the editor’s implementation, which is the most widely used and has had a lot of open source contributions.
1. Query Language Binding for XSLT2 (normative)
This would mirror the current default binding annex, with production and
reference changes as necessary.
Justification: XSLT2 is proving popular as an implementation language and is
already in use, running on the SAXON XSLT2 processor; users from the
Schematron-love-in mail list have requested this. XSLT2 provides a more
powerful expression language than XSLT1, including support for a for-each
iterator that may overcome some of Schematron’s limitations for parallel
constraints (see Eric van der Vlist’s booklet for a description of the
limitation). Most importantly, XSLT2 has a more complete range of functions.
It seems that there will be industry support for XSLT2 from processors other
than SAXON.
Issue for resolution: A user has asked to use the type-aware XPath 2 features
of XSLT2. This goes against the general architecture and motivation of ISO
DSDL, which was XML-in/XML-out rather than PSVI. However, it is clearly
useful. Suggested solution: the query language binding for XSLT2 should allow
two schema/@queryBinding values, “xslt2” and “xslt2-typed”, where the former
does not require a PSVI whereas the latter does. Section 6.4 would have
“xslt2-typed” added to the reserved list.
Implementation impact: None: A minimal implementation need not support
this. A conforming implementation may have some extra work, but the issues seem
largely resolved.
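The two proposed binding names can be illustrated with a minimal sketch of how an implementation might dispatch on schema/@queryBinding. The table of bindings and the function name are hypothetical, and the sketch assumes that a missing attribute means the default “xslt” binding.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: binding name -> whether typed (PSVI) input is required.
BINDINGS = {
    "xslt": False,        # the current default binding
    "xslt2": False,       # proposed: XSLT2 over plain XML
    "xslt2-typed": True,  # proposed: XSLT2 with type-aware XPath 2
}

def binding_requirements(schema_xml):
    """Return (binding name, needs_psvi) for a Schematron schema."""
    root = ET.fromstring(schema_xml)
    # Assumption: an absent queryBinding attribute selects the default binding.
    name = root.get("queryBinding", "xslt")
    if name not in BINDINGS:
        raise ValueError(f"unsupported queryBinding: {name!r}")
    return name, BINDINGS[name]
```

A minimal implementation would simply leave “xslt2-typed” out of its table and report it as unsupported.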
2. Using DTTL & CRDL with Schematron (informative)
This would explain how to use DTTL & CRDL in conjunction with the type
annotation/property features suggested below.
Issue for resolution: The form of this would depend on which kind of
type/property mechanism was adopted.
1. “Reject”
Introduction of new element “reject” as alias for “report”.
Justification: “Reject” is terminology used in the W3C Rule Interchange
Format. More importantly, users have reported a desire to use schema rules
both to validate for pass/fail and to generate warnings as part of the
same session. These users are uncomfortable using the general-purpose @role
attribute for this purpose (such as role=“fatal” or role=“warning”). The
usefulness of such an element has been increased by the UBL implementation’s
feature of terminating on the first hard error. The definitions would be adjusted
so that validity takes account of rejects that succeed, and a stricter form of
validity would be defined that also excludes reports.
Implementation impact: Minimal. A minimal implementation could just
re-use its existing <report> code.
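As a rough illustration of that re-use, the sketch below rewrites every proposed sch:reject element to sch:report before the schema is compiled. The function name is hypothetical. An implementation that wants to distinguish pass/fail from warnings would need to remember which elements were rejects, and this sketch does not do that.

```python
import xml.etree.ElementTree as ET

SCH_NS = "http://purl.oclc.org/dsdl/schematron"

def alias_reject_to_report(schema_xml):
    """Parse a schema and rename each sch:reject (proposed) to sch:report."""
    root = ET.fromstring(schema_xml)
    # Collect first, then rename, so the tree is not changed while iterating.
    for el in list(root.iter(f"{{{SCH_NS}}}reject")):
        el.tag = f"{{{SCH_NS}}}report"
    return root
```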
2. Richer Text
The rich text sections where there currently are p* should be changed
to allow slightly richer text, following HTML: div, ul, ol, img and
a. This does not affect contents of assertions, phases or diagnostics.
Justification: A user is basing a requirements management system on
Schematron schemas, and needs richer text.
Issue for resolution: Would it be better to just adopt ISO HTML body
text? Should all ISO DSDL languages do the same?
Implementation impact: Medium. A minimal implementation could ignore
these. A complete implementation would add extra calls.
3. Includes. Improve existing mechanism
This may need more study: is the need for modularity or for including libraries?
Justification: complaints that the current mechanism is inadequate, in
particular because the included document can contain only a single fragment
and is therefore difficult to validate and maintain, and because it does not
import diagnostics or phases if they are needed. UBL wants better inclusion.
Issue for resolution: I see two possibilities:
a. Improve the kind of IRI allowed. This could range from just allowing a fragment reference (#xxxx) to an ID attribute in the target, to full XPointer syntax.
b. Add a RELAX NG style import statement, which would allow referencing a schema.
Two proposals exist. One is to just allow a fragment indicator on the URLs.
Implementation impact: For XSLT implementation, simple fragment identifiers are trivial and I have prototyped a solution. XPointers depend on the availability of an open source library. An import solution is a little more complex, especially if there is an idea of injecting constraints into an existing pattern.
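The simple fragment-identifier option can be sketched as follows. This is only an illustration, not the prototype mentioned above: the function name is hypothetical, documents are supplied as an in-memory map from URL to XML text, and the fragment is assumed to match an unqualified id attribute in the target.

```python
import xml.etree.ElementTree as ET

def resolve_include(href, documents):
    """Resolve 'url#fragment' to an element; without a fragment, return the root."""
    url, _, fragment = href.partition("#")
    root = ET.fromstring(documents[url])
    if not fragment:
        return root
    for el in root.iter():
        # Assumption: the fragment names the value of an 'id' attribute.
        if el.get("id") == fragment:
            return el
    raise LookupError(f"no element with id {fragment!r} in {url!r}")
```

With this approach a library document can hold many patterns and still be a complete, valid schema on its own, because the including schema picks out one element by ID.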
4. Visit text nodes: Allow patterns in the default query language binding to visit
text nodes.
Justification: Several users and an implementer (Uche Ogbuji) have
requested this, to reduce what they see as gratuitous differences with XSLT,
and to express some constraints more naturally.
Issue for resolution: I am not convinced about this and have not seen
any real use cases that justify it, yet. It does not change the expressiveness
of the language and potentially blows out the nodes that must be visited. XSLT2
provides a for-each in XPaths to loop in any case. However, these are not
compelling arguments. The underlying issue is whether schemas describe
structures with explicit markup; in Schematron’s case, this is whether we are
finding patterns in XML documents or patterns primarily in the markup of XML
documents.
Implementation impact: Small, but in an error-prone statement.
5. Rule titles. Rules should allow an optional title element.
Justification: A user has requested this.
Implementation impact: None. A minimal conforming implementation is not
required to do anything with this element.
6. Type Annotation and extra metadata: Allow extra arbitrary information to be
declared on the rule subject, in particular to support 1) XSD datatypes, 2)
DTTL datatypes, 3) CRDL repertoires, 4) extra metadata from content management
systems. An application is not required to validate annotations; doing so
requires framework help.
Justification: Type binding and annotation is an important and useful
capability. DTTL and CRDL need to be invoked by something. XForms has pioneered
the use of XPaths to bind to types, and there is an XForms-to-Schematron
implementation.
Issue for resolution: I see four choices:
a. Minimal. Allow @xsi:type on sch:rule. Pro: simple. Con: inexpressive
b. Datatype Library: Adopt the same mechanism as RELAX NG: a rule can have a subelement sch:data, which can in turn contain a list of sch:param. The sch:data element also identifies a datatype library. Pro: consistent. Could be added to sch:assert too. Con: type-centred.
c. SAL: W3C spec in development which gives a type a name then links to up and down converters in XSLT. Pro: simple, RDF worked out, not tied to other DSDL. Con: not declarative, not tied to other DSDL, and duplicates some of DTTL.
d. General: Allow sch:property on sch:rule (and schema, pattern, phase). Content model
something like
property = element property { @name, @scheme?, @value | property+ }
to allow recursive properties at any nesting level. The scheme attribute
follows html:meta and is an IRI with a datatypeLibrary or a units indicator,
etc. Pro: powerful, useful for non-type information, fits into XSD facet
mechanism, not arbitrarily restricted, allows more information to be passed on
than just allowed by @flag and @role and so useful for reporting and grading
systems. Con: different to RELAX NG, moves beyond typing.
There is also a running issue
that some people want asserts to allow more structure. This was the fundamental
design decision for Schematron, that the assertion would be a simple sentence
not multiple paragraphs with arbitrary metadata. Almost every other Schematron
re-invention has more complex assertions (but no phase/pattern/rule framework),
so there is a trade-off. However, as in the case of diagnostics, we could overcome
the issue by having a properties section, properties = element properties
{ property* } after patterns, and for asserts to have an attribute @properties
with IDREFs. This keeps sch:assert flat. Con: content management systems and
type systems can use foreign namespaces: why do we need to provide
infrastructure for this? In any case, successful asserts don’t typically
generate output, for validation, so for SVRL they would be wasted. Pro: this is
a classic “enabling” feature, and it takes annotation out of the product-implementation
arena and into the Schema SVRL arena.
Implementation impact: Minimal. A conforming application is not required
to use or understand these. SVRL would be upgraded to cope, so that properties
were provided as part of the report.
SVRL may need to be augmented based on other decisions above.
1. Richer Text: SVRL revision for richer text
Justification: Users request more complete reflection of rich schema
information to output.
Implementation impact: Medium. If richer text were introduced, this may
also have some impact.
2. Validity: SVRL revision to allow an attribute “valid | invalid” (based on
assert/reject)
Justification: SVRL currently does not have this, leaving it up to
marking systems. But it would be useful.
Implementation impact: Minimal. An XSLT 2 implementation would store all
validation results in a variable, then test them. I am not sure whether this is
the preferred solution for XSLT1 though.
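The store-then-test approach can be sketched outside XSLT as a post-processing step over a finished SVRL report. The element name svrl:successful-reject is hypothetical, since SVRL has no element for the proposed reject yet, and the function name is also an assumption.

```python
import xml.etree.ElementTree as ET

SVRL_NS = "http://purl.oclc.org/dsdl/svrl"

def validity(svrl_xml):
    """Return 'invalid' if any assert failed or any reject fired, else 'valid'."""
    root = ET.fromstring(svrl_xml)
    failed = root.findall(f".//{{{SVRL_NS}}}failed-assert")
    # Hypothetical element name for a reject that fired.
    rejected = root.findall(f".//{{{SVRL_NS}}}successful-reject")
    return "invalid" if failed or rejected else "valid"
```

Under this reading a svrl:successful-report on its own does not make the document invalid. The stricter form of validity would also test for those.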
3. Properties: Type annotation or properties
Justification: Makes a “schema” into more than just a plan for what can
go where, but also an annotation framework. Provides more information for
reports to pass on. If we consider a two-pass process where documents first get
marked by Schematron to generate an SVRL report, and that report is subsequently
assessed by Schematron to generate another SVRL report, we see that this is a
way of dividing up complex constraints into a pipeline. This overcomes to some
extent the problem that assertions are not structured into subassertions (this
problem is also reduced by sch:let).
Implementation impact: Small. Copy all properties to output fired-rule.
These corrections should have no effect on implementations.
1. IRI. Clarify that URI means IRI.
2. Section 6.4 Query Language Binding. Remove “xslt1.1” from the list.
Justification: Some users are disturbed that an item for a non-existent
technology should be reserved.
3. Revise text to more forcefully state the disjunction of rule contexts.
Justification: New readers persistently do not expect that rules within
a pattern act as an if-then-else statement, and consequently resist
interpreting the current wording that way, although it is strictly adequate.
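The if-then-else behaviour can be shown with a small simulation. This is a sketch only: rule contexts are modelled as Python predicates rather than XPath, and the rule labels and function name are hypothetical. For each node, the first rule in the pattern whose context matches is used, and later rules in the same pattern are not tried for that node.

```python
import xml.etree.ElementTree as ET

def fired_rules(doc_xml, rules):
    """Return the label of the rule that fires for each matched node, in document order."""
    fired = []
    for node in ET.fromstring(doc_xml).iter():
        for label, matches in rules:
            if matches(node):
                fired.append(label)
                break  # later rules in the pattern are skipped for this node
    return fired

# One pattern with two rules; order matters.
RULES = [
    ("special-para", lambda e: e.tag == "para" and e.get("class") == "special"),
    ("any-para", lambda e: e.tag == "para"),
]
```

For the document `<doc><para class="special"/><para/></doc>`, the first para fires only “special-para”, even though it also matches the context of “any-para”. The second para fires “any-para”.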
4. Revise references section to add:
a. XSLT2
b. IRI
Justification: Required by new annex, Japanese comments