Rick
Interesting. I'm not sure how attributes would apply rather than elements
embedded within the schema's appinfo element. It would help if you had
examples that showed the use of all three of your extension framework
"places". I'm also unclear what would happen if elements and attributes were
associated with the same validatable element.
Another point I am unclear about is how and when you would add element names
to the Extended Validity Attempted attribute. Is this done as you stream in
to indicate that the element needs to be validated against these types of
rules, or after validation, to indicate what rules have been applied? What
happens if there is a name in the Extended Validity Diagnostics that is not
included in the Extended Validity Attempted list?
Which brings me neatly onto my last point. The use of namespace as the first
member of the Extended Validity Attempted list is confusing. It seems to
imply that only one element of a specific type can be used in each rule set.
Surely this should not be the case? What you probably want is either a
unique identifier of the relevant rule, or a path that identifies which rule
in a set it was.
Martin
----- Original Message -----
From: "Rick Jelliffe" <rjelliffe@allette.com.au>
To: <ElektonikaMail@gwparis.dyomedea.com>
Cc: "'XML Developers List'" <xml-dev@lists.xml.org>;
<dsdl-discuss@dsdl.org>; <schematron-love-in@eccnet.com>;
<xmlschema-dev@w3.org>
Sent: Tuesday, March 14, 2006 11:31 AM
Subject: [dsdl-discuss] [xml-dev] Re: Fw: Co-Occurance constraint proposal
> Hi Paul,
>
> On the topic of co-occurrence constraints and enhancements to XSD 1.1, I
> have gathered a few thoughts as you asked and attached them in the HTML
> document. If you could pass this on to the XSD WG as an external
> submission on the topic, I would be very grateful.
>
> The submission is titled "A Call for Rapprochement between W3C XSD and
> ISO DSDL:
> A Non-Intrusive Extension Framework for XSD 1.1 to Support Schematron
> and Beyond" and has the following main points:
>
> 1) Simple co constraints best handled by allowing attributes in content
> models, a la RELAX NG. The limited paths suggested are baroque,
> overkill and so probably harmful in this context. If the WG is
> considering generalizing <all> (which I support, especially if it moves
> closer to RELAX NG's <interleave>) it is a convenient opportunity.
>
> 2) XSD needs an extension mechanism. Appropriate PSVI properties and
> elements suggested.
>
> 3) Non-streaming, type-aware Schematron should be a required extension.
> An appropriate subset is given, similar to but fuller than your suggestion.
>
> 4) Extension may also provide a saner home and future for key and
> uniqueness constraints too.
>
> I am glad to see continued enhancement of XSD and wish the WG
> success in it. I believe that the approaches I suggest might find good
> vendor buy-in, compared with a less layered approach which I fear may
> have considerable pushback, and with justification. I hope Microsoft,
> Apache, Sun and other XSD engine developers will encourage the WG to
> only support enhancements that lead to a less complicated or more
> manageable world, and that they will commend my suggestions as being
> workable, harmless and even useful. :-)
>
> Cheers
> Rick Jelliffe
>
> Editor, ISO 19757 DSDL - Part 3: Schematron
> C.T.O. Topologi Pty. Ltd.
> Member, W3C XSD WG 1999-2001
> Member, ISO SC34 DSDL WG 2003-2006
>
----------------------------------------------------------------------------
> A Call for Rapprochement between W3C XSD and ISO DSDL:
> A Non-Intrusive Extension Framework for XSD 1.1 to Support Schematron and
Beyond
> Rick Jelliffe
>
> 2006-03-14
>
> This note is a contribution to discussions on adding various kinds of
constraint checking to XSD 1.1. The bottom line is a call for a
rapprochement between W3C XSD and ISO DSDL: ISO DSDL is not a stalking horse
for RELAX NG but should be considered a valuable and primary resource for
little languages and approaches for evolving XSD in a positive direction.
>
> ISO 19757 Document Schema Definition Languages (DSDL) is a multi-part
standard for standardizing small, narrow-focus schema languages. It is often
portrayed as some kind of attempted competitor to XSD, due to OASIS RELAX NG
being one part, but it ain't necessarily so. If you consider ISO DSDL parts
3 and on as a series of small schema languages designed to complement any
grammar-based schema language without adding to monolithic complexity, then
XSD is clearly the primary potential adopter of ISO DSDL.
>
> The current state of affairs with the W3C XSD WG reminds me of the SGML
working group at ISO when we were enhancing IS 8879 SGML to encompass XML which
we had recently developed through W3C. One group felt that we just needed to
parameterize SGML more, to complicate it, in order to cope with the
variations required by XML; another group, which I belonged to, instead felt
that layering was the answer: at a certain point it becomes positively
harmful to add complications to a base specification. So we added an
"additional constraints" link ("SEEALSO"), which allowed the SGML
constraints to be extended by an external document: SGML validity remained
determinate and unitary, and the additional constraints could be validated
as a further, different type of validity (i.e. XML well-formedness.)
>
> What is the relevance to XSD? XSD is in the same position as SGML was a
decade ago: large, stiflingly difficult to implement, and with a strong
requirement not to weaken determinate validity. Similarly, this requirement
not to weaken validity is mistakenly opposed to ideas of layering. In fact the
reverse is true: layered systems are easier to test, implement and reason
about.
>
> I write not only as the developer of Schematron, and former member of both
the W3C XSD WG and the ISO DSDL WG, but also as a commercial developer of
schema-related products including XSD. It has long been obvious to me that
the XSD-inspired dazedness would eventually clear and that calls for schema
capabilities beyond or above that of simple grammars would then have their
season: the intent of ISO DSDL has been to collect such little languages for
early adopters and for general community benefit as the
fog/panic/exploration of XSD clears.
>
> Let me be frank, but certainly with no disrespect intended. When XSD was
developed, perhaps a majority of the then XSD WG had never actually written
a serious DTD or other schema for XML (or SGML.) This perhaps ultimately
showed itself in a certain obsessional intricacy in some minor areas
(nillability, the lack of integration of keys and uniqueness into the type
system, extension by suffixation only, elementFormDefault, etc.) at the
expense of other major areas (patterns on mixed content for example.) I
suspect that probably the majority of the current XSD WG may have never
written a constraint-based schema for XML, e.g. with Xlinkit, XCSL,
Schematron or even XSLT. One cannot after all be a specialist in everything!
The tendency of the XSD WG may therefore be to be less aware than grassroots
users of the advantages, characteristics and opportunities afforded by
Schematron, and therefore relegate it to a few convenient categories such as
"co-occurrence constraint." Schematron was developed in 1999, and has
continued in its current popularity solely because of its general-purpose
utility, not because of any hype or party spirit: XSD users favour it
equally with RELAX NG and DTD users. It is used for detecting conflicting
flight plans over Belgium, checking software architecture rules in USA,
checking local government forms in Japan, and the conformance of documents
to house rules by big three publishers including here in Australia.
>
> Executive Summary
> 1.. Support for simple co-occurrence constraints is better done by
allowing attributes as particles in content models rather than by using path
expressions. Recommend to adopt the mechanism used successfully by ISO DSDL
Part 2 (RELAX NG).
>
> 2.. XSD needs an extension mechanism which will allow embedded little
languages with constraints required for extended validity. Recommend new
PSVI properties such as [extended validity] to support extensibility. Raid
and support the ISO DSDL effort for appropriate extension languages.
>
> 3.. XSD needs a constraint language, regardless of the support for 1)
above. It should use the extension mechanism in 2) above. Recommend ISO DSDL
Part 3 (Schematron) as a required (or strongly recommended) extension.
>
> 4.. XSD Keys and Uniqueness probably should be moved out of Part 1 and
re-cast as an extension. I don't suppose this is feasible, for reason of
scaring the horses. But it is an example of exactly the kind of schema
language that should be an extension. Because key and uniqueness constraints
are embedded in the Structures specification currently, there is no
XSD-compliant way in which developers can experiment and evolve new schema
languages. The danger of this is that the XSD WG is forced to do armchair
speculative development of enhancements ("yeah that sounds good"): a recipe
for perpetual premature standardization, inadequate testing, and a sure way
to gather dead wood.
>
> Attributes in Content Models
> Support for simple co-occurrence constraints is better done by allowing
attributes as particles in content models rather than by using path
expressions. The approach used by ISO RELAX NG should be adopted: it has
proved to be straightforward to implement, easy for users to understand,
declarative and streamable. This would require, I believe, no changes to the
PSVI.
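>
> To illustrate, here is a minimal RELAX NG sketch of a co-occurrence
constraint written with attributes as particles. The element and attribute
names (payment, method, cardNumber, invoiceAddress) are hypothetical and are
not taken from any proposal: the value of the method attribute determines
which child element is allowed.

```xml
<element name="payment" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- Each group ties an attribute value to the content that may accompany it. -->
  <choice>
    <group>
      <attribute name="method"><value>card</value></attribute>
      <element name="cardNumber"><text/></element>
    </group>
    <group>
      <attribute name="method"><value>invoice</value></attribute>
      <element name="invoiceAddress"><text/></element>
    </group>
  </choice>
</element>
```

> Because the attribute is just another particle in the pattern, no path
expression is needed to state the dependency.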
>
> Adopting this does not entail any notion of somehow saying "RELAX NG was
right and we were wrong"; on the contrary, interested users would rejoice
that XSD was adopting proven technology. RELAX NG adopted this feature late:
it was not obvious to James Clark, Murata Makoto and others that it was the
correct feature to adopt. (I believe I was one of the first to suggest it.)
But it has proven itself. I hope the XSD WG relentlessly refuses to take part
in any childish NIH-ism in this: XSD's earliest development was guided by
the proven experience with various deployed schema languages. I strongly
recommend that the XSD WG discipline itself to adopt proven features from
existing deployed schema languages when they are available, as in this case.
>
> Even though I am obviously a fan of XPaths, introducing some reduced
XPath-based syntax is, I believe, the wrong approach here: while technically
feasible it plays to all XSD's weaknesses. It complicates understanding
while only addressing one small area: exactly the kind of "not enough bangs
per buck" that XSD is notorious for.
>
> In particular, one of Paul Biron's useful suggestions, to use the
streaming subset of XPaths to identify a node whose presence provides the
condition for the occurrence of an attribute or element, is, I think on
reflection, the wrong way to go. First because the content model enhancement
above is simpler, cleaner and roughly equivalent. Second because it uses
paths for what they are OK at, but does not use them for where they shine:
with value-based predicates and with random access. Third because the idea
that just providing the most basic occurrence constraints will actually
satisfy user requirements is wrong-headed: a tokenistic path language will
merely temporarily shift the boundary at which user frustration with XSD's
power sets in.
>
> Extension Framework
> XSD needs an extension mechanism which will allow embedded little
languages with constraints required for extended validity.
>
> XSD is notoriously under-layered and complicated to reason about. That
even vendors so freely admit the difficulties they have faced in
implementing XSD properly should be uppermost in the mind of the XSD WG, I
believe; the failure of implementability in XSD must not be quibbled out of,
especially given that XSD had a long gestation period that, one would
expect, would have made early implementations higher quality than expected.
>
> So the WG needs to adopt a very different mindset: I am not talking about
changing the PSVI approach or type derivation, I am talking of the futility
of "valid means valid everywhere in all implementations" when combined with
a monolithic architecture. The failure of XSD implementations to provide
consistent validity results is to some extent attributable to this
monolithic architecture. I suggest that the problem is not with "valid means
valid everywhere in all implementations" but in the lack of extensibility in
XSD: Appinfo is not enough.
>
> My suggestion for an extension mechanism is below. The focus of ISO DSDL
(Document Schema Definition Languages) is now to provide a standard library
of little languages, suitable for XSD to include or allow by reference.
These include ISO DSDL Part 5: DTLL (Datatype Library Language) and ISO DSDL
Part 7: CRDL (Character Repertoire Description Language). In case there is
any feeling that these are somehow "anti W3C" technologies, I perhaps should
note that DTLL was developed by Jeni Tennison, invited expert to the W3C
XSLT WG, while CRDL was developed by Martin Duerst, long time head of W3C
Internationalization. Indeed, CRDL is based on a technical note at the W3C
>
> Extension Framework Details
> All places where <xsd:appinfo> is allowed, and the top level, should allow
a new element <xsd:extension>, allowing any element in a non-XSD namespace.
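>
> As a sketch only: the following shows how the proposed <xsd:extension>
element might appear at the top level, on an element declaration and on an
attribute declaration, here carrying rules in the Schematron namespace
http://purl.oclc.org/dsdl/schematron. The placement inside <xsd:annotation>
is an assumption, based on that being where <xsd:appinfo> is allowed; the
names book, chapter and chapters and the constraints themselves are
hypothetical. This is not valid XSD 1.0, since <xsd:extension> in this role
is only a proposal.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:sch="http://purl.oclc.org/dsdl/schematron">

  <!-- Top level (hypothetical): the rule applies to the document root node. -->
  <xsd:extension>
    <sch:rule context=".">
      <sch:assert test="book">The top-level element must be book.</sch:assert>
    </sch:rule>
  </xsd:extension>

  <xsd:element name="book">
    <xsd:annotation>
      <!-- Element (hypothetical): the rule applies to each book element. -->
      <xsd:extension>
        <sch:rule context=".">
          <sch:assert test="count(chapter) = @chapters">The chapters attribute must match the number of chapter elements.</sch:assert>
        </sch:rule>
      </xsd:extension>
    </xsd:annotation>
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="chapter" type="xsd:string" maxOccurs="unbounded"/>
      </xsd:sequence>
      <xsd:attribute name="chapters" type="xsd:positiveInteger" use="required">
        <xsd:annotation>
          <!-- Attribute (hypothetical): the rule applies to the chapters attribute. -->
          <xsd:extension>
            <sch:rule context=".">
              <sch:report test=". &gt; 100">An unusually large number of chapters was declared.</sch:report>
            </sch:rule>
          </xsd:extension>
        </xsd:annotation>
      </xsd:attribute>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

> A processor that does not recognise the Schematron namespace could still
report plain [validity] by ignoring the three extensions.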
>
> Attributes:
> a.. [Extended validation attempted] (Assessment Outcome (Attribute))
>
> b.. [Extended validity] (Assessment Output (Attribute))
>
> c.. [Extended validity diagnostics] (Assessment Output (Attribute))
>
> Elements
> a.. [Extended validation attempted] (Assessment Outcome (Element))
>
> b.. [Extended validity] (Assessment Output (Element))
>
> c.. [Extended validity diagnostics] (Assessment Output (Element))
>
> Document Root
> a.. [Extended validation attempted] (Assessment Outcome (Document Root))
>
> b.. [Extended validity] (Assessment Output (Document Root))
>
> c.. [Extended validity diagnostics] (Assessment Output (Document Root))
>
> [Extended validity] is defined as [validity] plus successful use of all
elements in relevant extension elements. Importantly, this allows an on-ramp
for implementations to keep their current notions of validity: they can
allow but ignore all extensions. However, for extended validity (which
should become the new default for implementations to support), validation
fails if either there is an element in an unknown namespace (i.e. one
which the schema implementation does not support) or validation against
those constraints fails. This satisfies the
important objection to "optional" validation: extended validity always means
extended validity.
>
> Note that it is a design requirement in XSD that [Validity] can be
assessed in a single-pass streaming fashion. It is not a design requirement
that [Extended Validity] can be assessed in this manner. This split commends
a layered approach.
>
> Note that the extended validity of the Document Root refers to outcomes of
validating extensions defined on the document root, and should not be
confused with "the validity of the document".
>
> [Extended validation attempted] gives a list of the namespaces of the
children of the relevant extension elements, which provide keys for
different kinds of extended validation.
>
> [Extended validity diagnostics] are lists of [namespace, text] pairs,
which provide the namespace of the extension coupled with a human-readable
text message, for example as generated dynamically by Schematron. (Note: the
PSVI extensions do not limit the ability of an API to report other
information from schemas for various uses, or to perform different kinds of
non-standard validations.)
>
> The presence of these extra PSVI items is the key to extensibility. I
don't believe any "required-extension" mechanism is needed or warranted.
>
> Schematron as a Required Extension
> XSD 1.1 should define ISO Schematron as a required or strongly recommended
extension.
>
> To some extent, attempting to cover all important bases with exhaustive
declarative enhancements to XSD becomes an exercise in tail-chasing: even if
XSD is extended with a dozen new co-occurrence constraint elements, there
will still be the need for a general purpose constraint language. And,
indeed, the best way to determine which constraints should be generalized
into some first-class property in XSD is to first provide a general purpose
constraint language like Schematron to gather information and increase user
and WG expertise.
>
> The subset of Schematron used conforms to ISO 19757-3 Information
technology - Document Schema Definition Languages (DSDL) - Part 3:
Rule-based validation - Schematron (2006) Annex F: Use of Schematron as a
Vocabulary. The namespace used is http://purl.oclc.org/dsdl/schematron
>
> The following effective DTD gives the required elements and attributes of
the subset. ISO Schematron defines other elements: it is an error for them
to be present. ISO Schematron defines other attributes: it is not an error
for these to be present; they may be ignored.
>
> <!ELEMENT sch:rule (sch:let*, (sch:assert | sch:report)+)>
> <!ATTLIST sch:rule
>   context (.) #FIXED '.'
>   id CDATA #IMPLIED>
> <!ELEMENT sch:let EMPTY>
> <!ATTLIST sch:let
>   name CDATA #REQUIRED
>   value CDATA #REQUIRED>
> <!ELEMENT sch:assert (#PCDATA | sch:span | sch:emph | sch:dir | sch:name | sch:value-of)*>
> <!ATTLIST sch:assert
>   test CDATA #REQUIRED>
> <!ELEMENT sch:report (#PCDATA | sch:span | sch:emph | sch:dir | sch:name | sch:value-of)*>
> <!ATTLIST sch:report
>   test CDATA #REQUIRED>
> <!ELEMENT sch:span (#PCDATA)>
> <!ELEMENT sch:emph (#PCDATA)>
> <!ELEMENT sch:dir (#PCDATA)>
> <!ELEMENT sch:name EMPTY>
> <!ATTLIST sch:name select CDATA #IMPLIED>
> <!ELEMENT sch:value-of EMPTY>
> <!ATTLIST sch:value-of select CDATA #IMPLIED>
>
> Note that in this subset:
>
> a.. The context attribute is restricted to be ".". In the case of an
<extension> element that appears at the top-level of a schema rather than in
a content model, this is "/" or the document root node (not the root
element). For example, this allows constraining the top-level element of
any document to a certain range.
>
> b.. No special presentation processing is required for the text of
elements span, emph and dir in the PSVI.
>
> c.. The element sch:name should be resolved to the qname of the local
element or attribute (or type if that is ever possible.)
>
> d.. Phases, diagnostics, patterns, abstract rules and abstract patterns
are not part of the subset defined.
>
> e.. The path expression in the test attribute is interpreted as a
boolean expression; it may not resolve to a particular type or node. For
simple co-occurrence constraints, use the extended path expressions above.
>
> f.. The path expressions are interpreted as if they are type-aware XPath
2 path expressions. If an implementation can only handle some simpler
subset, such as XPath 1, the implementation fails with an error at run time.
>
> g.. The path expressions may require more than streaming access. This is
one issue which sets apart simple [validity] from [extended validity]
>
> h.. For other semantics, see the ISO Schematron spec, e.g. at
http://www.schematron.com/
>
> i.. I would like to stress that the provision of Schematron in extension
elements reduces lock-in. At some future stage, some bright people unknown
could come up with some better system as yet undreamed of. At that time, the
XSD WG can then adopt the new constraint system as the required extension,
and obsolete Schematron. Compare this with the difficulty in, say, adding a
new facet or changing the key and uniqueness constraints in monolithic XSD
1.0.
>
> j.. The provision of Schematron simplifies the task of XSD enhancement,
because it gives a plausible workaround for rejected requirements to users.
For example, a user who wants to specify that the top-level element must be
"book"
>
Received on Tue Mar 14 20:48:08 2006