[dsdl-discuss] Re: Strawman: bringing the framework inside schemas

From: Eric van der Vlist <vdv@dyomedea.com>
Date: Wed May 29 2002 - 13:44:03 UTC

Attached is a new version of my strawman including:

- an example of implementation of Rick's unit/number proposal
- an example using STX
- some first considerations on implementation (I think that it
   fits nicely with James' derivative algorithm).

Thanks for your feedback.

Eric

-- 
See you in San Diego.
                                http://conferences.oreillynet.com/os2002/
------------------------------------------------------------------------
Eric van der Vlist       http://xmlfr.org            http://dyomedea.com
http://xsltunit.org      http://4xt.org           http://examplotron.org
------------------------------------------------------------------------

Strawman: bringing the framework within the schemas

Eric van der Vlist

May 29, 2002

History

Introduction

The "natural" way to define our interoperability framework seems to be "outside" schemas and pre-validation transformations, using a push mechanism such as defined by XPipe. Assuming that we use the namespace prefix "if", this could lead to constructs such as:

<if:define name="canonicalValidation">
 <if:process type="http://www.w3.org/1999/XSL/Transform" href="myC14n.xsl">
  <if:process type="http://relaxng.org/ns/structure/1.0" href="mySchema.rng"/>
 </if:process>
</if:define>

to define a Relax NG validation performed after an XSLT canonicalization.

The big benefit of such an external framework is that it is compatible with any existing tool. However, because it is non-intrusive, it may become heavy when transformations and validation are intertwined, as is the case for the transformation between the parsed space and the lexical space.

If we wanted to define the "pre-lexical" transformation (which may depend on the text node or attribute under validation) using an external framework, we would need to split the validation into two phases, with the transformation in between: a first validation phase, stopped before the pre-lexical transformation, producing an annotated document; the pre-lexical transformation, using these annotations to do its job; and the final validation, performed on the result of the pre-lexical transformation. This whole process seems messy and intrusive.

The other solution, which is the subject of this strawman, would be to include framework elements within the schemas to define transformations to be performed on the nodes during validation.

Use cases

The use cases presented in this strawman are:

These additional use cases have been borrowed from an email from Rick Jelliffe:

Note: all the examples are presented assuming the default namespace is Relax NG. The XSLT and XPath snippets have not been tested, but I hope that there are few enough errors to give a good understanding of what I mean!

Whitespace processing

A dedicated transformation could be defined for whitespace processing, unless XPath is used. With XPath, the following pattern would specify that whitespace should be normalized before further tests are performed:

<element name="foo">
<data>
<if:process type="http://www.w3.org/TR/1999/REC-xpath-19991116" value="normalize-space()">
<data type="bar"/>
</if:process>
</data>
</element>

Alternative syntax using a value element:

<element name="foo">
<data>
<if:process type="http://www.w3.org/TR/1999/REC-xpath-19991116">
 <value>normalize-space()</value>
<data type="bar"/>
</if:process>
</data>
</element>

The meaning of such a construct would be "apply the processing defined here before doing further tests" to the current node.

Note that in this first case, there is a "built in" fallback mechanism for Relax NG processors which do not support the framework: per the Relax NG specification, such processors should just ignore the "if:process" element and validate any text node. 
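As an illustration of "apply the processing defined here before doing further tests", here is a minimal sketch in Python rather than XPath. It mimics XPath 1.0 normalize-space() and then runs a datatype check on the transformed value. The names normalize_space, validate_with_preprocessing and is_bar are hypothetical, and is_bar is only a stand-in for the undefined datatype "bar".

```python
import re

# XML whitespace characters, as used by XPath 1.0 normalize-space().
_XML_WS = re.compile(r"[ \t\r\n]+")


def normalize_space(value: str) -> str:
    """Collapse runs of XML whitespace to one space and trim both ends."""
    return _XML_WS.sub(" ", value).strip(" ")


def is_bar(value: str) -> bool:
    """Hypothetical stand-in for datatype "bar": single-space-separated tokens."""
    return re.fullmatch(r"\S+( \S+)*", value) is not None


def validate_with_preprocessing(raw: str, check=is_bar) -> bool:
    """Apply the transformation first, then test the transformed value."""
    return check(normalize_space(raw))
```

With these assumptions, a value such as "  a \n\t b  " fails the check as written in the instance but passes once the transformation has been applied.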

This other pattern:

<element name="foo">
<if:process type="http://www.w3.org/TR/1999/REC-xpath-19991116" value="normalize-space()">
<data type="bar"/>
</if:process>
</element>

is also valid but would apply "normalize-space()" to the current element (instead of its first text node as in the previous example). Applying the datatype "bar" to the normalized value of the current element would mean converting any embedded elements into a single normalized text value. A Relax NG processor not supporting the framework would expect empty "foo" elements in this second pattern, so a fallback mechanism might be needed in this case, for instance:

<element name="foo">
<choice>
<if:process type="http://www.w3.org/TR/1999/REC-xpath-19991116" value="normalize-space()">
<data type="bar"/>
</if:process>
<zeroOrMore if:process="ignore">
<ref name="anyElement"/>
<text/>
</zeroOrMore>
</choice>
</element>

The first alternative of the choice would be ignored by Relax NG processors that do not support the framework, while the if:process="ignore" attribute on the second alternative could be used to instruct framework-compliant Relax NG processors to ignore it.

List type separator adjustment

This could be done as:

<element name="foo">
<data>
<if:process type="http://www.w3.org/TR/1999/REC-xpath-19991116" value="normalize-space(translate(., ',', ' '))">
<data type="bar"/>
</if:process>
</data>
</element>
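The expression above relies on the XPath 1.0 translate() function followed by normalize-space(). A rough Python equivalent is sketched below to show what the list datatype would receive. The function names are hypothetical, and the code only illustrates the string semantics, not a Relax NG processor.

```python
import re


def xpath_translate(value: str, src: str, dst: str) -> str:
    """Mimic XPath 1.0 translate(): map src[i] to dst[i], drop unmatched src chars."""
    table = {}
    for i, ch in enumerate(src):
        if ord(ch) in table:
            continue  # the first occurrence in src determines the replacement
        table[ord(ch)] = dst[i] if i < len(dst) else None
    return value.translate(table)


def normalize_space(value: str) -> str:
    """Mimic XPath 1.0 normalize-space() on XML whitespace."""
    return re.sub(r"[ \t\r\n]+", " ", value).strip(" ")


def comma_list_to_tokens(raw: str) -> list:
    """Turn a comma-separated value into the tokens of a space-separated list."""
    normalized = normalize_space(xpath_translate(raw, ",", " "))
    return normalized.split(" ") if normalized else []
```

For instance, comma_list_to_tokens("1, 2,3 ,4") yields the four tokens "1", "2", "3" and "4", which a space-separated list datatype could then validate one by one.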

Number localization

<element name="foo">
<data>
<if:process type="http://www.w3.org/TR/1999/REC-xpath-19991116" value="normalize-space(translate(., ',', '.'))">
<data type="bar"/>
</if:process>
</data>
</element>
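A small Python sketch of the same idea follows: the decimal comma is replaced by a point before the value is checked against a decimal lexical form. The regular expression is my approximation of the xs:decimal lexical space and the function name is hypothetical. The sketch also shows a limit of a plain character translation: a value that uses the point as a thousands separator is rejected rather than repaired.

```python
import re
from decimal import Decimal

# Approximation of the xs:decimal lexical form (assumption for this sketch).
_XS_DECIMAL = re.compile(r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)")


def parse_localized_decimal(raw: str):
    """Replace ',' by '.', normalize whitespace, then validate as a decimal.

    Returns a Decimal, or None when the transformed value is not valid.
    Whitespace handling uses str.split(), a simplification of normalize-space().
    """
    candidate = " ".join(raw.replace(",", ".").split())
    if _XS_DECIMAL.fullmatch(candidate) is None:
        return None
    return Decimal(candidate)
```

Under these assumptions, " 3,14 " is accepted as 3.14, while "1.234,56" becomes "1.234.56" and is reported as invalid.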

Date localization

This one would probably deserve its own library. It would be verbose, but it could be done using XSLT:

<define name="foo">
<if:process type="http://www.w3.org/1999/XSL/Transform">
<if:value>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="" version="1.0">
<xsl:template match="/foo">
<xsl:choose>
<xsl:when test="contains(., 'janvier')">
<xsl:value-of select="concat(normalize-space(substring-after(., 'janvier')), '-01-', normalize-space(substring-before(., 'janvier')))"/>
</xsl:when>
.../...
<xsl:when test="contains(., 'décembre')">
<xsl:value-of select="concat(normalize-space(substring-after(., 'décembre')), '-12-', normalize-space(substring-before(., 'décembre')))"/>
</xsl:when>
</xsl:choose>
</xsl:template>
</xsl:transform>
</if:value>
<data type="xs:date"/>
</if:process>
</define>

Or using STX (a subset derived from XSLT designed to be streamable):

<define name="foo">
<if:process type="http://temporary.uri/2002/stx">
<if:value>
<stx:transform xmlns:stx="http://temporary.uri/2002/stx" version="1.0">
<stx:template match="/foo">
<stx:element name="year">
<stx:value-of select="substring-before(.,'-')"/>
</stx:element>
<stx:element name="month">
<stx:value-of select="substring-before(substring-after(., '-'),'-')"/>
</stx:element>
<stx:element name="day">
<stx:value-of select="substring-after(substring-after(., '-'),'-')"/>
</stx:element>
</stx:template>
</stx:transform>
</if:value>
<element name="year">
<data type="xs:integer"/>
</element>
<element name="month">
<data type="xs:integer"/>
</element>
<element name="day">
 <data type="xs:integer"/>
</element>
</if:process>
</define>


Or using a regexp library (to be defined):

<define name="foo">
<element name="foo">
<if:process type="http://www.perldoc.com/perl5.6.1/pod/perlre.html" value="s/([0-9]*) janvier ([0-9]*)/$2-01-$1/">
.../...
<if:process type="http://www.perldoc.com/perl5.6.1/pod/perlre.html" value="s/([0-9]*) décembre ([0-9]*)/$2-12-$1/">
<data type="xs:date"/>
</if:process>
.../...
</if:process>
</element>
</define>
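To make the regexp variant concrete, here is a hedged Python sketch of the kind of rewriting such a library could perform on French dates before the xs:date check. It is not the Perl substitution shown above: it uses one pattern and a month table instead of twelve nested substitutions, and it zero-pads the day, which the substitutions above would not do. All names are hypothetical.

```python
import re
from datetime import date

# French month names mapped to month numbers.
FRENCH_MONTHS = {
    "janvier": 1, "février": 2, "mars": 3, "avril": 4,
    "mai": 5, "juin": 6, "juillet": 7, "août": 8,
    "septembre": 9, "octobre": 10, "novembre": 11, "décembre": 12,
}

# day, month name, four-digit year, with optional surrounding whitespace
_FRENCH_DATE = re.compile(r"\s*([0-9]{1,2})\s+(\S+)\s+([0-9]{4})\s*")


def french_date_to_iso(text: str):
    """Rewrite e.g. '29 mai 2002' as '2002-05-29'; return None if not a date."""
    match = _FRENCH_DATE.fullmatch(text)
    if match is None:
        return None
    day, month_name, year = match.groups()
    month = FRENCH_MONTHS.get(month_name.lower())
    if month is None:
        return None
    try:
        # date() rejects impossible dates such as 31 février.
        return date(int(year), month, int(day)).isoformat()
    except ValueError:
        return None
```

The transformed string is what the embedded xs:date pattern would then validate; values that cannot be rewritten are simply reported as invalid.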

Date decomposition

Using Regular Fragmentations:

<define name="foo">
<if:process type="http://simonstl.com/ns/fragments/">
<if:value>
<fragmentRules xmlns="http://simonstl.com/ns/fragments/">
<fragmentRule pattern="([0-9]*)-([0-9]*)-([0-9]*)">
<applyTo>
<element nsURI="" localName="foo"/>
</applyTo>
<produce>
<element nsURI="" localName="year" prefix="" />
<element nsURI="" localName="month" prefix="" />
<element nsURI="" localName="day" prefix="" />
</produce>
</fragmentRule>
</fragmentRules>
</if:value>
<element name="foo">
<element name="year">
<data type="xs:integer"/>
</element>
<element name="month">
<data type="xs:integer"/>
</element>
<element name="day">
<data type="xs:integer"/>
</element>
</element>
</if:process>
</define>

Using XSLT (with sub-minimal error checking and no fallback):

<define name="foo">
<if:process type="http://www.w3.org/1999/XSL/Transform">
<if:value>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="" version="1.0">
<xsl:template match="/foo">
<xsl:element name="year">
<xsl:value-of select="substring-before(.,'-')"/>
</xsl:element>
<xsl:element name="month">
<xsl:value-of select="substring-before(substring-after(., '-'),'-')"/>
</xsl:element>
<xsl:element name="day">
<xsl:value-of select="substring-after(substring-after(., '-'),'-')"/>
</xsl:element>
</xsl:template>
</xsl:transform>
</if:value>
<element name="year">
<data type="xs:integer"/>
</element>
<element name="month">
<data type="xs:integer"/>
</element>
<element name="day">
 <data type="xs:integer"/>
</element>
</if:process>
</define>

Alternatively, the transformation may be kept external:

<define name="foo">
<if:process type="http://www.w3.org/1999/XSL/Transform">
<if:value href="splitDate.xsl"/>
<element name="foo">
<element name="year">
<data type="xs:integer"/>
</element>
<element name="month">
<data type="xs:integer"/>
</element>
<element name="day">
<data type="xs:integer"/>
</element>
</element>
</if:process>
</define>
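Whichever transformation language is used, the effect of the decomposition is the same. The following Python sketch, using the standard xml.etree.ElementTree module, shows the document fragment that the embedded pattern would then be validated against. The function name is hypothetical, and the pattern requires at least one digit per part, which is slightly stricter than the "[0-9]*" used above.

```python
import re
import xml.etree.ElementTree as ET

# Three digit runs separated by hyphens (assumption: at least one digit each).
_DATE_PARTS = re.compile(r"([0-9]+)-([0-9]+)-([0-9]+)")


def split_date(foo: ET.Element) -> ET.Element:
    """Return a new element with year, month and day children."""
    match = _DATE_PARTS.fullmatch((foo.text or "").strip())
    if match is None:
        raise ValueError("content is not of the form year-month-day")
    result = ET.Element(foo.tag)
    for name, value in zip(("year", "month", "day"), match.groups()):
        ET.SubElement(result, name).text = value
    return result


if __name__ == "__main__":
    source = ET.fromstring("<foo>2002-05-29</foo>")
    print(ET.tostring(split_date(source), encoding="unicode"))
```

The result keeps the "foo" element as a wrapper, as in the Regular Fragmentations and external stylesheet variants.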

Date recomposition

<element name="foo">
<if:process type="http://www.w3.org/TR/1999/REC-xpath-19991116" value="concat(year, '-', month, '-', day)">
<data type="xs:date"/>
 </if:process>
</element>
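The recomposition is the reverse operation. As a sketch, again in Python with hypothetical function names, the concat() expression and a simplified xs:date check (no time zone, four-digit year) could look like this:

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

_ISO_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def recompose_date(foo: ET.Element) -> str:
    """Mimic concat(year, '-', month, '-', day) on the children of foo."""
    parts = [foo.findtext(name, default="") for name in ("year", "month", "day")]
    return "-".join(parts)


def is_simple_xs_date(value: str) -> bool:
    """Simplified xs:date check: YYYY-MM-DD naming an existing calendar day."""
    match = _ISO_DATE.fullmatch(value)
    if match is None:
        return False
    try:
        date(*(int(part) for part in match.groups()))
    except ValueError:
        return False
    return True
```

A missing child yields an empty string, as it would in XPath, so the recomposed value then fails the date check instead of raising an error.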

Validating only the XLink attributes 

<element name="foo">
<if:process type="http://www.w3.org/1999/XSL/Transform">
<if:value>
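<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="" version="1.0">
<!-- Elements with an xlink:type attribute are renamed after its value. -->
<xsl:template match="*[@xlink:type]" priority="1">
<xsl:element name="{@xlink:type}">
<xsl:copy-of select="@xlink:*"/>
<xsl:apply-templates select="*"/>
</xsl:element>
</xsl:template>
<!-- Elements with other XLink attributes but no xlink:type. -->
<xsl:template match="*[@xlink:*]">
<xsl:element name="undefined">
<xsl:copy-of select="@xlink:*"/>
<xsl:apply-templates select="*"/>
</xsl:element>
</xsl:template>
<!-- Elements without XLink attributes are skipped. -->
<xsl:template match="*">
<xsl:apply-templates select="*"/>
</xsl:template>
</xsl:transform>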
</if:value>
<ref name="xlinkElements"/>
</if:process>
</element>

The transformation would transform the content of the foo element into elements having the name of their xlink:type attribute and their xlink:attributes. A generic Relax NG pattern would then be quite easy to write which could capture the constraints of the XLink vocabulary (such as mandatory xlink:type attribute, xlink:href, xlink:role and xlink:arcrole being URI references, the structure of complex links, ...).

Conversion from "<foo>X bar</foo>" to "<foo unit="bar">X</foo>"

Using Regular Fragmentations:

<define name="foo">
<if:process type="http://simonstl.com/ns/fragments/">
<if:value>
<fragmentRules xmlns="http://simonstl.com/ns/fragments/">
<fragmentRule pattern="\s*([0-9]*)\s*(\S*)">
<applyTo>
<element nsURI="" localName="foo"/>
</applyTo>
<produce>
<element nsURI="" localName="foo" prefix="">
<attribute nsURI="" localName="unit" prefix="" />
</element>
 </produce>
</fragmentRule>
</fragmentRules>
</if:value>
<element name="foo">
<attribute name="unit">
<data type="xs:NMTOKEN"/>
</attribute>
<data type="xs:decimal"/>
</element>
</if:process>
</define>
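The intended effect of this rule can be sketched in Python as follows. This is not the Regular Fragmentations engine, only an illustration of the input and output. The function name is hypothetical, and the pattern is stricter than the one in the rule: it requires a number and a unit that does not start with a digit.

```python
import re
import xml.etree.ElementTree as ET

# number, optional whitespace, then a unit that does not start with a digit
_VALUE_UNIT = re.compile(r"\s*([0-9]+(?:\.[0-9]+)?)\s*([^\s0-9.]\S*)\s*")


def split_unit(foo: ET.Element) -> ET.Element:
    """Turn <foo>X bar</foo> into <foo unit="bar">X</foo>."""
    match = _VALUE_UNIT.fullmatch(foo.text or "")
    if match is None:
        raise ValueError("content is not of the form 'number unit'")
    value, unit = match.groups()
    result = ET.Element(foo.tag, {"unit": unit})
    result.text = value
    return result


if __name__ == "__main__":
    source = ET.fromstring("<foo>12.5 cm</foo>")
    print(ET.tostring(split_unit(source), encoding="unicode"))
```

The embedded pattern would then see a "unit" attribute to check as xs:NMTOKEN and a text value to check as xs:decimal.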

Conversion from "<foo unit="bar">X</foo>" to "<foo>X bar</foo>" 

Using XSLT:

<define name="foo">
<if:process type="http://www.w3.org/1999/XSL/Transform">
<if:value>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="" version="1.0">
<xsl:template match="foo">
<xsl:copy>
<xsl:value-of select="concat(., ' ', @unit)"/>
</xsl:copy>
</xsl:template>
</xsl:transform>
</if:value>
<element name="foo">
<data type="xs:token"/>
</element>
</if:process>
</define>

Implementation, variables

The if:process element defines a transformation to be performed on the current node and this seems like an easy adaptation of the derivative algorithm described by James:

"The key concept used by this validation technique is the concept of a derivative. The derivative of a pattern p with respect to a node x is a pattern for what's left of p after matching x; in other words, it is a pattern that matches any sequence that when appended to x will match p."

Supporting an if:process element defining a transformation T on the current node could be expressed as: "The derivative of a pattern p with respect to a node x submitted to a transformation T is a pattern for what's left of T(p) after matching x; in other words, it is a pattern that matches any sequence that when appended to x will match T(p)."
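One possible reading of this adaptation is sketched below in Python. It is a toy model, not James' algorithm: nodes are plain strings, only a handful of pattern kinds exist, and I assume that the transformation is applied to the node before the derivative of the embedded pattern is taken. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


class Pattern:
    """Base class for the toy patterns."""


@dataclass(frozen=True)
class Empty(Pattern):
    """Matches the empty sequence."""


@dataclass(frozen=True)
class NotAllowed(Pattern):
    """Matches nothing."""


@dataclass(frozen=True)
class Data(Pattern):
    """Matches one text node accepted by a datatype check."""
    check: Callable[[str], bool]


@dataclass(frozen=True)
class Process(Pattern):
    """Hypothetical model of if:process: transform the node, then match inner."""
    transform: Callable[[str], str]
    inner: Pattern


def nullable(p: Pattern) -> bool:
    """True when the pattern matches the empty sequence."""
    return isinstance(p, Empty)


def deriv(p: Pattern, x: str) -> Pattern:
    """What is left of p after matching the text node x."""
    if isinstance(p, Data):
        return Empty() if p.check(x) else NotAllowed()
    if isinstance(p, Process):
        # Assumption: the transformation is applied to the node first.
        return deriv(p.inner, p.transform(x))
    return NotAllowed()


def matches(p: Pattern, nodes) -> bool:
    """Take successive derivatives, then test whether nothing is left to match."""
    for x in nodes:
        p = deriv(p, x)
    return nullable(p)
```

For example, Process(lambda s: " ".join(s.split()), Data(str.isdigit)) accepts the node "  42 ", while Data(str.isdigit) alone rejects it.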

There are a couple of possibilities for what should be used as the source of this transformation, though: should we send the current node, or a pointer to the current node in the document? The tradeoff is that sending the document doesn't play well with streaming validation but gives the transformation access to the entire document.

Also, my assumption is that the result of the transformation, T(p), is not persistent, in the sense that an implementation may cache this result to reuse it if it needs to reevaluate the derivative, but a user cannot explicitly reuse it in a schema. While this is the simpler solution, I am wondering whether there are use cases where explicit reuse of the results of the transformations would be needed. Could this be the case, maybe, for running integrity checks or additional Schematron tests? If so, how should we deal with it? Should we let people define variables? But then what would be the scope of these variables? Or should we just add flags to say whether constraints are run on the parsed, lexical or value spaces?

Issues


Received on Wed May 29 09:44:09 2002
