[dsdl-discuss] Strawman: bringing the framework inside schemas

From: Eric van der Vlist <vdv@dyomedea.com>
Date: Sat May 25 2002 - 15:25:05 UTC

Hi,

Thinking about how the framework could help to define datatypes, I came
to the idea of bringing the framework inside schemas by using framework
elements as extensions.

The attached strawman is really only rough unpolished ideas serialized
as XML to illustrate what I mean, but I think that this might be an
interesting direction to dig into and I am eager to hear your comments!

I would also like to know how we could bring external people into this
discussion (I am thinking of people like KAWAGUCHI Kohsuke for his
experience implementing W3C XML Schema and Simon St.Laurent whose
regular expressions I have borrowed). Is it possible to invite them to
this list or should we wait until things are more polished and bring
the discussion to dsdl-comment?

Thanks

Eric

-- 
See you in San Diego.
                               http://conferences.oreillynet.com/os2002/
------------------------------------------------------------------------
Eric van der Vlist       http://xmlfr.org            http://dyomedea.com
http://xsltunit.org      http://4xt.org           http://examplotron.org
------------------------------------------------------------------------

Strawman: bringing the framework within schemas

Eric van der Vlist

May 25, 2002

Introduction

The "natural" way to define our interoperability framework seems to be "outside" schemas and pre-validation transformations, using a push mechanism such as the one defined by XPipe. Assuming that we use the namespace prefix "if", this could lead to constructs such as:

<if:define name="canonicalValidation">
 <if:process type="http://www.w3.org/1999/XSL/Transform" href="myC14n.xsl">
  <if:process type="http://relaxng.org/ns/structure/1.0" href="mySchema.rng"/>
 </if:process>
</if:define>

to define a Relax NG validation performed after an XSLT canonicalization.

The big benefit of such an external framework is that it is compatible with any existing tool. However, being non-intrusive, it may become heavy when the transformations and the validation get mixed together, as is the case for the transformation between the parsed and the lexical space.

If we wanted to define the "pre-lexical" transformation (which may depend on the text node or attribute under validation) using an external framework, we would need to split the validation into two phases: a first phase, stopped before the pre-lexical transformation, that produces an annotated document; then the pre-lexical transformation, which uses these annotations to do its job; and a final validation performed on the result of that transformation. This whole process seems messy and intrusive.

The other solution which is the subject of this strawman would be to include framework elements within the schemas to define transformations to be performed on the nodes during the validation.

Use cases

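The use cases presented in this strawman are whitespace processing, list type separator adjustment, number localization, date localization, date decomposition, date recomposition, and validating only the XLink attributes.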

Note: all the examples are presented assuming the default namespace is Relax NG. The XSLT and XPath snippets have not been tested, but I hope that there are few enough errors to give a good understanding of what I mean!

Whitespace processing

A dedicated transformation could be defined for whitespace processing, unless XPath is used instead. With XPath, the following pattern would specify that whitespace is normalized before any further test:

<element name="foo">
<data>
<if:process type="http://www.w3.org/TR/1999/REC-xpath-19991116" value="normalize-space()">
<data type="bar"/>
</if:process>
</data>
</element>

Alternative syntax using a value element:

<element name="foo">
<data>
<if:process type="http://www.w3.org/TR/1999/REC-xpath-19991116">
 <if:value>normalize-space()</if:value>
<data type="bar"/>
</if:process>
</data>
</element>

Such a construct would mean "apply the processing defined here to the current node before doing further tests".

Note that in this first case, there is a "built-in" fallback mechanism for Relax NG processors which do not support the framework: per the Relax NG specification, such processors should just ignore the "if:process" element and validate any text node.
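
As a rough illustration of the "process, then test" semantics described above, the following Python sketch emulates XPath's normalize-space() and hands the result to a datatype check. The function names, and the use of an integer check as a stand-in for the unspecified datatype "bar", are assumptions made for this example only; nothing here is part of Relax NG or of the proposed framework.

```python
import re

def normalize_space(value: str) -> str:
    """Emulate XPath 1.0 normalize-space(): collapse runs of XML
    whitespace (space, tab, CR, LF) and strip both ends."""
    return re.sub(r"[ \t\r\n]+", " ", value).strip(" ")

def is_integer(value: str) -> bool:
    """Hypothetical stand-in for the datatype "bar": a plain integer."""
    return re.fullmatch(r"[+-]?[0-9]+", value) is not None

def validate_text(raw: str) -> bool:
    # Apply the processing first, then run the datatype test on its result.
    return is_integer(normalize_space(raw))
```

With this ordering, a text node such as a newline followed by "42" and trailing spaces passes the test, while "4 2" still fails because normalization keeps one inner space.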


This other pattern:

<element name="foo">
<if:process type="http://www.w3.org/TR/1999/REC-xpath-19991116" value="normalize-space()">
<data type="bar"/>
</if:process>
</element>

is also valid but would apply "normalize-space()" to the current element (instead of its first text node as in the previous example). Applying the datatype "bar" to the normalized value of the current element would mean flattening any embedded elements into a single normalized text value. A Relax NG processor not supporting the framework would expect empty "foo" elements in this second pattern, so a fallback mechanism might be needed in this case, for instance:

<element name="foo">
<choice>
<if:process type="http://www.w3.org/TR/1999/REC-xpath-19991116" value="normalize-space()">
<data type="bar"/>
</if:process>
<zeroOrMore if:process="ignore">
<ref name="anyElement"/>
<text/>
</zeroOrMore>
</choice>
</element>

The first alternative of the choice would be ignored by Relax NG processors that are not framework-compliant, while the if:process="ignore" attribute of the second alternative could be used to instruct framework-compliant Relax NG processors to ignore it.

List type separator adjustment

Replacing comma separators with spaces before validating the list could be done as:

<element name="foo">
<data>
<if:process type="http://www.w3.org/TR/1999/REC-xpath-19991116" value="normalize-space(translate(., ',', ' '))">
<data type="bar"/>
</if:process>
</data>
</element>
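
To make the effect of that expression concrete, here is a small Python emulation of normalize-space(translate(., ',', ' ')). The helper names are made up for this sketch, and splitting the result on single spaces is an assumption about what a list datatype would do next.

```python
import re

def translate(value: str, src: str, dst: str) -> str:
    """Emulate XPath 1.0 translate(): map each character of src to the
    character at the same position in dst, or drop it if dst is shorter."""
    table = {}
    for i, ch in enumerate(src):
        # The first occurrence of a character in src wins, as in XPath.
        table.setdefault(ord(ch), dst[i] if i < len(dst) else None)
    return value.translate(table)

def normalize_space(value: str) -> str:
    """Emulate XPath 1.0 normalize-space() for XML whitespace."""
    return re.sub(r"[ \t\r\n]+", " ", value).strip(" ")

def list_items(raw: str) -> list:
    # Same expression as in the pattern: normalize-space(translate(., ',', ' '))
    normalized = normalize_space(translate(raw, ",", " "))
    return normalized.split(" ") if normalized else []
```

For example, list_items("1, 2,3 ,  4") yields the four items "1", "2", "3" and "4", whatever mix of commas and spaces was used in the instance.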

Number localization

<element name="foo">
<data>
<if:process type="http://www.w3.org/TR/1999/REC-xpath-19991116" value="normalize-space(translate(., ',', '.'))">
<data type="bar"/>
</if:process>
</data>
</element>
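
The pattern above turns a decimal comma into a decimal point before the datatype test. A minimal Python sketch of the same idea follows; the function name is hypothetical and Decimal is only a stand-in for whatever numeric datatype "bar" would be.

```python
from decimal import Decimal, InvalidOperation

def localized_decimal(raw: str):
    """Replace decimal commas with points, trim XML whitespace, then parse.
    Returns None when the result is not a decimal number."""
    candidate = raw.replace(",", ".").strip(" \t\r\n")
    try:
        return Decimal(candidate)
    except InvalidOperation:
        return None
```

A plain character translation has a limit worth noting: a value that also uses a point as thousands separator, such as "1.234,5", becomes "1.234.5" and is rejected rather than read as 1234.5.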

Date localization

This one would probably deserve its own library. It would be verbose, but it could be done using XSLT:

<define name="foo">
<if:process type="http://www.w3.org/1999/XSL/Transform">
<if:value>
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="">
<xsl:template match="/foo">
<xsl:choose>
<xsl:when test="contains(., 'janvier')">
<xsl:value-of select="concat(normalize-space(substring-after(., 'janvier')), '-01-', normalize-space(substring-before(., 'janvier')))"/>
</xsl:when>
.../...
<xsl:when test="contains(., 'décembre')">
<xsl:value-of select="concat(normalize-space(substring-after(., 'décembre')), '-12-', normalize-space(substring-before(., 'décembre')))"/>
</xsl:when>
</xsl:choose>
</xsl:template>
</xsl:transform>
</if:value>
<data type="xs:date"/>
</if:process>
</define>

Or using a regexp library:

<define name="foo">
<element name="foo">
<if:process type="http://www.perldoc.com/perl5.6.1/pod/perlre.html" value="s/([0-9]*) janvier ([0-9]*)/$2-01-$1/">
.../...
<if:process type="http://www.perldoc.com/perl5.6.1/pod/perlre.html" value="s/([0-9]*) décembre ([0-9]*)/$2-12-$1/">
<data type="xs:date"/>
</if:process>
.../...
</if:process>
</element>
</define>
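
The chain of substitutions can be sketched in Python as below. Only the two months spelled out in the examples are included, the others being elided here as well; the function names are hypothetical. Unlike the substitutions above, this sketch pads the day to two digits, since "2002-01-5" would not be a valid xs:date.

```python
import re
from datetime import date

# Only the months shown in the examples; the others are elided as above.
MONTHS = {"janvier": 1, "décembre": 12}

def french_to_iso(raw: str):
    """Rewrite 'D month YYYY' as 'YYYY-MM-DD'; return None if no rule applies."""
    for name, number in MONTHS.items():
        match = re.fullmatch(rf"\s*([0-9]{{1,2}}) {name} ([0-9]{{4}})\s*", raw)
        if match:
            day, year = match.groups()
            return f"{year}-{number:02d}-{int(day):02d}"
    return None

def is_valid_date(raw: str) -> bool:
    """Apply the rewriting, then test the result as an ISO 8601 date."""
    iso = french_to_iso(raw)
    if iso is None:
        return False
    try:
        date.fromisoformat(iso)
    except ValueError:
        return False
    return True
```

The rewriting only changes the lexical form; the date test that follows is still what rejects a value such as "32 janvier 2002".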

Date decomposition

Using Regular Fragmentations:

<define name="foo">
<if:process type="http://simonstl.com/ns/fragments/">
<if:value>
<fragmentRules xmlns="http://simonstl.com/ns/fragments/">
<fragmentRule pattern="([0-9]*)-([0-9]*)-([0-9]*)">
<applyTo>
<element nsURI="" localName="foo"/>
</applyTo>
<produce>
<element nsURI="" localName="year" prefix="" />
<element nsURI="" localName="month" prefix="" />
<element nsURI="" localName="day" prefix="" />
</produce>
</fragmentRule>
</fragmentRules>
</if:value>
<element name="foo">
<element name="year">
<data type="xs:integer"/>
</element>
<element name="month">
<data type="xs:integer"/>
</element>
<element name="day">
<data type="xs:integer"/>
</element>
</element>
</if:process>
</define>

Using XSLT (with sub-minimal error checking and no fallback):

<define name="foo">
<if:process type="http://www.w3.org/1999/XSL/Transform">
<if:value>
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="">
<xsl:template match="/foo">
<xsl:element name="year">
<xsl:value-of select="substring-before(.,'-')"/>
</xsl:element>
<xsl:element name="month">
<xsl:value-of select="substring-before(substring-after(., '-'),'-')"/>
</xsl:element>
<xsl:element name="day">
<xsl:value-of select="substring-after(substring-after(., '-'),'-')"/>
</xsl:element>
</xsl:template>
</xsl:transform>
</if:value>
<element name="year">
<data type="xs:integer"/>
</element>
<element name="month">
<data type="xs:integer"/>
</element>
<element name="day">
<data type="xs:integer"/>
</element>
</if:process>
</define>

Alternatively, the transformation may be kept external:

<define name="foo">
<if:process type="http://www.w3.org/1999/XSL/Transform">
<if:value href="splitDate.xsl"/>
<element name="foo">
<element name="year">
<data type="xs:integer"/>
</element>
<element name="month">
<data type="xs:integer"/>
</element>
<element name="day">
<data type="xs:integer"/>
</element>
</element>
</if:process>
</define>
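
Whatever the transformation language, the effect is the same: the text of "foo" is replaced by three child elements, which the schema then validates. A Python sketch of that step, using the standard ElementTree module, might look like this. The function name is hypothetical, and the expression requires at least one digit per part, which is slightly stricter than the "[0-9]*" used above.

```python
import re
import xml.etree.ElementTree as ET

def split_date(foo: ET.Element) -> ET.Element:
    """Return a new element with the same name whose 'Y-M-D' text has been
    replaced by <year>, <month> and <day> children."""
    text = (foo.text or "").strip()
    match = re.fullmatch(r"([0-9]+)-([0-9]+)-([0-9]+)", text)
    if match is None:
        raise ValueError(f"not a Y-M-D value: {text!r}")
    result = ET.Element(foo.tag)
    for name, part in zip(("year", "month", "day"), match.groups()):
        ET.SubElement(result, name).text = part
    return result
```

Applied to a "foo" element containing "2002-05-25", it produces a "foo" element with "year", "month" and "day" children holding "2002", "05" and "25".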

Date recomposition

<element name="foo">
<if:process type="http://www.w3.org/TR/1999/REC-xpath-19991116" value="concat(year, '-', month, '-', day)">
<data type="xs:date"/>
</if:process>
</element>
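
This is the reverse of the decomposition: the children are concatenated into one lexical value, which is then tested as a date. A Python sketch under the same assumptions as before (hypothetical function names, ElementTree standing in for the instance document):

```python
import xml.etree.ElementTree as ET
from datetime import date

def recompose(foo: ET.Element) -> str:
    """Rough equivalent of concat(year, '-', month, '-', day): the text of
    the first child with each name, or an empty string when it is absent."""
    parts = (foo.findtext(name, default="") for name in ("year", "month", "day"))
    return "-".join(parts)

def is_date(value: str) -> bool:
    """Stand-in for the xs:date test, using ISO 8601 parsing."""
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True
```

A missing child does not raise an error at the concatenation step; it simply yields a value such as "2002--" that fails the date test afterwards.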

Validating only the XLink attributes 

<element name="foo">
<if:process type="http://www.w3.org/1999/XSL/Transform">
<if:value>
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="">
<xsl:template match="*[@xlink:type]">
<xsl:element name="{@xlink:type}">
<xsl:copy-of select="@xlink:*"/>
<xsl:apply-templates select="*"/>
</xsl:element>
</xsl:template>
<xsl:template match="*[@xlink:*][not(@xlink:type)]">
<xsl:element name="undefined">
<xsl:copy-of select="@xlink:*"/>
<xsl:apply-templates select="*"/>
</xsl:element>
</xsl:template>
<xsl:template match="*">
<xsl:apply-templates select="*"/>
</xsl:template>
</xsl:transform>
</if:value>
<ref name="xlinkElements"/>
</if:process>
</element>

The transformation would turn the content of the foo element into elements named after their xlink:type attribute and carrying only their xlink:* attributes. It would then be quite easy to write a generic Relax NG pattern capturing the constraints of the XLink vocabulary (such as the mandatory xlink:type attribute, xlink:href, xlink:role and xlink:arcrole being URI references, the structure of complex links, ...).
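
The intended projection can also be sketched in Python with ElementTree. The function name is hypothetical and the code only illustrates the renaming and filtering described above; it makes no claim about how a framework processor would run the transformation.

```python
import xml.etree.ElementTree as ET

XLINK = "{http://www.w3.org/1999/xlink}"

def project_xlink(element: ET.Element) -> list:
    """Return the elements emitted for `element`: one element named after
    its xlink:type (or "undefined" when it has other xlink attributes but
    no xlink:type), or just the projected descendants when it has none."""
    children = [out for child in element for out in project_xlink(child)]
    xlink_attrs = {k: v for k, v in element.attrib.items() if k.startswith(XLINK)}
    if not xlink_attrs:
        return children  # no XLink attributes: the element itself disappears
    out = ET.Element(xlink_attrs.get(XLINK + "type", "undefined"), xlink_attrs)
    out.extend(children)
    return [out]
```

Elements without XLink attributes vanish from the result while their XLink-bearing descendants are kept, so the pattern referenced as "xlinkElements" only ever sees the linking structure.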

Issues



Received on Sat May 25 11:25:09 2002