[dsdl-discuss] Re: Part 10 Scenario

From: Erik Bruchez <ebruchez@orbeon.com>
Date: Tue Feb 08 2005 - 23:43:01 UTC

Eric van der Vlist wrote:

>>Lets subscribe Erik to dsdl-discuss (and hope he remains an active
>>participant!)
>
> Done.
>
> Erik, you are very welcome to post your proposal to the list!

Thanks for welcoming me on this list!

I should start by saying that I am not quite up to date on DSDL
and the related languages, except Relax NG. Eric forwarded me the
use case discussed on this list a few weeks ago, and I set out to
propose a simple example implementing the use case with XPL.

XPL stands for "XML Pipeline Language". It has been developed by my
company, Orbeon, since 2002. An implementation of the language is
available in the open source Orbeon PresentationServer project.

By the way, we announced today that we are joining ObjectWeb, and
PresentationServer is in the process of being moved from
SourceForge.net to the ObjectWeb Forge:

   http://www.orbeon.com/company/pr-objectweb

Back to XPL: we recently wrote a fairly formal draft specification of
XPL, which we hope will lead to an XPL 1.0 specification. We are
looking to build interest in the specification and in XML pipelines in
general because, as far as we can tell, there is nothing quite like it
out there at this point.

So I guess I'll just go ahead and attach an implementation of the use
case using XPL pre-1.0, that is, XPL as it runs today. It uses a
"Validation" processor which is half hypothetical, half real
(PresentationServer has a Validation processor that supports Relax NG
and W3C XML Schema). It also uses some hypothetical processors for
some of the DSDL languages, and then uses an XSLT processor to build
the final document. What's really important is that all those
processors are connected together thanks to XPL.

Some comments by myself and Eric are at the top of the document. The
syntax of XPL is, I hope, more or less self-explanatory. The draft spec
is not online yet, but there is some information here:

   http://www.orbeon.com/ois/doc/reference-xpl-pipelines

Let's take it from there, and please let me know if you have any
questions!

-Erik

<p:config xmlns:p="http://www.orbeon.com/oxf/pipeline"
          xmlns:oxf="http://www.orbeon.com/oxf/processors">

    <p:param name="source-document" type="input"/>
    <p:param name="result-document" type="output"/>

    <!--
        First pass at writing a pipeline to implement the DSDL Test Scenario.

        Questions / issues:
    
        o = questions by Erik Bruchez
        v = answers by Eric van der Vlist

        o What is expected of the output of validators? Is the flow supposed to be interrupted when
          a validation error occurs?
    
            (v) Both questions are controversial :-) ...
                An overall principle for DSDL is that DSDL is only about validation and does not carry
                any kind of PSVI information. Following this principle, the result of a DSDL validation
                should be "valid" or "invalid".
    
                Now, this scenario seems to show that this might not be the case for part 10
                (Validation Management), and this is also why I now think that XPL may be interesting:
                if this were only about "valid"/"invalid", XPL wouldn't have been such a good fit,
                IMO.
    
                My answer to your first question thus seems to be: a validation report containing at least a
                "yes/no" answer plus ad hoc content.
    
                My personal answer to the second question would be "that depends". In the XMLfr publication
                process, for instance, I have two kinds of validation: a RNG schema that returns errors and a
                Schematron schema that checks good practices and returns warnings. While the first one could
                interrupt the flow, the second one shouldn't.
    
                If we wanted to bring that notion into Validation Management, that could mean that instead of
                "yes/no" we could have an error level, and that when invoking a validator we could define
                the level of the error to raise in case the validation fails.
    
                The default behaviour could be to stop at the first error (as you've implied in the pipeline),
                but an optional "config" input could be added to specify an error level. If the
                error level is positive, no exception would be raised.
    
                Now, a validator might have several outputs. What about defining several outputs for a validator:
    
                    - the data output (useful only for schema languages that, like DTDs or WXS, augment the
                       infoset with stuff such as default values).
                    - the report output with a yes/no (or level) information, error messages or (for Schematron) the
                       validation report.
                    - to that, we could add a PSVI output in the case of W3C XML Schema (assuming we had an
                       XML format for the PSVI).
    
                When a validator is configured with a positive error level, error detection could be done
                by checking the report output.
    
                All these answers are personal and should be checked with the DSDL working group.

        o For simplicity, I assumed that the NVDL processor here would produce outputs with those
          particular names. This would be possible only if the NVDL processor could be configured to
          map those output names to namespaces. Practically, this processor could either:

          - Have predefined output names, like document-1, document-2, etc.
          - Produce a single XML document with all the streams aggregated

          I do not know NVDL well enough to see what would be natural here.
    
                (v) Neither of them is natural :-) ...
    
                    Right now, NVDL is for validation only and takes care of invoking the different validators
                    to return a single "yes/no" answer.
    
                    Using it to split a document as described in that scenario is thus an extrapolation
                    of what the current NVDL implementation does.
    
                    However, given that NVDL splits documents according to their namespaces, I wonder
                    whether the aggregated streams would be different enough from the original document ;-) ...
    
                    Thus, I am wondering if predefined names wouldn't be the best solution. Maybe, instead of using
                    document-i, we could map namespace URIs to names (as they are mapped to namespace prefixes).

        o I have used a single validation processor that supports W3C XML Schema, Relax NG,
          Schematron, and DTDs (here DTDs would either have to be encapsulated into a root element,
          or referred to externally). You could of course propose one processor per schema type. The
          PresentationServer validation processor currently supports W3C XML Schema and Relax NG
          transparently.

        o I proposed using XSLT to recombine the final document in the end.

        o Otherwise, the pipeline is very simple. Nothing in XPL prevents parallel execution.
          Without exception support, the processing would just stop if there is a validation error.
          With exception support, it could resume locally (per branch) if needed, or just provide a
          global fallback: everything that is possible with exceptions.

    -->
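
    <!--
        Sketch only, not part of this pipeline: one way the optional "config" input and the
        separate "report" output suggested by (v) could look on a validation step. The "config"
        and "report" names come from the discussion above; the <config>/<error-level> vocabulary,
        its value, and the "html-metadata-report" id are assumptions made up for illustration.
        The current PresentationServer validation processor is not known to support any of this.

        <p:processor name="oxf:validation">
            <p:input name="data" href="#html-stream-validated"/>
            <p:input name="schema" href="check-metadata.xml"/>
            <p:input name="config">
                <config>
                    <error-level>1</error-level>
                </config>
            </p:input>
            <p:output name="data" id="html-stream-schematronized"/>
            <p:output name="report" id="html-metadata-report"/>
        </p:processor>

        With a positive error level the step would not raise an exception; a later step could
        read "#html-metadata-report" to decide what to do with the warnings.
    -->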

    <!--
        1. Use NVDL to split out the parts of the document that are encoded using HTML, SVG and
        MathML from the bulk of the document, whose tags are defined using a user-defined set of
        markup tags.
     -->
    <p:processor name="oxf:nvdl">
        <p:input name="document" href="#source-document"/>
        <p:input name="rules">
            <rules>
                NVDL rules
            </rules>
        </p:input>
        <p:output name="html-stream" id="html-stream"/>
        <p:output name="svg-stream" id="svg-stream"/>
        <p:output name="mathml-stream" id="mathml-stream"/>
        <!--

        (v) the ids of these two outputs must be "svg-stream" and "mathml-stream", since steps 4 and 6
        read "#svg-stream" and "#mathml-stream".

        -->
        <p:output name="other-stream" id="other-stream"/>
    </p:processor>
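
    <!--
        Sketch only: the same step wired with predefined output names, the first of the two
        options raised in the header comment. The names document-1 to document-4, and the idea
        that the rules would decide which namespace lands in which output, are assumptions; no
        existing NVDL processor is known to behave this way.

        <p:processor name="oxf:nvdl">
            <p:input name="document" href="#source-document"/>
            <p:input name="rules">
                <rules>
                    NVDL rules
                </rules>
            </p:input>
            <p:output name="document-1" id="html-stream"/>
            <p:output name="document-2" id="svg-stream"/>
            <p:output name="document-3" id="mathml-stream"/>
            <p:output name="document-4" id="other-stream"/>
        </p:processor>

        Later steps refer to the ids, not to the output names, so they would not need to change.
    -->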

    <!--
        2. Validate the HTML elements and attributes using the HTML 4.0 DTD (W3C XML DTD).
    -->
    <p:processor name="oxf:validation">
        <p:input name="data" href="#html-stream"/>
        <p:input name="schema">
            <!-- Reference to DTD for HTML -->
            <dtd href="..."/>
        </p:input>
        <p:output name="data" id="html-stream-validated"/>
    </p:processor>

    <!--
        3. Use a set of Schematron rules stored in check-metadata.xml to ensure that the metadata
        of the HTML elements defined using Dublin Core semantics conform to the information in the
        document about the document's title and subtitle, author, encoding type, etc.
    -->
    <p:processor name="oxf:validation">
        <p:input name="data" href="#html-stream-validated"/>
        <!-- Reference to Schematron schema for HTML metadata -->
        <p:input name="schema" href="check-metadata.xml"/>
        <p:output name="data" id="html-stream-schematronized"/>
       <!--
        
        (v) Note that in the case of Schematron, the data output is identical to the data input.
        
        -->
    </p:processor>

    <!--
        4. Validate the SVG components of the file using the standard W3C schema provided in the
        SVG 1.2 specification.
    -->
    <p:processor name="oxf:validation">
        <p:input name="data" href="#svg-stream"/>
        <!-- Reference to W3C Schema for SVG -->
        <p:input name="schema" href="svg-1.2.xsd"/>
        <p:output name="data" id="svg-stream-validated"/>
    </p:processor>

    <!--
        5. Use the Schematron rules defined in SVG-subset.xml to ensure that the SVG file only uses
        those features of SVG that are valid for the particular SVG viewer available to the system.
    -->
    <p:processor name="oxf:validation">
        <p:input name="data" href="#svg-stream-validated"/>
        <!-- Reference to Schematron schema for SVG subset -->
        <p:input name="schema" href="SVG-subset.xml"/>
        <p:output name="data" id="svg-stream-schmatronized"/>
    </p:processor>

    <!--
        6. Validate the MathML components using the latest version of the MathML schema (defined
        in RELAX-NG) to ensure that all maths fragments are valid. The schema will make use of the
        datatype definitions in check-maths.xml to validate the contents of specific elements.
    -->
    <p:processor name="oxf:validation">
        <p:input name="data" href="#mathml-stream"/>
        <!-- Reference to Relax NG schema for MathML -->
        <p:input name="schema" href="mathml-1.0.rng"/>
        <p:output name="data" id="mathml-stream-validated"/>
    </p:processor>

    <!--
        7. Use MathML-SVG.xslt to transform the MathML segments to displayable SVG and replace each
        MathML fragment with its SVG equivalent.
    -->
    <p:processor name="oxf:xslt">
        <p:input name="data" href="#mathml-stream-validated"/>
        <p:input name="config" href="MathML-SVG.xslt"/>
        <p:output name="data" id="mathml-as-svg"/>
    </p:processor>

    <!--
        8. Use the DSRL definitions in convert-mynames.xml to convert the tags in the local nameset
        to the form that can be used to validate the remaining part of the document using
        docbook.dtd.
    -->
    <p:processor name="oxf:dsrl">
        <p:input name="data" href="#other-stream"/>
        <p:input name="config" href="convert-mynames.xml"/>
        <p:output name="data" id="docbook-stream"/>
    </p:processor>

    <p:processor name="oxf:validation">
        <p:input name="data" href="#docbook-stream"/>
        <!-- Reference to the DocBook DTD -->
        <p:input name="schema">
            <dtd href="..."/><!-- Reference to W3C DTD -->
        </p:input>
        <p:output name="data" id="docbook-stream-validated"/>
    </p:processor>

    <!--
        9. Use the CRDL rules defined in mycharacter-checks.xml to validate that the correct
        character sets have been used for text identified as being Greek and Cyrillic.
    -->
    <p:processor name="oxf:crdl">
        <p:input name="data" href="#docbook-stream-validated"/>
        <p:input name="config" href="mycharacter-checks.xml"/>
        <p:output name="data" id="docbook-stream-validated-2"/>
    </p:processor>

    <!--
        10. Convert the Docbook tags to HTML so that they can be displayed in a web browser using
        the docbook-html.xslt transformation rules.
    -->
    <p:processor name="oxf:xslt">
        <p:input name="data" href="#docbook-stream-validated-2"/>
        <p:input name="config" href="docbook-html.xslt"/>
        <p:output name="data" id="docbook-as-html"/>
    </p:processor>

    <!--
        After completion of step 10 the HTML (both streams), and SVG (both streams) should be
        recombined to produce a single stream that can be fed to a web browser.
    -->
    <p:processor name="oxf:xslt">
        <p:input name="data" href="#html-stream-schematronized"/>
        <p:input name="html-2" href="#docbook-as-html"/>
        <p:input name="svg-1" href="#svg-stream-schmatronized"/>
        <p:input name="svg-2" href="#mathml-as-svg"/>
        <p:input name="config" href="stylesheet-to-aggregate-everything.xsl"/>
        <p:output name="data" ref="result-document"/>
    </p:processor>
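
    <!--
        Sketch only: a minimal shape that stylesheet-to-aggregate-everything.xsl could take. It
        assumes that the XSLT processor exposes the extra inputs (html-2, svg-1, svg-2) to the
        document() function through URIs of the form "input:name"; the exact URI form depends on
        the PresentationServer version and should be checked. The <result> wrapper element is
        hypothetical. This version only concatenates the four streams; putting each fragment back
        at its original position would need information that the splitting step does not yet
        provide.

        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
            <xsl:template match="/">
                <result>
                    <xsl:copy-of select="/*"/>
                    <xsl:copy-of select="document('input:html-2')/*"/>
                    <xsl:copy-of select="document('input:svg-1')/*"/>
                    <xsl:copy-of select="document('input:svg-2')/*"/>
                </result>
            </xsl:template>
        </xsl:stylesheet>
    -->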

</p:config>

Received on Wed Feb 9 00:43:13 2005
