[dsdl-discuss] Re: Fw: Questions on Validation Management

From: Erik Bruchez <ebruchez@orbeon.com>
Date: Thu Nov 03 2005 - 13:26:23 UTC

We are going to be part of the XML Processing WG at W3C. They have a
pretty aggressive timeline, see:

   http://www.w3.org/XML/Processing/

At the moment, we have no idea what things will look like, as work hasn't
started yet. But we (I) should stay in touch with the DSDL group.
Ideally though, somebody more involved with DSDL should participate in
the W3C effort.

-Erik

Martin Bryan wrote:
> Erik
>
> Many thanks for your comments, which we will discuss in detail in Atlanta on
> 13th November. One point I want to pick up on before then is:
>
>
>>3. I do personally think that it would be better to have a single,
>> standard XML processing language. Only if such a language does not
>> meet the requirements of DSDL should a specific language be
>> created, or extensions to the standard language be defined. Now
>> there is a problem since such a standard language does not yet
>> exist!
>
>
> I agree with this statement wholeheartedly. Provided a W3C-approved
> language meets all the DSDL user requirements for validation management, we
> should simply be showing how to use it in the ISO technical report. But
> there is a proviso. We need people like yourself to champion the need for
> any W3C pipelining language to be able to meet the complex validation
> requirements of DSDL rather than just the normal 80/20 subset of validation
> requirements that W3C tend to settle for.
>
> Martin
>
> ----- Original Message -----
> From: "Erik Bruchez" <ebruchez@orbeon.com>
> To: "Martin Bryan" <martin@is-thought.co.uk>
> Cc: "Alessandro Vernet" <avernet@orbeon.com>
> Sent: Thursday, October 27, 2005 1:21 PM
> Subject: Re: Fw: Questions on Validation Management
>
>
>
>>Hi Martin,
>>
>>I apologize for the delay.
>>
>>1. Was our W3C XPL submission already up when we talked at the time?
>> Just in case here it is:
>>
>> http://www.w3.org/Submission/2005/SUBM-xpl-20050411/
>>
>> Also, you may know (or may want to know) that W3C is currently
>> discussing the creation of an "XML processing working group", which,
>> among other things, will be in charge of specifying an XML
>> processing language, i.e. an XML pipeline language. W3C members
>> have already been asked for feedback, and a decision to start the
>> WG should occur in November. People working on DSDL may have some
>> interest in joining that working group.
>>
>>2. I see the following question in your PDF file, in the XPL section:
>>
>> "If there is no such previous statement, the set of XML Infoset
>> identifiers in scope for a statement of an XPL program, unless
>> specified otherwise, consists of the set of identifiers specified
>> by the infoset attributes of the XPL statements. In other words,
>> this condition applies to the first statement of an XPL program."
>>
>> "Is the above statement correct, or should 'infoset attributes' read
>> 'param elements'?"
>>
>> -> My answer: the XPL submission has made some updates to the
>> syntax of XPL as it is currently used. In the XPL submission
>> terminology, yes, we are referring to "infoset" attributes on
>> "p:input" elements. With the current XPL implementation in Orbeon
>> PresentationServer, we are using "p:param" elements. So the bottom
>> line is that the introduction to XPL in that document is not in
>> line with the actual XPL example that follows.
>>
>> I attach a version of the proposal in XPL with a syntax which is
>> compatible with the W3C submission.
>>
>>3. I do personally think that it would be better to have a single,
>> standard XML processing language. Only if such a language does not
>> meet the requirements of DSDL should a specific language be
>> created, or extensions to the standard language be defined. Now
>> there is a problem since such a standard language does not yet
>> exist!
>>
>> Now for one specific question:
>>
>> "How can validation management identify the order in which
>> documents will be processed? Would we need to define extensions to
>> existing pipelining languages to provide this functionality?"
>>
>> I think a general-purpose XML processing language should allow for
>> specifying document processing order.
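One way a processing language can determine the order in which documents are processed, without extra syntax, is to derive it from data dependencies: a step can run once every stream it reads has been produced. The minimal Python sketch below assumes a hypothetical table of steps built from the stream ids of the attached pipeline; the table format and `processing_order` are illustrative only, not XPL syntax or the Orbeon implementation. It relies on the standard library's `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical step table: step name -> (streams it reads, streams it writes).
# Stream ids follow the attached pipeline; the table format is illustrative.
STEPS = {
    "nvdl-split": (["source-document"],
                   ["html-stream", "svg-stream", "mathml-stream", "other-stream"]),
    "html-dtd": (["html-stream"], ["html-stream-validated"]),
    "html-metadata": (["html-stream-validated"], ["html-stream-schematronized"]),
    "svg-schema": (["svg-stream"], ["svg-stream-validated"]),
    "svg-subset": (["svg-stream-validated"], ["svg-stream-schmatronized"]),
    "mathml-rng": (["mathml-stream"], ["mathml-stream-validated"]),
    "mathml-to-svg": (["mathml-stream-validated"], ["mathml-as-svg"]),
    "dsrl": (["other-stream"], ["docbook-stream"]),
    "docbook-dtd": (["docbook-stream"], ["docbook-stream-validated"]),
    "crdl": (["docbook-stream-validated"], ["docbook-stream-validated-2"]),
    "docbook-to-html": (["docbook-stream-validated-2"], ["docbook-as-html"]),
    "aggregate": (["html-stream-schematronized", "docbook-as-html",
                   "svg-stream-schmatronized", "mathml-as-svg"],
                  ["result-document"]),
}


def processing_order(steps):
    """Return step names so that every step follows the producers of its inputs."""
    # Map each stream id to the step that produces it.
    producer = {}
    for name, (_inputs, outputs) in steps.items():
        for stream in outputs:
            producer[stream] = name
    # Predecessors of a step are the producers of the streams it reads;
    # streams with no producer (e.g. the pipeline input) impose no ordering.
    graph = {
        name: {producer[s] for s in inputs if s in producer}
        for name, (inputs, _outputs) in steps.items()
    }
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(processing_order(STEPS))
```

Steps with no mutual dependency (the HTML, SVG, MathML and DocBook branches) may come out in any relative order, which is also what makes parallel execution possible; `static_order()` raises `graphlib.CycleError` if the declared dependencies form a cycle.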
>>
>>Best,
>>
>>-Erik
>>
>>Martin Bryan wrote:
>> > Erik
>> >
>> > Since I last contacted you about the role of pipelining in validation
>> > management, I've been unable to take the proposed technical report
>> > forward for personal reasons. I have now reached the stage where I need
>> > to urgently review what we have done so far and to determine what
>> > should be done at the forthcoming meeting in Atlanta, on the weekend
>> > preceding XML 2005. I have not done anything to the XPL material other
>> > than change the tense of some of the phraseology in line with ISO
>> > policy. If you could take a minute to review the changes I have made
>> > and see if you can suggest any other useful changes, I would be most
>> > grateful.
>> >
>> > On a side issue, you might have useful input to a question I have
>> > posed to the DSDL team today, viz:
>> >
>>---------------------------------------------------------------------------
>> >
>> > Dear WG1ers
>> >
>> > The attached update for Part 10 contains a whole raft of questions I
>> > want to discuss about the proposed extensions to the demonstration of
>> > pipelining techniques that were requested in Amsterdam. A summary of
>> > the main questions can be found on the last page.
>> >
>> > There is an underlying question behind all this:
>> >
>> > "Should validation management be based on the application of existing
>> > pipelining languages (presumably by finding a common subset between
>> > them) or does it require that we define functions over and above those
>> > defined in existing languages?"
>> >
>> > In other words, "Do we make do with what's already there or do we
>> > need to get what's there improved, and if the latter, do we have a
>> > mandate to do that?"
>> >
>> > Thoughts?
>> >
>>---------------------------------------------------------------------------
>> >
>> > How reasonable do you think it might be if ISO said to developers of
>> > existing pipelining techniques "You really need to add functionality to
>> > your language to allow users to ...." where ... includes the
>> > functionality suggested in the initial paragraphs drafted for the To Be
>> > Completed clauses I've added to the latest version of the TR?
>> >
>> > Martin Bryan
>> > ISO SC34/WG1
>> >>
>>
>>
>
>
> --------------------------------------------------------------------------------
>
>
>
>><p:pipeline xmlns:p="http://www.orbeon.com/oxf/xpl"
>> xmlns:oxf="http://www.orbeon.com/oxf/processors">
>>
>> <p:input name="source-document" infoset="source-document"/>
>> <p:output name="result-document" infosetref="result-document"/>
>>
>> <!--
>> First pass at writing a pipeline to implement the DSDL Test Scenario.
>> Questions / issues:
>>
>> o = questions by Erik Bruchez
>> v = answers by Eric van der Vlist
>>
>> o What is expected of the output of validators? Is the flow supposed
>> to be interrupted when a validation error occurs?
>>
>> (v) Both questions are controversial :-) ...
>> An overall principle for DSDL is that DSDL is only about validation
>> and does not carry any kind of PSVI information. Following this
>> principle, the result of a DSDL validation should be "valid" or
>> "invalid".
>>
>> Now, this scenario seems to show that this might not be the case for
>> Part 10 (Validation Management), and this is also why I am now
>> thinking that XPL may be interesting; if it were only about
>> "valid"/"invalid", XPL wouldn't have been such a good fit, IMO.
>>
>> My answer to your first question thus seems to be: a validation
>> report containing at least a "yes/no" answer plus ad hoc content.
>>
>> My personal answer to the second question would be "that depends".
>> In the XMLfr publication process, for instance, I have two kinds of
>> validation: an RNG schema that returns errors and a Schematron schema
>> that validates good practices and returns warnings. While the first
>> one could interrupt the flow, the second one shouldn't.
>>
>> If we wanted to bring that notion into Validation Management, instead
>> of "yes/no" we could have an error level, and when invoking a
>> validator we could define the level of the error to raise if
>> validation fails. The default behaviour could be to stop at the first
>> error (as you've implied in the pipeline), but an optional "config"
>> input could be added to specify an error level. If the error level is
>> positive, no exception would be raised.
>>
>> Now, a validator might have several outputs. What about defining
>> several outputs for a validator:
>> - the data output (useful only for schema languages that, like DTDs
>> or WXS, augment the infoset with things such as default values);
>> - the report output, with yes/no (or level) information and error
>> messages or, for Schematron, the validation report;
>> - in addition, a PSVI output in the case of W3C XML Schema (assuming
>> we had an XML format for the PSVI).
>>
>> When a validator is configured with a positive error level, error
>> detection could be done by checking the report output.
>>
>> All these answers are personal and should be checked with the DSDL
>> Working Group.
>> o For simplicity, I assumed that the NVDL processor here would produce
>> outputs with those particular names. This would be possible only if
>> the NVDL processor could be configured to map those output names to
>> namespaces. Practically, this processor could either:
>>   o Have predefined output names, like document-1, document-2, etc.
>>   o Produce a single XML document with all the streams aggregated.
>> I do not know NVDL well enough to see what would be natural here.
>> (v) None of them are natural :-) ...
>>
>> Right now, NVDL is for validation only and takes care of invoking
>> the different validators to return a single "yes/no" answer.
>>
>> Using it to split a document as described in that scenario is thus
>> an extrapolation of what the current NVDL implementation does.
>>
>> However, given that NVDL splits documents according to their
>> namespaces, I wonder whether aggregating the streams would produce
>> anything different enough from the original document ;-) ...
>> I am thus wondering whether predefined names might be the best
>> solution. Maybe, instead of using document-i, we could map namespace
>> URIs to names (as they are mapped to namespace prefixes).
>> o I have used a single validation processor that supports W3C XML
>> Schema, RELAX NG, Schematron, and DTDs (here DTDs would either have
>> to be encapsulated in a root element or referenced externally). You
>> could of course propose one processor per schema type. The
>> PresentationServer validation processor currently supports W3C XML
>> Schema and RELAX NG transparently.
>>
>> o I proposed using XSLT to recombine the final document at the end.
>> o Otherwise, the pipeline is very simple. Nothing in XPL prevents
>> parallel execution. Without exception support, the processing would
>> just stop if there is a validation error. With exception support, it
>> could resume locally (per branch) if needed, or just provide a global
>> fallback: in short, anything that exceptions make possible.
>>
>> -->
>>
>> <!--
>> 1. Use NVDL to split out the parts of the document that are encoded
>> using HTML, SVG and MathML from the bulk of the document, whose tags
>> are defined using a user-defined set of markup tags.
>> -->
>> <p:processor name="oxf:nvdl">
>> <p:input name="document" infosetref="#source-document"/>
>> <p:input name="rules">
>> <rules>
>> NVDL rules
>> </rules>
>> </p:input>
>> <p:output name="html-stream" id="html-stream"/>
>> <p:output name="svg-stream" id="svg-stream"/>
>> <p:output name="mathml-stream" id="mathml-stream"/>
>> <!--
>>
>> (v) note: these ids must be "svg-stream" and "mathml-stream" (not
>> "html-stream") so that the validation steps below can refer to them.
>>
>> -->
>> <p:output name="other-stream" id="other-stream"/>
>> </p:processor>
>>
>> <!--
>> 2. Validate the HTML elements and attributes using the HTML 4.0 DTD
>> (W3C XML DTD).
>> -->
>> <p:processor name="oxf:validation">
>> <p:input name="data" infosetref="#html-stream"/>
>> <p:input name="schema">
>> <!-- Reference to DTD for HTML -->
>> <dtd href="..."/>
>> </p:input>
>> <p:output name="data" id="html-stream-validated"/>
>> </p:processor>
>>
>> <!--
>> 3. Use a set of Schematron rules stored in check-metadata.xml to ensure
>> that the metadata of the HTML elements defined using Dublin Core
>> semantics conforms to the information in the document about the
>> document's title and subtitle, author, encoding type, etc.
>> -->
>> <p:processor name="oxf:validation">
>> <p:input name="data" infosetref="#html-stream-validated"/>
>> <!-- Reference to Schematron schema for HTML metadata -->
>> <p:input name="schema" infosetref="check-metadata.xml"/>
>> <p:output name="data" id="html-stream-schematronized"/>
>> <!--
>>
>> (v) Note that in the case of Schematron, the data output is identical
>> to the data input.
>> -->
>> </p:processor>
>>
>> <!--
>> 4. Validate the SVG components of the file using the standard W3C
>> schema provided in the SVG 1.2 specification.
>> -->
>> <p:processor name="oxf:validation">
>> <p:input name="data" infosetref="#svg-stream"/>
>> <!-- Reference to W3C Schema for SVG -->
>> <p:input name="schema" infosetref="svg-1.2.xsd"/>
>> <p:output name="data" id="svg-stream-validated"/>
>> </p:processor>
>>
>> <!--
>> 5. Use the Schematron rules defined in SVG-subset.xml to ensure that
>> the SVG file only uses those features of SVG that are valid for the
>> particular SVG viewer available to the system.
>> -->
>> <p:processor name="oxf:validation">
>> <p:input name="data" infosetref="#svg-stream-validated"/>
>> <!-- Reference to Schematron schema for SVG subset -->
>> <p:input name="schema" infosetref="SVG-subset.xml"/>
>> <p:output name="data" id="svg-stream-schmatronized"/>
>> </p:processor>
>>
>> <!--
>> 6. Validate the MathML components using the latest version of the
>> MathML schema (defined in RELAX NG) to ensure that all maths fragments
>> are valid. The schema will make use of the datatype definitions in
>> check-maths.xml to validate the contents of specific elements.
>> -->
>> <p:processor name="oxf:validation">
>> <p:input name="data" infosetref="#mathml-stream"/>
>> <!-- Reference to Relax NG schema for MathML -->
>> <p:input name="schema" infosetref="mathml-1.0.rng"/>
>> <p:output name="data" id="mathml-stream-validated"/>
>> </p:processor>
>>
>> <!--
>> 7. Use MathML-SVG.xslt to transform the MathML segments to displayable
>> SVG and replace each MathML fragment with its SVG equivalent.
>> -->
>> <p:processor name="oxf:xslt">
>> <p:input name="data" infosetref="#mathml-stream-validated"/>
>> <p:input name="config" infosetref="MathML-SVG.xslt"/>
>> <p:output name="data" id="mathml-as-svg"/>
>> </p:processor>
>>
>> <!--
>> 8. Use the DSRL definitions in convert-mynames.xml to convert the tags
>> in the local nameset to the form that can be used to validate the
>> remaining part of the document using docbook.dtd.
>> -->
>> <p:processor name="oxf:dsrl">
>> <p:input name="data" infosetref="#other-stream"/>
>> <p:input name="config" infosetref="convert-mynames.xml"/>
>> <p:output name="data" id="docbook-stream"/>
>> </p:processor>
>>
>> <p:processor name="oxf:validation">
>> <p:input name="data" infosetref="#docbook-stream"/>
>> <!-- Reference to DTD Docbook -->
>> <p:input name="schema">
>> <dtd href="..."/><!-- Reference to W3C DTD -->
>> </p:input>
>> <p:output name="data" id="docbook-stream-validated"/>
>> </p:processor>
>>
>> <!--
>> 9. Use the CRDL rules defined in mycharacter-checks.xml to validate
>> that the correct character sets have been used for text identified as
>> being Greek and Cyrillic.
>> -->
>> <p:processor name="oxf:crdl">
>> <p:input name="data" infosetref="#docbook-stream-validated"/>
>> <p:input name="config" infosetref="mycharacter-checks.xml"/>
>> <p:output name="data" id="docbook-stream-validated-2"/>
>> </p:processor>
>>
>> <!--
>> 10. Convert the Docbook tags to HTML so that they can be displayed in a
>> web browser using the docbook-html.xslt transformation rules.
>> -->
>> <p:processor name="oxf:xslt">
>> <p:input name="data" infosetref="#docbook-stream-validated-2"/>
>> <p:input name="config" infosetref="docbook-html.xslt"/>
>> <p:output name="data" id="docbook-as-html"/>
>> </p:processor>
>>
>> <!--
>> After completion of step 10, the HTML (both streams) and SVG (both
>> streams) should be recombined to produce a single stream that can be
>> fed to a web browser.
>> -->
>> <p:processor name="oxf:xslt">
>> <p:input name="data" infosetref="#html-stream-schematronized"/>
>> <p:input name="html-2" infosetref="#docbook-as-html"/>
>> <p:input name="svg-1" infosetref="#svg-stream-schmatronized"/>
>> <p:input name="svg-2" infosetref="#mathml-as-svg"/>
>> <p:input name="config" infosetref="stylesheet-to-aggregate-everything.xsl"/>
>> <p:output name="data" infoset="result-document"/>
>> </p:processor>
>>
>></p:pipeline>
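The error-level proposal in the comment at the top of the pipeline (stop at the first error by default, let a positive level suppress the exception, and expose separate data and report outputs) can be sketched in Python. Everything here is hypothetical: `run_validator`, `Report`, `ValidatorResult` and the convention that a check returns the data plus a list of messages are illustrative names, not part of XPL, Orbeon PresentationServer or any DSDL part.

```python
from dataclasses import dataclass, field


class ValidationError(Exception):
    """Raised when validation fails and the error level does not allow continuing."""


@dataclass
class Report:
    valid: bool                 # the yes/no answer
    level: int = 0              # error level recorded for a failure (0 when valid)
    messages: list = field(default_factory=list)


@dataclass
class ValidatorResult:
    data: object                # data output (possibly augmented, e.g. defaults)
    report: Report              # report output


def run_validator(check, document, error_level=0):
    """Invoke `check` and apply the error-level policy.

    `check` is any callable returning (data, messages); an empty message
    list means the document is valid. With the default level (0) a
    failure raises, i.e. the flow stops at the first error. With a
    positive level no exception is raised, and downstream steps detect
    the failure by inspecting the report output instead.
    """
    data, messages = check(document)
    valid = not messages
    report = Report(valid=valid,
                    level=0 if valid else error_level,
                    messages=list(messages))
    if not valid and error_level <= 0:
        raise ValidationError("; ".join(messages))
    return ValidatorResult(data=data, report=report)
```

For the XMLfr case described in the comment, the RELAX NG check would run with the default level, so an error stops the pipeline, while the Schematron good-practice check would run with, say, `error_level=1`, so that its warnings end up only in the report.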
>>
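The suggestion in the NVDL discussion to map namespace URIs to output names, rather than use document-1, document-2 and so on, can be sketched with Python's `xml.etree.ElementTree`. This is not NVDL: the `STREAMS` mapping and `split_by_namespace` are hypothetical, and the rule used here (a section starts wherever an element's namespace differs from its parent's, and is detached only if its namespace has a stream name) is a simplification of NVDL's section model.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical mapping of namespace URIs to output names (the URIs are the
# standard XHTML, SVG and MathML namespaces).
STREAMS = {
    "http://www.w3.org/1999/xhtml": "html-stream",
    "http://www.w3.org/2000/svg": "svg-stream",
    "http://www.w3.org/1998/Math/MathML": "mathml-stream",
}


def namespace_of(tag):
    """Return the namespace URI of an ElementTree tag of the form '{uri}local'."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def split_by_namespace(xml_text, streams=STREAMS, default="other-stream"):
    """Split a document into named streams according to namespaces.

    Each element whose namespace differs from its parent's and is mapped
    in `streams` starts a section; the section (with anything nested in
    it) is copied to its stream and removed from the remaining document,
    which becomes the `default` stream. Text following a detached section
    (its tail) is dropped in this sketch.
    """
    root = ET.fromstring(xml_text)
    outputs = {name: [] for name in streams.values()}

    def walk(parent):
        for child in list(parent):          # iterate over a copy: children may be removed
            ns = namespace_of(child.tag)
            if ns != namespace_of(parent.tag) and ns in streams:
                section = copy.deepcopy(child)
                section.tail = None
                outputs[streams[ns]].append(section)
                parent.remove(child)
            else:
                walk(child)

    walk(root)
    outputs[default] = [root]
    return outputs
```

Because the output names come from a namespace map, the pipeline can refer to `svg-stream` or `mathml-stream` directly, much as prefixes are bound to namespaces.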
>
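Erik's remark in the pipeline comment that, with exception support, processing could resume locally (per branch) or fall back globally can be illustrated as follows. The `run_branches` function and its `(step, local_fallback)` convention are assumptions made for this sketch, not XPL constructs; branches run sequentially here, although independent branches could equally run in parallel.

```python
class ValidationError(Exception):
    """Raised by a pipeline step whose validation fails."""


def run_branches(branches, global_fallback=None):
    """Run independent pipeline branches with per-branch or global recovery.

    `branches` maps a branch name to a pair (step, local_fallback), where
    `step` is a zero-argument callable and `local_fallback` is either None
    or a callable taking the ValidationError. A failing branch uses its
    local fallback if it has one; otherwise `global_fallback(name, err)`
    replaces the whole result. With neither, the error propagates and
    processing simply stops, as it would without exception support.
    """
    results = {}
    for name, (step, local_fallback) in branches.items():
        try:
            results[name] = step()
        except ValidationError as err:
            if local_fallback is not None:
                results[name] = local_fallback(err)
            elif global_fallback is not None:
                return global_fallback(name, err)
            else:
                raise
    return results
```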
>
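Step 9 delegates the Greek/Cyrillic check to CRDL rules in mycharacter-checks.xml, whose content is not shown. As a rough illustration of the kind of test such rules express, the sketch below flags letters whose Unicode character name does not belong to the expected script. The language-to-script table and `wrong_script_chars` are hypothetical and do not reflect CRDL syntax.

```python
import unicodedata

# Hypothetical rules: language label -> prefix of allowed Unicode character names.
SCRIPT_PREFIX = {"el": "GREEK", "ru": "CYRILLIC"}


def wrong_script_chars(text, lang, rules=SCRIPT_PREFIX):
    """Return the letters in `text` that are not in the script expected for `lang`.

    Only alphabetic characters are checked, so digits, spaces and
    punctuation are always accepted.
    """
    prefix = rules[lang]
    return [ch for ch in text
            if ch.isalpha() and not unicodedata.name(ch, "").startswith(prefix)]
```

A Latin "p" slipped into Cyrillic text, a common look-alike error, is reported because its character name starts with "LATIN".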

Received on Thu Nov 3 14:26:34 2005

This archive was generated by hypermail 2.1.8 : Thu Nov 03 2005 - 16:03:01 UTC