[dsdl-discuss] Re: FW: DSDL: Physical validation

From: Rick Jelliffe <ricko@allette.com.au>
Date: Wed Sep 29 2004 - 01:39:13 UTC

Obviously these go beyond the XML Infoset. None of the requirements
can be satisfied by XPath, for example.

SAX is not able to handle some of them either. It does not
provide any way of knowing
 * whether an empty tag was used
 * when numeric character references are used
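
A minimal sketch of these two gaps, assuming Python's standard
xml.sax module as a stand-in for a SAX parser. The EventLog class
and sax_events function are hypothetical names for illustration.
The sketch compares the event streams of two documents that differ
only physically.

```python
import xml.sax


class EventLog(xml.sax.ContentHandler):
    """Record SAX events, merging adjacent character chunks."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # A parser may split text across several calls, so merge them.
        if self.events and self.events[-1][0] == "text":
            self.events[-1] = ("text", self.events[-1][1] + content)
        else:
            self.events.append(("text", content))


def sax_events(document):
    """Return the list of events reported for a document (bytes)."""
    handler = EventLog()
    xml.sax.parseString(document, handler)
    return handler.events


# Empty-element tag versus start-tag plus end-tag.
print(sax_events(b"<a/>") == sax_events(b"<a></a>"))
# Numeric character reference versus the literal character.
print(sax_events(b"<a>&#x41;</a>") == sax_events(b"<a>A</a>"))
```

Both comparisons should come out equal, which is the point: the
handler has nothing to distinguish the two physical forms.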

SAX 2 (with extensions) does let you know
 * when namespace declarations come into effect
 * when entities are used
 * when CDATA sections start and end
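
As a rough illustration, again assuming Python's xml.sax with its
default expat driver: namespace declarations arrive through the
ContentHandler, and CDATA boundaries through the lexical-handler
property. Recorder and physical_events are hypothetical names.
Entity boundaries are left out: the SAX 2 LexicalHandler defines
startEntity/endEntity, but this sketch does not assume that the
Python driver reports them.

```python
import io
import xml.sax
from xml.sax.handler import feature_namespaces, property_lexical_handler


class Recorder(xml.sax.ContentHandler):
    """Collect namespace and CDATA events in document order."""

    def __init__(self):
        super().__init__()
        self.log = []

    # ContentHandler callbacks, reported in namespace-aware mode.
    def startPrefixMapping(self, prefix, uri):
        self.log.append(("ns-start", prefix, uri))

    def endPrefixMapping(self, prefix):
        self.log.append(("ns-end", prefix))

    # Lexical-handler callbacks.
    def startCDATA(self):
        self.log.append(("cdata-start",))

    def endCDATA(self):
        self.log.append(("cdata-end",))

    # The remaining lexical callbacks are not needed here.
    def comment(self, text):
        pass

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass


def physical_events(document):
    """Parse a document (bytes) and return the recorded events."""
    recorder = Recorder()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(recorder)
    parser.setProperty(property_lexical_handler, recorder)
    parser.parse(io.BytesIO(document))
    return recorder.log


for event in physical_events(b'<p:a xmlns:p="urn:x"><![CDATA[1 < 2]]></p:a>'):
    print(event)
```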

SAX either processes in a namespace-aware mode (no prefixes)
or a non-namespace-aware mode; also, different parsers
act differently on what name information they report.
So prefix validation may require a separate pass
from any namespace-aware validation.
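
The two-pass idea can be sketched as follows, assuming Python's
xml.sax. Names and element_names are hypothetical. One pass runs
with namespace processing off and sees names as written; the other
runs with it on and sees expanded names. The qname argument is
where a driver may or may not pass the prefixed name along.

```python
import io
import xml.sax
from xml.sax.handler import feature_namespaces


class Names(xml.sax.ContentHandler):
    """Collect element names in whichever mode the parser runs."""

    def __init__(self):
        super().__init__()
        self.raw = []       # names as written, prefix included
        self.expanded = []  # (namespace URI, local name, qname)

    def startElement(self, name, attrs):
        # Called only in non-namespace-aware mode.
        self.raw.append(name)

    def startElementNS(self, name, qname, attrs):
        # Called only in namespace-aware mode.
        uri, local = name
        self.expanded.append((uri, local, qname))


def element_names(document, namespace_aware):
    """Parse a document (bytes) in the requested mode."""
    handler = Names()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, namespace_aware)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(document))
    return handler


doc = b'<p:a xmlns:p="urn:x"/>'
print(element_names(doc, False).raw)       # pass 1: prefixes visible
print(element_names(doc, True).expanded)   # pass 2: expanded names
```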

I strongly recommend, if people are interested in pursuing this,
that only the information provided by SAX 2 with its extensions
be considered. Anything else requires a custom parser, which
would be crazy.

One method of implementing this would be to simply define
a SAX stream markup language, presumably stripping out data content
for efficiency, and then using our existing validation languages
on top of that. That would be the simplest to implement and
specify.
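
A minimal sketch of such a stream language, assuming Python's
xml.sax. The element and attribute names (sax-stream,
start-prefix-mapping, start-element, end-element) are invented for
illustration and are not a proposal. Character data is dropped
entirely, and the result is itself XML, so an existing schema
language could be run over it.

```python
import io
import xml.sax
from xml.sax.handler import feature_namespaces
from xml.sax.saxutils import quoteattr


class EventSerializer(xml.sax.ContentHandler):
    """Write SAX events as XML elements (hypothetical vocabulary)."""

    def __init__(self):
        super().__init__()
        self.lines = ["<sax-stream>"]

    def startPrefixMapping(self, prefix, uri):
        # The default namespace is reported with no prefix.
        self.lines.append("  <start-prefix-mapping prefix=%s uri=%s/>"
                          % (quoteattr(prefix or ""), quoteattr(uri)))

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        self.lines.append("  <start-element uri=%s local=%s/>"
                          % (quoteattr(uri or ""), quoteattr(local)))

    def endElementNS(self, name, qname):
        self.lines.append("  <end-element/>")

    # characters() is deliberately not overridden: data is stripped.

    def endDocument(self):
        self.lines.append("</sax-stream>")


def to_sax_stream(document):
    """Return the event stream of a document (bytes) as XML text."""
    handler = EventSerializer()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(document))
    return "\n".join(handler.lines)


print(to_sax_stream(b'<p:a xmlns:p="urn:x">text</p:a>'))
```

A fuller version would also serialize the lexical-handler events
(CDATA boundaries and so on) in the same way.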

I believe that all issues related to validating that an
XML document can be browsed by HTML browsers should be
out-of-scope for ISO DSDL. That is a whole different can
of worms that requires a specific tool, not a standard.

XSLT 2 has new features for ensuring that certain characters
are marked up using references: there will be less need for
validation.

Another possible use case for this kind of validation language
would be to check that XML matches some Canonical XML.
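
For what it is worth, that particular check can already be
approximated without a new language. A sketch, assuming Python 3.8
or later, whose xml.etree.ElementTree.canonicalize implements
C14N 2.0 (which may not be the Canonical XML version intended
here); is_canonical is a hypothetical helper name.

```python
from xml.etree.ElementTree import canonicalize


def is_canonical(text):
    """True if the document text is already in C14N 2.0 form."""
    return canonicalize(xml_data=text) == text


print(is_canonical('<a b="1"></a>'))
print(is_canonical("<a b='1'/>"))
```

The comparison is a plain string equality, so any physical
difference from the canonical form makes it fail.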

There is currently a tool available to test whether a
file may use hexadecimal numeric character references.
   grep \&#x filename

If we rule out writing custom parsers (as I believe we must,
both because it is impractical to expect developers to
write full parsers for niche requirements and because one
of our agreed cornerstones from the beginning was that we
were limiting ourselves to the SGML information that was
available in XML) then anything we do will just be duplicating
grep more or less. No advantage.

Cheers
Rick Jelliffe

Received on Wed Sep 29 03:35:39 2004