[dsdl-discuss] Why not all of 11404? (was Re: Datatypes strawman)

From: Rick Jelliffe <ricko@topologi.com>
Date: Wed Jul 17 2002 - 06:29:58 UTC

From: "Martin Bryan" <mtbryan@sgml.u-net.com>

> A detailed proposal for Part 5, Datatypes for document content validation,
> has been posted at http://www.sgml.u-net.com/DSDL-Datatypes-v03.htm

What about just adopting the 11404 datatypes holus bolus? It gives us
the great advantage of staying within the ISO family, and we can claim
to be "fitting in" to programming languages.

I believe XML Schemas/XQuery have a major problem: they claim to
need schemas and datatyping for "efficiency", especially when
implementing on top of an RDBMS.

Yet XML Schemas provides no mechanisms for the following:
  1) are elements ordered or unordered?
  2) will fast access be needed from a child to a parent?

If an element is ordered, then you need an extra field or two in which
to specify the order (or the predecessor/successor, or perhaps a
floating ordering number).

Similarly, if you need to support the ancestor or parent axis with any
efficiency, you need extra fields.
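
To make the "extra fields" concrete, here is a minimal sketch using
Python's sqlite3 module. The table layout, the column names (parent_id,
pos) and the helper functions are all assumptions made for illustration;
nothing here comes from XML Schemas or from any particular XML database.
The pos column plays the role of the floating ordering number, and
parent_id is the field that makes the parent axis cheap.

```python
import sqlite3


def position_between(before, after):
    """Pick a floating position between two neighbours (None = no neighbour)."""
    if before is None and after is None:
        return 1.0
    if before is None:
        return after - 1.0
    if after is None:
        return before + 1.0
    return (before + after) / 2.0


def build_demo():
    """Create a hypothetical element table and load a tiny document."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE element ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " parent_id INTEGER,"  # extra field: supports the parent axis
        " pos REAL)"           # extra field: floating sibling order
    )
    conn.execute("INSERT INTO element VALUES (1, 'doc', NULL, NULL)")
    conn.execute("INSERT INTO element VALUES (2, 'title', 1, 1.0)")
    conn.execute("INSERT INTO element VALUES (3, 'body', 1, 2.0)")
    # A later insert between 'title' and 'body' needs no renumbering.
    conn.execute(
        "INSERT INTO element VALUES (4, 'abstract', 1, ?)",
        (position_between(1.0, 2.0),),
    )
    return conn


def children_in_order(conn, parent_id):
    """Return child names in document order; only possible because pos exists."""
    rows = conn.execute(
        "SELECT name FROM element WHERE parent_id = ? ORDER BY pos",
        (parent_id,),
    )
    return [row[0] for row in rows]


def parent_of(conn, element_id):
    """Return the parent's name, or None for the root."""
    row = conn.execute(
        "SELECT p.name FROM element c JOIN element p ON p.id = c.parent_id"
        " WHERE c.id = ?",
        (element_id,),
    ).fetchone()
    return row[0] if row else None
```

If the schema said the children were unordered, the pos column and its
upkeep on every insert could be dropped. That is exactly the information
the DBMS has no way of getting from the schema.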

If the XML Schema is just used to serialize a relational DB, and
the receiving end knows (somehow?) that order or ancestor
axes are not important, there is no problem. And if the receiving
end is not a DBMS but an XML system, it can merrily go ahead
and act as if order were important without concerning anyone
(the XML would have been generated sorted).

But what about where the schema precedes the data, and where
we want to store the data in an XML database? Then there is
no way for the DBMS to know whether to create sibling order or
parent fields, and whether to maintain them. So with XML Schemas,
it seems that blind storage of arbitrary XML documents actually
cannot be efficient in RDBMS and many tree systems.

(Please let me know if you see a flaw here.)

Ordering of elements is something that the XML Schemas
WG decided belonged to RDF Schemas, so it has fallen
through the cracks. It will be interesting to see how the
Query WG handles this one.

What can we do in DSDL?

1) Ignore it. Who cares about RDBMS efficiency! This is the view
 to which I tend, but I expect others have different interests.

2) Implement some version of RDF Schemas. I don't believe that
RDF (Schemas) is mature enough for us to use at the moment. Dave Beckett
has been doing a wonderful job pulling it into shape, however.

3) It is interesting that 11404 includes things like bag and set.

Why not say:

  1) Element values and attribute values can be validated as 11404 primitive types.
  2) Elements with mixed or element content can be validated as 11404
      aggregate types
  3) Then we need to figure out whether the other generated types fit in.

This gives us a very strong ability to map from document to data structures.
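
As a rough illustration of that mapping, the sketch below treats the
children of an element as one of three aggregate kinds and picks a
matching Python structure. The kind names follow the 11404 aggregates
mentioned above (sequence, bag, set); the function name as_aggregate and
the choice of list, Counter and frozenset as stand-ins are assumptions
for illustration, not anything defined by 11404 or DSDL.

```python
from collections import Counter


def as_aggregate(children, kind):
    """Map a list of child values onto a structure for the given aggregate kind."""
    if kind == "sequence":
        # Order and duplicates both matter.
        return list(children)
    if kind == "bag":
        # Duplicates matter, order does not.
        return Counter(children)
    if kind == "set":
        # Neither order nor duplicates matter.
        return frozenset(children)
    raise ValueError("unknown aggregate kind: %r" % (kind,))


first = ["a", "b", "a"]
second = ["a", "a", "b"]

# The same two documents compare differently depending on the declared kind.
same_as_sequence = as_aggregate(first, "sequence") == as_aggregate(second, "sequence")
same_as_bag = as_aggregate(first, "bag") == as_aggregate(second, "bag")
same_as_set = as_aggregate(first, "set") == as_aggregate(second, "set")
```

Once the aggregate kind is declared, a store knows whether it has to keep
sibling order at all, which is the piece of information missing in the
RDBMS case discussed earlier.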

Cheers
Rick

Received on Wed Jul 17 02:15:42 2002