[dsdl-discuss] Re: Why not all of 11404? (was Re: Datatypes strawman)

From: Martin Bryan <mtbryan@sgml.u-net.com>
Date: Wed Jul 17 2002 - 08:36:48 UTC

Rick wrote

>What about just adopting the 11404 datatypes holus bolus? It gives us
the great advantage of staying within the ISO family, and we can claim
to be "fitting in" to programming languages.

I'm not personally convinced by 11404. I don't consider things like
complex-type and enumerated-types as primitive datatypes. I am also not
certain whether I want separate types for rational, integer and real (why
are separate types for integer and ordinal deemed relevant?). Given your
earlier arguments that we should restrict ourselves to those things needed to
validate documents, what would be your justification for adopting state-type
and procedure-type when there is no event-type to accompany them?

>I believe XML Schemas/XQuery has a major problem, in that they
are claiming they need schemas and datatyping for "efficiency" especially
when implementing on top of RDBMS.

Surely 11404 is even more biased towards RDBMS implementations, if the
record-type, set-type, array-type and table-type options are anything to go
by.

>Yet XML Schemas provides no mechanisms for the following:
  1) are elements ordered or unordered?
  2) will fast access be needed from a child to a parent?

>If an element is ordered, then you need an extra field or two in which
to specify the order (or the predecessor/successor, or perhaps a
floating ordering number).

But is the ordinal-type defined in 11404 sufficient for this? We really need
a tree-specific set of types, such as those used for XPath, so that you can
identify all ancestors, descendants and siblings, not just immediate ones.

>Similarly, if you need to support the ancestor or parent axis with any
efficiency, you need extra fields.

>If the XML Schema is just used to serialize a relational DB, and
the receiving end knows (somehow?) that order or ancestor
axes are not important, there is no problem. And if the receiving
end is not a DBMS but an XML system, it can merrily go ahead
and act as if order were important without concerning anyone
(the XML would have been generated sorted).

>But what about where the schema precedes the data, and where
we want to store the data in an XML database? Then there is
no way for the DBMS to know whether to create sibling order or
parent fields, and whether to maintain them. So with XML Schemas,
it seems that blind storage of arbitrary XML documents actually
cannot be efficient in RDBMS and many tree systems.

>(Please let me know if you see a flaw here.)

Storage does not take place until the XML Schema has been fully applied. You
store the DOM resulting from schema validation. In converting from the DOM
to the database schema, the set of rules for walking the DOM is specific to
the storage method, and is not in any way related to the parsing/validation
stage that precedes it.
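
To make that concrete, here is a minimal sketch of one such storage-specific
walk, assuming the document has already been parsed and validated. The
function name dom_to_rows and the row layout (node id, parent id, sibling
order, tag) are hypothetical, chosen only to show that the parent and order
fields can be produced by the walk itself rather than by the schema.

```python
import xml.etree.ElementTree as ET


def dom_to_rows(xml_text):
    """Walk a parsed tree and emit one row per element.

    Each row is (node_id, parent_id, sibling_order, tag).
    The root element has parent_id None and sibling_order 0.
    """
    root = ET.fromstring(xml_text)
    rows = []
    next_id = 0

    def walk(elem, parent_id, order):
        nonlocal next_id
        node_id = next_id
        next_id += 1
        rows.append((node_id, parent_id, order, elem.tag))
        # Children are visited in document order, so their position
        # in the parent doubles as the sibling-order field.
        for position, child in enumerate(elem):
            walk(child, node_id, position)

    walk(root, None, 0)
    return rows


rows = dom_to_rows("<doc><title/><para/><para/></doc>")
for row in rows:
    print(row)
```

A different storage method would use a different walk (for example, one that
drops the order field altogether) without any change to the validation step.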

>Ordering of elements is something that the XML Schemas
WG decided belonged to RDF Schemas, so it has fallen
through the cracks. It will be interesting to see how the
Query WG handles this one.

I'm not sure what you mean here. XML Schemas have sequence and choice
elements that manage ordering of data elements. Are you referring to the
ability to have unordered bags of subclassed elements within RDF?

>What can we do in DSDL?

Anything we like :-)
We already have Interleave. Do we need anything more?
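
As a rough illustration of the difference between ordered and unordered
content, the sketch below compares a sequence check with an interleave-style
check. It is a deliberate simplification: it works on flat lists of element
names rather than on real patterns, and the function names are hypothetical.

```python
from collections import Counter


def matches_sequence(children, expected):
    """True only if the child names appear in exactly the expected order."""
    return list(children) == list(expected)


def matches_interleave(children, expected):
    """True if the same child names appear the same number of times,
    in any order."""
    return Counter(children) == Counter(expected)


print(matches_sequence(["b", "a"], ["a", "b"]))
print(matches_interleave(["b", "a"], ["a", "b"]))
```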

>1) Ignore it. Who cares about RDBMS efficiency! This is the view
 to which I tend, but I expect others have different interests.

If DSDL can be implemented without too much hassle on RDBMS systems it will
have a wider market, even for inefficient tools.

>2) Implement some version of RDF Schemas. I don't believe that
RDF (Schemas) is mature enough for us to use at the moment. Dave Beckett
has been doing a wonderful job pulling it into shape, however.

God forbid: let's keep away from the RDF minefield. We don't want to be hurt
when it explodes :-)

>3) It is interesting that the 11404 includes things like bag and set.

If RDF Schemas is not mature enough then the draft status of the untried
11404 is even more likely to raise concerns. Among the questions I would need
to get answered (given that I have only quickly flicked through the text)
are: Is set ordered? Is it restricted to a single type? Can sets contain
bags, or vice versa?

>Why not say:

  1) Element values and attribute values can be validated as 11404 primitive
types.
  2) Elements with mixed or element content can be validated as 11404
      aggregate types
  3) Then we need to figure out whether the other generated types fit in.

>This gives us a very strong ability to map from document to data
structures.
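
For readers trying to picture point 1 of Rick's proposal, here is a minimal
sketch of validating element or attribute text against a small table of
primitive types. The type names and, in particular, the lexical forms are
assumptions made for illustration; they are not taken from 11404, and the
names LEXICAL_FORMS and validate_value are hypothetical.

```python
import re

# Assumed lexical forms for a few primitive types (illustrative only).
LEXICAL_FORMS = {
    "integer": re.compile(r"[+-]?[0-9]+"),
    "real": re.compile(r"[+-]?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?"),
    "boolean": re.compile(r"true|false"),
}


def validate_value(type_name, text):
    """Return True if the whitespace-trimmed text matches the lexical
    form registered for type_name. Unknown type names raise KeyError."""
    pattern = LEXICAL_FORMS[type_name]
    return pattern.fullmatch(text.strip()) is not None


print(validate_value("integer", " 42 "))
print(validate_value("boolean", "yes"))
```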

OK folks, Rick wants to use the full set of 11404 types, and nothing else,
while I prefer to have a restricted set of primitives with other types that
are specific to DSDL, including all the basic SGML-related types. We need
more feedback from the lurkers on the list before we can determine which way
to jump. Shout out soon, or you'll be forced to use 11404 or my simple
subset. You have been warned!!

Martin Bryan

Received on Wed Jul 17 04:43:01 2002
