[dsdl-discuss] Re: Concurrent (overlapping) structures

From: Alex Brown <alexb@griffinbrown.co.uk>
Date: Wed Jun 04 2003 - 09:00:45 UTC

Rick hi

> > Concurrent (overlapping) structures happen and need processing in many
> > (most?) publishing applications.
>
> I certainly don't agree with "most" or "need": the fact that almost no
> documents are published using concurrent markup demonstrates that.

It demonstrates that publishers have found workarounds for capturing these
structures (or, more often, have avoided marking them up). They still happen
and still need processing...

Any book with a non-trivial back-of-book index probably has concurrent
structures for those index entries which reference spans of content. Some
OEB file formats require pagination information from paper products to be
part of the eProduct. In a workflow, revision information could be modelled
with concurrent structures. Of course, such things can be modelled with
milestone-type elements, PIs, or what have you, but it gets messy.
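To make the "messy" part concrete, here is a minimal Python sketch of what processing milestone-marked index spans involves. It assumes hypothetical empty elements `<index-start id="..."/>` and `<index-end ref="..."/>` (illustrative names, not taken from any real schema). Because a span may start in one paragraph and end in another, and spans may overlap, the text each entry covers cannot be read off a single element node; it has to be reassembled by walking the whole document in order.

```python
import xml.etree.ElementTree as ET

def events(elem):
    """Yield ("element", el) and ("text", s) items in document order."""
    yield ("element", elem)
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        yield from events(child)
        # ElementTree stores text following a child as that child's tail.
        if child.tail:
            yield ("text", child.tail)

def index_spans(root):
    """Recover the text covered by each (hypothetical) index milestone pair.

    Spans may cross element boundaries and may overlap one another.
    """
    open_spans = {}   # id -> text fragments collected since its index-start
    spans = {}        # id -> completed span text
    for kind, item in events(root):
        if kind == "element" and item.tag == "index-start":
            open_spans[item.get("id")] = []
        elif kind == "element" and item.tag == "index-end":
            ref = item.get("ref")
            if ref not in open_spans:
                raise ValueError(f"index-end ref={ref!r} has no open index-start")
            spans[ref] = "".join(open_spans.pop(ref))
        elif kind == "text":
            # Every span that is currently open covers this text.
            for fragments in open_spans.values():
                fragments.append(item)
    if open_spans:
        raise ValueError(f"unclosed index spans: {sorted(open_spans)}")
    return spans

# Two overlapping index spans, both crossing a paragraph boundary.
doc = ET.fromstring(
    '<book><p>a <index-start id="x"/>b <index-start id="y"/>c </p>'
    '<p>d <index-end ref="x"/>e <index-end ref="y"/>f</p></book>'
)
print(index_spans(doc))  # {'x': 'b c d ', 'y': 'c d e '}
```

None of this bookkeeping is needed for the ordinary element hierarchy, where a node's text is simply its content.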

More complex documents (certain literary texts in scholarly editions, for
example) can have very rich concurrent structures. This is an area that W3C
technologies don't seem to be interested in addressing, and I'd have thought
these 'difficult' use cases would be a good area for DSDL to be
conspicuously capable in.

> > Has anybody any idea about how DSDL might address validation of
> > concurrent/overlapping structures?
>
> Schematron can be used for many concurrent markup structures.
> Here is one way.
>
> Our document is XML. We have a primary structure, which is made
> with just normal synchronous elements. Let's give it this structure.
>
> <!ELEMENT play ( title, line+)>
> <!ELEMENT title ( #PCDATA )>
> <!ELEMENT line ( #PCDATA | part-start | part-end )*>
> <!ELEMENT part-start EMPTY>
> <!ELEMENT part-end EMPTY>
> <!ATTLIST part-start
> character CDATA #REQUIRED>
>
> So the markup might look like
> ...
> <line><part-start character="FlowerPot"/>Weeed<part-end/><part-start
>   character="Ben"/>Hello</line>
> <line>Ben<part-end/><part-start
>   character="narrator"/>And there they all were</line>
> ...
>
> Schematron then expresses the constraints:
>
> <rule context="play/line/part-end">
>   <assert test="count(preceding::part-start) =
>                 count(preceding::part-end) + 1">
>     At all stages in the document, every part-end will
>     have a matching part-start.
>   </assert>
> </rule>
>
> <rule context="play/line/part-start[not(following::part-start)]">
>   <assert test="count(following::part-end) = 1">
>     The last part-start must be matched by a part-end.
>   </assert>
> </rule>
>
> I expect there are more efficient ways to do this. Without checking it,
> maybe this would work in a single pass:
>
> <rule context="play">
>   <report test="(line/part-start | line/part-end)
>                 [(self::part-start and position() mod 2 = 0)
>                  or (self::part-end and position() mod 2 = 1)]">
>     The part-starts should always be in odd positions,
>     paired with part-ends always in even positions.
>   </report>
> </rule>
>
> > With DSDL as it's currently envisaged, we seem to be producing something
> > that will be less sophisticated than SGML in this respect...

Yes, this is the sort of workaround that publishers have been forced to use.
It can work, but has a number of drawbacks I think, most of which stem from
the fact that one structure is unnecessarily privileged as the 'primary'
structure, so

1. The primary structure gets to have a grammar defined in RNG, the
concurrent structures don't get a grammar (though one can be implied through
Schematron/XPath rules).

2. The primary structure is represented by start and end tags, the others by
milestones, which seems like a purely technical imposition (i.e. bad
design).

3. In practice, the primary structures get to be processable as nodes
through standard APIs and query languages, while the auxiliary structures
are difficult to process.

4. Ideally one wants to be able to mix in new models into a document
description easily (for example, adding revision information structures).
Using the milestone approach, one needs to go back to the 'primary' grammar
and augment all the content models where the milestones might appear (SGML
inclusions, anyone?)
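Points 1 and 3 can be seen together in a minimal Python sketch against the `play`/`line`/`part-start`/`part-end` model from Rick's DTD. The pairing rule his Schematron expresses (starts and ends strictly alternate, and the last part is closed) has to be re-implemented by hand, and so does any attempt to get at a character's speech as a unit. The function name and the choice to join text across lines with a newline are assumptions made for illustration; the sample also closes the narrator's part so the whole document passes.

```python
import xml.etree.ElementTree as ET

def check_and_extract_parts(play):
    """Walk the <line> children of a <play> in document order.

    Enforces that part-start and part-end milestones strictly alternate
    and that the last part is closed; returns (character, text) pairs.
    """
    parts = []
    character = None   # character whose part is currently open, if any
    fragments = []     # text collected for the open part
    for line in play.findall("line"):
        if character is not None:
            if fragments:
                fragments.append("\n")   # the open part continues onto a new line
            if line.text:
                fragments.append(line.text)
        for milestone in line:
            if milestone.tag == "part-start":
                if character is not None:
                    raise ValueError(
                        f"part-start for {milestone.get('character')!r} "
                        f"while {character!r} is still open")
                character = milestone.get("character")
                fragments = []
            elif milestone.tag == "part-end":
                if character is None:
                    raise ValueError("part-end without a matching part-start")
                parts.append((character, "".join(fragments)))
                character = None
            # Text after a milestone (its tail) belongs to whichever part is open now.
            if character is not None and milestone.tail:
                fragments.append(milestone.tail)
    if character is not None:
        raise ValueError(f"part for {character!r} is never closed")
    return parts

play = ET.fromstring(
    '<play><title>Example</title>'
    '<line><part-start character="FlowerPot"/>Weeed<part-end/>'
    '<part-start character="Ben"/>Hello</line>'
    '<line>Ben<part-end/><part-start character="narrator"/>'
    'And there they all were<part-end/></line></play>'
)
for character, text in check_and_extract_parts(play):
    print(character, repr(text))
# FlowerPot 'Weeed'
# Ben 'Hello\nBen'
# narrator 'And there they all were'
```

The `line` structure comes for free from the parser; the `part` structure only exists once this kind of custom walk has rebuilt it.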

http://www.lmnl.org/ is interesting.
 
> From the POV of validating concurrent structures in the XML, ISO DSDL
> is much more advanced than SGML.

Not sure about that. SGML + CONCUR may have had its faults (and was never
much implemented), but in use it at least gave concurrent grammars that were
consistently expressed as markup. Section 31.1 of
http://xml.coverpages.org/teichap31.html has an interesting example...

- Alex.

Received on Wed Jun 4 11:00:54 2003

This archive was generated by hypermail 2.1.8 : Fri Dec 03 2004 - 14:00:27 UTC