[dsdl-discuss] Re: Fw: DTLL 0.4

From: Jeni Tennison <jeni@jenitennison.com>
Date: Mon Aug 08 2005 - 11:45:33 UTC

Hi Martin,

> 1) The use of the asterisk after preprocess in the definition of
> parse seems to conflict with the definition of preprocess, where you
> specifically state there is only one built-in form of preprocessing,
> for whitespace processing. It would be better expressed as
> preprocess?

You're right, there should also be:

preprocess |= extension-preprocessing-attribute
extension-preprocessing-attribute = extension-attribute

and some text explaining that extension preprocessing is allowed, and
that such attributes may be ignored if they're not recognised.

> 2) The use of ? and [ is ambiguous in the example regex for dates.
> The -? entry in (?[year]-?[0-9]{4}) has not been noted as indicating
> optionality. It would be better if the first ? were replaced by an
> alternative character not used elsewhere in regular expressions,
> such as !, # or @. I would also suggest removal of the brackets,
> using something of the form (@year@-?[0-9]{4}) or (#year#-?[0-9]{4})
> instead and relying on a \@ type construct to identify the selected
> character if it is needed at the start of the matched string.

Using a ? after an opening bracket is the standard way of indicating
an extension to normal regular expression syntax. There's no ambiguity,
since a ? appearing after an unescaped opening bracket *cannot*
indicate optionality: there is nothing before it to make optional.
See, for example, http://docs.python.org/lib/re-syntax.html, which states:

 (?...)
    This is an extension notation (a "?" following a "(" is not
    meaningful otherwise). The first character after the "?"
    determines what the meaning and further syntax of the construct
    is. Extensions usually do not create a new group; (?P<name>...) is
    the only exception to this rule. Following are the currently
    supported extensions.

and http://perlcity.com/perl-regular-expressions.html which says:

 Perl also defines a consistent extension syntax for features not
 found in standard tools like awk and lex. The syntax is a pair of
 parentheses with a question mark as the first thing within the
 parentheses. The character after the question mark indicates the
 extension.

I think it would be an extremely bad idea to introduce something that
didn't use this basic extension mechanism.
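
To illustrate the point with Python's re module (just a sketch; the
is_valid helper is something I've made up for the example, and DTLL's
own regex dialect need not behave identically):

```python
import re

# (?:...) is an extension: it groups without creating a capture group.
non_capturing = re.compile(r"(?:ab)+")
print(non_capturing.groups)  # 0

# A "?" after an ordinary atom still means "optional", so "-?" here
# matches an optional leading minus sign.
year = re.compile(r"(-?[0-9]{4})")
print(year.fullmatch("-0044").group(1))  # -0044
print(year.fullmatch("2005").group(1))   # 2005


def is_valid(pattern):
    """Hypothetical helper: report whether Python's re accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# A "?" straight after "(" has nothing to make optional, so it can only
# introduce an extension; one that isn't recognised is rejected.
print(is_valid(r"(?)"))      # False
print(is_valid(r"(?:ab)+"))  # True
```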
 
The normal syntax for naming subexpressions in Python and PHP is:

  (?P<name>pattern)

And I've certainly seen:

  (?<name>pattern)

used elsewhere.
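
For comparison, here is roughly how the Python form reads for a date.
This is only a sketch: the month and day groups are my assumption for
the sake of the example, not the regex from the draft.

```python
import re

# Named subexpressions in Python's (?P<name>pattern) form.
iso_date = re.compile(
    r"(?P<year>-?[0-9]{4})-(?P<month>[0-9]{2})-(?P<day>[0-9]{2})"
)

match = iso_date.fullmatch("2005-08-08")
parts = match.groupdict()
print(parts)  # {'year': '2005', 'month': '08', 'day': '08'}
```

Each named group is retrieved by name from the match; the names do not
become variables in their own right.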

I didn't want to use <>s within the regular expression syntax in DTLL
because these characters are significant in XML, and would therefore
have to be escaped.
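
A quick way to see the difference, using the escape function from
Python's standard library (the two pattern strings are just examples):

```python
from xml.sax.saxutils import escape

python_style = "(?P<year>-?[0-9]{4})"
dtll_style = "(?[year]-?[0-9]{4})"

# The angle brackets have to be escaped before the pattern can sit in
# XML content; the square-bracket form passes through unchanged.
print(escape(python_style))  # (?P&lt;year&gt;-?[0-9]{4})
print(escape(dtll_style))    # (?[year]-?[0-9]{4})
```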

An alternative that I've also seen is:

  (?$name:pattern)

but this might mislead people into thinking that the named subpatterns
are actually assigned directly to variables, which they aren't, and
the : might be confusing given that XML names use them for namespaces.

Any other suggestions?

> 3) The acronyms EBNF and PEGS used in the Extension Parsing Elements
> need definition.

OK, I'll add definitions for both.

> 4) When describing properties it is not clear why you use
> $colour/red, $colour/green, etc rather than $colour.red, etc. In the
> preceding paragraph you mention $this.name, but then give an example
> that does not use this construct, without a) explaining why or b)
> explaining the notation used in the example.

I take your point; I'll add something that shows the properties in
use.

By the way, the reason you use $colour/red etc. is that the <parse>
element assigns a value to the $colour variable. That variable holds a
tree that looks like:

  (root)
    +- red
    |   +- ...
    +- blue
    |   +- ...
    +- green
        +- ...

Doing $colour/red gets the <red> element child of the root node held
by the $colour variable.
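
As a rough analogy in Python's ElementTree (a sketch only: the <root>
wrapper element and the channel values are assumptions I've made up,
and ElementTree's path support is just a small subset of XPath):

```python
import xml.etree.ElementTree as ET

# Stand-in for the tree held by $colour; the contents are hypothetical.
colour = ET.fromstring(
    "<root><red>255</red><blue>0</blue><green>128</green></root>"
)

# The counterpart of $colour/red: select the <red> child of the root.
red = colour.find("red")
print(red.tag, red.text)  # red 255

# All three children, in document order.
print([child.tag for child in colour])  # ['red', 'blue', 'green']
```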

> 5) In the last sentence of the Type Specifiers section I would
> suggest that "this value may be" should read "this value should be"
> or "this value must be". (Under what conditions would the
> optionality of "may" be required?)

Would changing it to "this value can be" help? It neither should nor
must be a standard XPath type: it can be a standard XPath type or a
value of one of the datatypes defined by the library.

> 6) The example for ISODate is incorrect in the regex as it shows the
> use of / as separators, whereas ISO dates should use hyphens, as is
> indicated in the map.

Yep, thank you.

> 7) The exact role of the as attribute in maps could do with a
> clearer explanation. (Is it a compulsory instruction to "treat x as
> if it were y"?)

I'll see what I can do. If you have:

<map from="A" to="B" as="C" />

it means that in order to convert a value from datatype A to datatype
B you convert it via datatype C: in other words, you convert the value
from datatype A to datatype C, and then from datatype C to datatype B.
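
In code terms it's just function composition. A minimal sketch, where
the three datatypes and the converter names are all hypothetical
(A is an ISO date string, C a calendar date, B a day count):

```python
from datetime import date


def a_to_c(text: str) -> date:
    """Hypothetical A -> C conversion: ISO date string to a date."""
    return date.fromisoformat(text)


def c_to_b(day: date) -> int:
    """Hypothetical C -> B conversion: date to its proleptic ordinal."""
    return day.toordinal()


def convert_via(first, second):
    """Build the A -> B conversion that goes through the 'as' datatype."""
    def convert(value):
        return second(first(value))
    return convert


# <map from="A" to="B" as="C" />
a_to_b = convert_via(a_to_c, c_to_b)
print(a_to_b("0001-01-01"))  # 1
```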

> Apart from these minor quibbles this text looks as if it has reached
> a stage where we should be submitting this to ISO for a CD ballot
> among member countries as Part 5 of DSDL. Would you have any
> objections to my proposing this at the next ISO meeting, scheduled
> for XML 2005?

I don't know what CD actually means in terms of the standardisation
process.

I've made the modifications in-place since none of them are major.

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/
Received on Mon Aug 8 13:45:50 2005
