Hi Martin,
> 1) The use of the asterisk after preprocess in the definition of
> parse seems to conflict with the definition of preprocess, where you
> specifically state there is only one built-in form of preprocessing,
> for whitespace processing. It would be better expressed as
> preprocess?
You're right, there should also be:
    preprocess |= extension-preprocessing-attribute
    extension-preprocessing-attribute = extension-attribute
together with some text explaining that extension preprocessing is
allowed and that such attributes may be ignored if they're not
recognised.
> 2) The use of ? and [ is ambiguous in the example regex for dates.
> The -? entry in (?[year]-?[0-9]{4}) has not been noted as indicating
> optionality. It would be better if the first ? were replaced by an
> alternative character not used elsewhere in regular expressions,
> such as !, # or @. I would also suggest removal of the brackets,
> using something of the form (@year@-?[0-9]{4}) or (#year#-?[0-9]{4})
> instead and relying on a \@ type construct to identify the selected
> character if it is needed at the start of the matched string.
Using a ? after an opening bracket is the standard way of indicating
an extension to normal regular expression syntax (there's no ambiguity
since a ? appearing after an unescaped opening bracket *cannot*
indicate optionality). See, for example,
http://docs.python.org/lib/re-syntax.html, which states:
    (?...)
        This is an extension notation (a "?" following a "(" is not
        meaningful otherwise). The first character after the "?"
        determines what the meaning and further syntax of the construct
        is. Extensions usually do not create a new group; (?P<name>...) is
        the only exception to this rule. Following are the currently
        supported extensions.
and http://perlcity.com/perl-regular-expressions.html which says:
    Perl also defines a consistent extension syntax for features not
    found in standard tools like awk and lex. The syntax is a pair of
    parentheses with a question mark as the first thing within the
    parentheses. The character after the question mark indicates the
    extension.
I think it would be an extremely bad idea to introduce something that
didn't use this basic extension mechanism.
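
To illustrate the point about "(?" never meaning optionality, here is a minimal sketch using Python's re module. The helper name is_rejected is made up for this example. Python does not know the draft's (?[name]...) form, so it reports an unknown extension rather than treating the ? as a quantifier, while a recognised extension such as (?:...) compiles normally.

```python
import re


def is_rejected(pattern: str) -> bool:
    """Return True if Python's re module refuses to compile the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return True
    return False


# "(?" followed by "[" is not an extension Python recognises, so the
# pattern is rejected instead of the "?" being read as "optional".
draft_form_rejected = is_rejected(r"(?[year]-?[0-9]{4})")

# "(?:" is a recognised extension (a non-capturing group), so this compiles.
non_capturing_rejected = is_rejected(r"(?:-?[0-9]{4})")

print(draft_form_rejected, non_capturing_rejected)
```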
The normal syntax for naming subexpressions in Python and PHP is:
(?P<name>pattern)
And I've certainly seen:
(?<name>pattern)
used elsewhere.
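
For comparison, a short sketch of the Python spelling in use. Only the year subexpression comes from the example under discussion; the month and day groups, the hyphen separators and the parse_date helper are assumptions added to make the example complete.

```python
import re

# Python's named-group syntax: (?P<name>pattern).
# The "year" group mirrors the draft's example; "month" and "day" are
# assumed here purely for illustration.
DATE = re.compile(
    r"(?P<year>-?[0-9]{4})-(?P<month>[0-9]{2})-(?P<day>[0-9]{2})"
)


def parse_date(text):
    """Return the named subexpressions as a dict, or None if no match."""
    match = DATE.fullmatch(text)
    return match.groupdict() if match else None


parts = parse_date("2005-08-08")
print(parts)
```

The named subexpressions come back as entries in a dictionary keyed by group name; nothing is assigned to variables directly.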
I didn't want to use <>s within the regular expression syntax in DTLL
because these characters are significant in XML, and would therefore
have to be escaped.
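
A small sketch of the escaping problem, using the standard xml.sax.saxutils.escape function from Python. The two pattern strings are taken from this thread; the variable names are illustrative only.

```python
from xml.sax.saxutils import escape

# The Python/PHP style uses < and >, which must be escaped in XML content.
angle_form = "(?P<year>-?[0-9]{4})"
# The square-bracket style contains no XML-significant characters.
bracket_form = "(?[year]-?[0-9]{4})"

escaped_angle = escape(angle_form)
escaped_bracket = escape(bracket_form)

print(escaped_angle)
print(escaped_bracket)
```

The angle-bracket form becomes noticeably harder to read once escaped, whereas the square-bracket form passes through unchanged.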
An alternative that I've also seen is:
(?$name:pattern)
but this might mislead people into thinking that the named subpatterns
are actually assigned directly to variables, which they aren't, and
the : might be confusing given that XML names use them for namespaces.
Any other suggestions?
> 3) The acronyms EBNF and PEGS used in the Extension Parsing Elements
> need definition.
OK, I'll add definitions for both.
> 4) When describing properties it is not clear why you use
> $colour/red, $colour/green, etc rather than $colour.red, etc. In the
> preceding paragraph you mention $this.name, but then give an example
> that does not use this construct, without a) explaining why or b)
> explaining the notation used in the example.
I take your point; I'll add something that shows the properties in
use.
By the way, the reason you use $colour/red etc. is that the <parse>
element assigns a value to the $colour variable. That variable holds a
tree that looks like:
    (root)
      +- red
      |    +- ...
      +- blue
      |    +- ...
      +- green
           +- ...
Doing $colour/red gets the <red> element child of the root node held
by the $colour variable.
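
As a rough analogy only, the same kind of child lookup can be sketched in Python with xml.etree.ElementTree. The wrapper element name and the text values inside each child are made up for illustration; the draft does not say what the children contain, and its (root) is a root node rather than a named element.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the tree held by $colour. The values 255, 0
# and 128 are invented; only the child names come from the discussion.
colour = ET.fromstring(
    "<root><red>255</red><blue>0</blue><green>128</green></root>"
)

# Roughly what $colour/red does: select the <red> child of the root.
red = colour.find("red")
print(red.tag, red.text)
```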
> 5) In the last sentence of the Type Specifiers section I would
> suggest that "this value may be" should read "this value should be"
> or "this value must be". (Under what conditions would the
> optionality of "may" be required?)
Would changing it to "this value can be" help? It neither should nor
must be a standard XPath type: it can be a standard XPath type or a
value of one of the datatypes defined by the library.
> 6) The example for ISODate is incorrect in the regex as it shows the
> use of / as separators, whereas ISO dates should use hyphens, as is
> indicated in the map.
Yep, thank you.
> 7) The exact role of the as attribute in maps could do with a
> clearer explanation. (Is it a compulsory instruction to "treat x as
> if it were y"?)
I'll see what I can do. If you have:
<map from="A" to="B" as="C" />
it means that in order to convert a value from datatype A to datatype
B you convert it via datatype C: in other words, you convert the value
from datatype A to datatype C, and then from datatype C to datatype B.
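
The two-step conversion can be sketched as follows. This is not how DTLL itself is implemented; the convert_via function, the converter table and the toy conversions are all hypothetical and exist only to show the "A to C, then C to B" order.

```python
from typing import Callable, Dict, Tuple

Converter = Callable[[str], str]


def convert_via(
    value: str,
    source: str,
    target: str,
    via: str,
    converters: Dict[Tuple[str, str], Converter],
) -> str:
    """Convert value from source to target by way of the via datatype."""
    to_via = converters[(source, via)]      # A -> C
    to_target = converters[(via, target)]   # C -> B
    return to_target(to_via(value))


# Toy converters standing in for real datatype conversions.
converters = {
    ("A", "C"): lambda v: v.strip(),
    ("C", "B"): lambda v: v.upper(),
}

# Equivalent in spirit to <map from="A" to="B" as="C" />.
result = convert_via("  red ", "A", "B", "C", converters)
print(result)
```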
> Apart from these minor quibbles this text looks as if it has reached
> a stage where we should be submitting this to ISO for a CD ballot
> among member countries as Part 5 of DSDL. Would you have any
> objections to my proposing this at the next ISO meeting, scheduled
> for XML 2005?
I don't know what CD actually means in terms of the standardisation
process.
I've made the modifications in-place since none of them are major.
Cheers,
Jeni
---
Jeni Tennison
http://www.jenitennison.com/

Received on Mon Aug 8 13:45:50 2005