[dsdl-discuss] Re: Response to rest of Martin's comments

From: Rick Jelliffe <ricko@allette.com.au>
Date: Fri Apr 16 2004 - 05:14:48 UTC

Thanks for the responses!

I guess one approach would be to declare 3 bindings:
 * minimal codes+collections (Martin's)
 * W3C character classes (Rick's)
 * POSIX character classes (Keld's)
and let the market decide what to support.

Discussion of which binding, if any, should be used should probably
wait until the WG decides whether it wants to adopt Part 7 using the
Part 3 framework.

>ISO 14651 has syntax useful for regular expressions, and ISO TR 14652
> has syntax for character properties.

I could not see any syntax in ISO 14651[1] that looks like familiar
POSIX/Perl/XSD regular expressions. And the last thing we need, for
acceptability and credibility, is another syntax.

Similarly, TR 14652 operates in terms of specifying individual characters
one at a time (unfeasible for users) and I could not see anything about
character properties in the PDTR. In fact, SC 34 should avoid using any
spec that uses its own names for characters: we have numeric character
references and entity references for this issue and we have no
ASCII-only limitation.

I wouldn't have thought that there is any requirement to limit character
repertoires by collation-order per se. We don't need to provide anything
that RELAX NG/WXS datatyping would provide. The user is a programmer
involved in industrial publishing who needs to test schemas quickly to
meet a deadline, and the schema language must be faster to use than
writing a small utility using the APIs at hand (SAX, XML Schemas, etc.).

Cheers
Rick

[1] http://anubis.dkuug.dk/jtc1/sc22/open/n2933.pdf
[2] http://anubis.dkuug.dk/jtc1/sc22/WG20/docs/n690.pdf

Keld Jørn Simonsen wrote:

>On Tue, Apr 13, 2004 at 08:03:15PM +1000, Rick Jelliffe wrote:
>
>
>>Martin Bryan wrote:
>>
>>
>>>My initial (unofficial) comments on Rick's draft are:
>>>
>>>
>>>Reference should be added to ISO 10646 as default source of character set
>>>naming conventions
>>>
>>>
>>ISO 10646 names blocks and characters, but it does not name properties
>>of characters.[1] These come from the Unicode Consortium, which provides
>>the semantic layer on top of ISO 10646.
>>
>>I believe that ISO 10646 uses "collections" for its unit. I think
>>ISO 10646 is only formally relevant; what we are interested in
>>is Unicode's properties in general.
>>
>>
>
>I think we should refer to ISO standards where possible.
>
>ISO also has standards for character properties and sorting, e.g.
>ISO 14651 and ISO TR 14652. These are built on ISO 10646.
>
>
>
>>If ISO 10646 is not really appropriate, what about something
>>from the Unicode Consortium? Well, their spec on Regular Expressions
>>and character classes[3] does have a syntax, and indeed that syntax
>>is based on Perl's, as is XML Schemas', but it is only a demonstrative
>>syntax, not one intended to have normative properties. So Unicode
>>is also not directly useful as a normative reference for
>>defining the language used for assertion tests.
>>
>>
>
>ISO 14651 has syntax useful for regular expressions, and ISO TR 14652
>has syntax for character properties.
>
>
>
>>In the W3C WG on XML Schemas, we took that REGEX document into
>>consideration when deciding on the design of the W3C regular
>>expressions. (Please note that the version of the Unicode REGEX
>>would be one from 1999, not the most recent version!)
>>
>>
>
>Yes, in SC34 it would probably be more useful to have something that is
>built on an SGML syntax. The 14651/14652 syntax is built on POSIX work.
>SC22/WG20 is considering making a syntax built on SGML/XML.
>
>
>
>>If we are putting authority in, surely we should make them URLs.
>>Does ISO have any mechanism for persistent URLs? (In fact, I was
>>trying to avoid the complexity of long names and the URL/PURL/Public
>>identifier controversy by keeping the names as short as possible.)
>>
>>
>
>We can make our own URLs in SC34. I have some disappointment about the
>stability of ISO URLs, as they change from time to time without
>apparent cause and without notice. One specific example was the
>freely available standard TR 15285 character glyph model, which was
>developed by SC2 and SC18. The reference I had from the SC2/WG2
>pages no longer worked because they renamed the file, so I have now
>copied that file and made it available from the SC2/WG2 server directly.
>
>
>
>
>>I don't see it as a big issue, but part of the benefit of using
>>xslt-charrep for the default is that it clearly suggests what
>>is going on. Assuming that no users would buy the spec, looking
>>at a raw schema and seeing things that look like XPaths and the
>>string "xslt" should give anyone with half a clue a pretty good idea.
>>
>>
>
>There is another possibility for having permanent URLs for charreps,
>namely the ISO 15897 cultural register, which already contains character
>description files and has some guarantees of stable URLs.
>The registry is available at http://www.dkuug.dk/cultreg - and I am the
>editor of 15897 and also head of the registration authority.
>15897 is prepared to host SGML-conforming character description files.
>
>
>
>>>3) How user-defined character sets can be defined and named.
>>>
>>>
>
>There are naming guidelines for coded character set entities, and also
>character repertoires in ISO 15897. We also had an API standard
>under development in SC22/WG20 (15435) which was cancelled, due to lack
>of interest and progress. This had APIs for accessing charmaps and
>using user-defined charmaps and ISO registered charmaps and other
>things.
>
>Best regards
>keld
>
>

Received on Fri Apr 16 07:15:23 2004
