[dsdl-discuss] Re: DSDL - Datatypes and Character Sets

From: MURATA Makoto <murata@hokkaido.email.ne.jp>
Date: Sat Oct 02 2004 - 14:01:19 UTC

Peter,

I am now the project editor for Part 7. Thanks for your input.

Martin wrote:
> I have a problem with your second item:
>
> 2. The syntax shall enable character repertoires to be restricted
> by:
>
> a. Minimum number of characters
>
> b. Maximum number of characters
>
> c. Characters not permitted.
>
> d. Character combinations not permitted
>
> e. Character combinations required
>
> f. Characters not permitted if other nominated character sequences
> are present/not present.
>
> g. Characters required if other nominated character sequences are
> present/not present
>
>
>
> a and b are already part of datatype specifications, and are not really
> relevant within character repertoire descriptions
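Martin reads a and b as length constraints on a value. On that reading they correspond to the minLength/maxLength facets of XML Schema datatypes. Here is a minimal Python sketch of such a check, assuming that reading. The function name is hypothetical, and length is counted in code points, which is how XML Schema measures string length.

```python
from typing import Optional

def satisfies_length_facets(value: str,
                            min_length: Optional[int] = None,
                            max_length: Optional[int] = None) -> bool:
    """Check value against minLength/maxLength-style bounds.

    len() on a Python str counts code points, which matches how
    XML Schema measures the length of xs:string values.
    """
    n = len(value)
    if min_length is not None and n < min_length:
        return False
    if max_length is not None and n > max_length:
        return False
    return True
```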

I do not understand what Peter meant by a and b. Is he trying to restrict
the number of characters in documents?

> d and e should be, but currently are not, part of a datatype definition - I
> can't see how we can stop people using any combinations of declared
> characters unless you put them in exclusions as strings <exclude>"ab"
> "cd"</exclude> in Jeni's proposal
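The exclusion approach Martin mentions can be sketched as follows. A value conforms if every character is a declared member of the repertoire and none of the excluded strings occurs in it. This is a hypothetical Python illustration, not the syntax or semantics of Jeni's proposal. `conforms`, `allowed`, and `excluded` are made-up names.

```python
from typing import AbstractSet, Iterable

def conforms(text: str, allowed: AbstractSet[str], excluded: Iterable[str]) -> bool:
    # Every character must be a declared member of the repertoire.
    if any(ch not in allowed for ch in text):
        return False
    # No excluded combination (e.g. "ab", "cd") may occur anywhere in the text.
    return not any(seq in text for seq in excluded)
```

With `allowed = set("abcd")` and `excluded = ["ab", "cd"]`, "acbd" conforms. "abc" does not, even though every one of its characters is declared.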

First, let's clarify the requirements. To keep it simple, I assume that
a base character is never followed by more than one combining character.
Under this restriction, I can think of the following scenarios.

1) We allow a base character X but do not allow it to be followed by a combining
   character Y.

2) We allow a base character X always.

3) We disallow a base character X unless it is followed by a combining
   character Y.

4) We disallow a base character X always.

5) We allow a combining character Y only when it follows a base character X.

6) We allow a combining character Y always.

My plan was to cover 2), 4), and 6) only. I think that this is good enough
for the first version of Part 7.
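Scenarios 2), 4), and 6) only need membership tests on single code points, so a repertoire can remain a plain set of code values. Below is a minimal sketch that assumes the repertoire is given as a set of integers. `in_repertoire` is a hypothetical name. U+0065 and U+0301 COMBINING ACUTE ACCENT stand in for X and Y.

```python
from typing import AbstractSet

def in_repertoire(text: str, repertoire: AbstractSet[int]) -> bool:
    """True if every code point in text belongs to the repertoire.

    Base and combining characters are tested independently, so this
    covers "allow X always" (2), "disallow X always" (4) and
    "allow Y always" (6), but it cannot tie Y to a preceding X.
    """
    return all(ord(ch) in repertoire for ch in text)

X, Y = 0x0065, 0x0301        # LATIN SMALL LETTER E, COMBINING ACUTE ACCENT
repertoire = {X, Y, 0x0061}  # hypothetical repertoire: e, combining acute, a

print(in_repertoire("e\u0301", repertoire))  # True
print(in_repertoire("a\u0301", repertoire))  # True: Y is accepted after any base
print(in_repertoire("b", repertoire))        # False: b is not in the repertoire
```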

If we try to cover 1), 3), and 5) as well, our character repertoire definition
is no longer merely a set of Unicode code values but a set of Unicode
code SEQUENCES, which makes implementation considerably harder. Moreover, we
would have to consider sequences of more than one combining character. Thus,
I am inclined to postpone 1), 3), and 5) until the second version of Part 7.
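This sketch shows why 1), 3), and 5) change the model. It keeps the same one-combining-character assumption, splits the text into (base, combining character) pairs, and turns the repertoire into a set of such pairs. The names are hypothetical. Treating every character of Unicode General Category M (Mn, Mc, Me) as a combining character is a simplification.

```python
import unicodedata
from typing import AbstractSet, List, Optional, Tuple

Pair = Tuple[str, Optional[str]]  # (base character, combining character or None)

def split_pairs(text: str) -> List[Pair]:
    """Split text into (base, mark) pairs, allowing at most one mark per base."""
    pairs: List[Pair] = []
    for ch in text:
        if unicodedata.category(ch).startswith("M"):  # Mn, Mc, Me
            if not pairs or pairs[-1][1] is not None:
                raise ValueError(f"unexpected combining character U+{ord(ch):04X}")
            pairs[-1] = (pairs[-1][0], ch)
        else:
            pairs.append((ch, None))
    return pairs

def sequences_conform(text: str, allowed: AbstractSet[Pair]) -> bool:
    """True if every (base, mark) pair in text is an allowed sequence."""
    return all(pair in allowed for pair in split_pairs(text))

# Hypothetical repertoire of sequences, with X = "e" and Y = U+0301:
allowed = {
    ("e", None),      # X alone is allowed
    ("e", "\u0301"),  # X followed by Y is allowed
    ("a", None),      # a only without Y, so Y occurs only after X (scenario 5)
}
```

Scenario 1) corresponds to keeping ("e", None) but dropping ("e", "\u0301"). Scenario 3) is the reverse. Lifting the one-mark assumption would require tuples of arbitrary length, which is part of what makes this harder to implement.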

> f and g are again part of a regular expression rather than a repertoire
> description, I think. I suspect we need to be able to use Schematron to
> test such rules, so it comes down to how Schematron can be used to test
> character repertoire rules.

I also think f and g are outside the scope of Part 7.
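Martin argues that f and g are patterns over the whole value rather than properties of a repertoire. One variant of f illustrates this: "character x is not permitted if the sequence ab is present." The Python sketch below encodes that rule as a single regular expression. The function name and parameters are hypothetical. A Schematron assertion such as not(contains(., 'ab') and contains(., 'x')) would express the same test in XPath.

```python
import re

def conditional_exclusion_ok(text: str, char: str, trigger: str) -> bool:
    """Rule f as a pattern: char is not permitted if trigger occurs in text.

    The value matches if it contains no trigger at all (first alternative)
    or contains no char at all (second alternative). (?s) lets . match
    newlines so the lookahead scans the whole value.
    """
    pattern = rf"(?s)(?!.*{re.escape(trigger)}).*|[^{re.escape(char)}]*"
    return re.fullmatch(pattern, text) is not None
```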

Cheers,

-- 
MURATA Makoto <murata@hokkaido.email.ne.jp>
Received on Sat Oct 2 16:01:49 2004
