[dsdl-discuss] Re: A revised draft for Part 7

From: Keld Jørn Simonsen <keld@rap.rap.dk>
Date: Tue Nov 20 2007 - 08:52:01 UTC

On Tue, Nov 20, 2007 at 01:30:33PM +1100, Rick Jelliffe wrote:
> On Sat, 2007-11-17 at 09:55 +0100, Keld Jørn Simonsen wrote:
>
> > I think we should be general and be able to describe all aspects of an
> > encoding, including the whole of C0, but only on a character level.
> > Conceptually this does not change anything.
>
> Again, I strongly urge that there should be no attempt to handle or
> support in any way control characters (C0 and C1) except for the current
> white-space related characters. Neither as code points in Unicode nor
> with their control semantics.

The problem is that the relevant registries that I know of, namely the
IANA registry and the ISO 15897 registry, both include C0 and C1 in most
of their charset definitions.

And anyway I don't see any harm in allowing such characters in markup,
as long as we don't try to parse them.

What would you do if a control character other than the whitespace
characters shows up in the markup? Treat the whole markup as invalid?
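
To make the question concrete, here is a minimal Python sketch of a check
that reports control characters other than whitespace instead of rejecting
the whole input. The function name, the choice of TAB, LF and CR as the
allowed whitespace, and the report format are all assumptions made for
illustration; nothing here is taken from the Part 7 draft.

```python
# Hypothetical helper: locate C0/C1 control characters in decoded markup.
# Assumption: TAB, LF and CR are the only control characters treated as
# acceptable whitespace.
ALLOWED_WHITESPACE = {0x09, 0x0A, 0x0D}


def find_control_characters(text):
    """Return a list of (offset, code point, range name) for each
    C0 (U+0000..U+001F) or C1 (U+0080..U+009F) character in `text`,
    skipping the allowed whitespace characters."""
    hits = []
    for offset, ch in enumerate(text):
        cp = ord(ch)
        if cp in ALLOWED_WHITESPACE:
            continue
        if cp <= 0x1F:
            hits.append((offset, cp, "C0"))
        elif 0x80 <= cp <= 0x9F:
            hits.append((offset, cp, "C1"))
    return hits


if __name__ == "__main__":
    sample = "<a>\x07x\x85</a>"
    for offset, cp, name in find_control_characters(sample):
        print(f"offset {offset}: U+{cp:04X} ({name})")
```

A caller could then decide per character whether to pass it through
untouched, flag it, or reject the document, which is the policy choice
under discussion.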

For C1, a number of coded character sets use this range, including many
Microsoft charsets, and ISO 10646.
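
As an illustration of how the same byte range is treated differently, the
following Python sketch classifies bytes 0x80..0x9F under two encodings.
It assumes Python's built-in `iso-8859-1` and `cp1252` codecs as stand-ins
for the registry definitions, which may differ in detail; the function
name is made up for this example.

```python
import unicodedata


def classify_c1_bytes(encoding):
    """Classify each byte 0x80..0x9F as 'control', 'graphic' or
    'undefined' according to the given Python codec."""
    result = {}
    for b in range(0x80, 0xA0):
        try:
            ch = bytes([b]).decode(encoding)
        except UnicodeDecodeError:
            # The codec assigns no character to this byte.
            result[b] = "undefined"
            continue
        # General category "Cc" marks C0/C1 control characters.
        is_control = unicodedata.category(ch) == "Cc"
        result[b] = "control" if is_control else "graphic"
    return result


if __name__ == "__main__":
    latin1 = classify_c1_bytes("iso-8859-1")
    cp1252 = classify_c1_bytes("cp1252")
    for b in range(0x80, 0xA0):
        print(f"0x{b:02X}  iso-8859-1: {latin1[b]:9}  cp1252: {cp1252[b]}")
```

Under these codecs every byte in the range decodes to a control character
in ISO 8859-1, while CP1252 assigns graphic characters to most of them and
leaves a few unassigned.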

> Furthermore, allowing control characters reduces spare code-point
> redundancy which is useful for wrong encoding detection. XML 1.1 adopted
> this approach when, even though it allows NCRs from C1 range, it does
> not allow C1 characters directly expressed.

What about CP1252 and similar code pages?

Best regards
keld

Received on Tue Nov 20 20:08:31 2007
