On Tue, Nov 20, 2007 at 01:30:33PM +1100, Rick Jelliffe wrote:
> On Sat, 2007-11-17 at 09:55 +0100, Keld Jørn Simonsen wrote:
>
> > I think we should be general and be able to describe all aspects of an
> > encoding, including the whole of C0, but only on a character level.
> > Conceptually this does not change anything.
>
> Again, I strongly urge that there should be no attempt to handle or
> support in any way control characters (C0 and C1) except for the current
> white-space related characters. Neither as code points in Unicode nor
> with their control semantics.
The problem is that the relevant registries that I know of, namely the
IANA registry and the ISO 15897 registry, both include C0 and C1 in most
of their charset definitions.
In any case, I don't see any harm in allowing such characters in markup,
as long as we don't try to parse them.
Besides, what would you do if a control character other than the
whitespace characters shows up in the markup? Treat the whole markup as invalid?
As for C1, a number of coded character sets use this range, including many
Microsoft charsets and ISO 10646.
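To illustrate how differently charsets treat the 0x80-0x9F range, here is a minimal Python sketch (the function name c1_decodings is hypothetical, and it relies on Python's built-in "latin-1" and "cp1252" codecs as stand-ins for ISO 8859-1 and the Microsoft charset). Under ISO 8859-1 each of these bytes decodes to a C1 control code point, while CP1252 maps most of them to graphic characters and leaves a few undefined:

```python
def c1_decodings(byte_values):
    """Map each byte to (latin-1 decoding, cp1252 decoding or None if undefined)."""
    result = {}
    for b in byte_values:
        raw = bytes([b])
        # latin-1 defines every byte: 0x80-0x9F become the C1 controls U+0080-U+009F.
        latin1 = raw.decode("latin-1")
        try:
            # cp1252 reuses most of this range for graphic characters (e.g. 0x80 -> U+20AC).
            cp1252 = raw.decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined.
            cp1252 = None
        result[b] = (latin1, cp1252)
    return result


table = c1_decodings(range(0x80, 0xA0))
for b, (l1, w) in sorted(table.items())[:4]:
    print(hex(b), repr(l1), repr(w))
```

So the same byte is a control character in one charset and, say, a euro sign in another, which is why a registry describing CP1252 cannot simply ignore that range.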
> Furthermore, allowing control characters reduces spare code-point
> redundancy, which is useful for detecting wrong encodings. XML 1.1 adopted
> this approach: even though it allows NCRs from the C1 range, it does
> not allow C1 characters to be expressed directly.
What about CP1252 etc., which use the 0x80-0x9F range for graphic characters?
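The XML 1.1 rule quoted above can be sketched as a check against the spec's RestrictedChar production ([#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]), whose members may only appear as character references. Note that U+0085 (NEL) is excluded, because XML 1.1 treats it as a line end and allows it literally. The function names below are hypothetical, and the sketch only inspects already-decoded text, not NCR syntax:

```python
def xml11_restricted(cp):
    """True if cp is an XML 1.1 RestrictedChar (allowed only as a character reference)."""
    return (0x1 <= cp <= 0x8 or 0xB <= cp <= 0xC or 0xE <= cp <= 0x1F
            or 0x7F <= cp <= 0x84 or 0x86 <= cp <= 0x9F)


def literal_restricted_chars(text):
    """Return code points that appear literally in text but would need NCRs in XML 1.1."""
    return [ord(ch) for ch in text if xml11_restricted(ord(ch))]


data = b"price \x80 5"
# Decoding CP1252 data as ISO 8859-1 yields a literal C1 control, which this check flags.
print(literal_restricted_chars(data.decode("latin-1")))
# Decoded correctly as cp1252, the byte becomes a euro sign and nothing is flagged.
print(literal_restricted_chars(data.decode("cp1252")))
```

This is the kind of wrong-encoding detection the redundancy argument relies on. It only works if C1 code points are not otherwise permitted literally in the text.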
Best regards
keld
--
DSDL members discussion list
Received on Tue Nov 20 20:08:31 2007