On Tue, Nov 20, 2007 at 05:15:34PM -0500, G. Ken Holman wrote:
> At 2007-11-20 09:52 +0100, Keld Jørn Simonsen wrote:
> >The problem is that the relevant registries that I know of, namely the
> >IANA registry and the ISO 15897 registry, both have C0 and C1 in most of
> >their definitions of charsets.
> >
> >And anyway I don't see any harm in allowing such characters in markup,
> >as long as we don't try to parse them.
>
> Keld, XML 1.0 doesn't even allow them to be
> present either as C0 characters or escaped as numeric character references.
My understanding is that we are not discussing XML.
XML is always Unicode, so we are talking about some support beyond XML
and Unicode.
Depending on the purpose of this support, we can allow
them or not.
> XML 1.1 doesn't allow them to be present but does
> allow all but NUL to be escaped as numeric character references.
>
> From day 1 of DSDL we've stated that DSDL must conform to XML.
OK, so no charsets other than Unicode (in various forms, or what? Only
UTF-16?)
>
> >Anyway, what would you do if a control character beyond the whitespaces
> >shows up in the markup? Treat the whole markup as invalid?
>
> XML 1.0 section 2.2 enumerates the allowed characters.
>
> XML 1.1 section 2.2 also does so, indicating which ones must be escaped.
So if we would like to follow this, we could then explicitly state that
this is also valid for the other charsets.
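As an illustration only, the section 2.2 rules referred to above could be
sketched in Python roughly as follows. The ranges are taken from the Char
production of XML 1.0 and the Char and RestrictedChar productions of XML 1.1;
the function names are hypothetical and exist only for this sketch.

```python
# Sketch of the character classes in section 2.2 of XML 1.0 and XML 1.1.
# All function names are hypothetical, chosen for this illustration only.

def is_xml10_char(cp):
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def is_xml11_char(cp):
    """True if code point cp matches the XML 1.1 Char production (NUL excluded)."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def is_xml11_restricted(cp):
    """True if cp is an XML 1.1 RestrictedChar (usable only as a character reference)."""
    return (
        0x1 <= cp <= 0x8
        or 0xB <= cp <= 0xC
        or 0xE <= cp <= 0x1F
        or 0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F
    )

def first_disallowed(text, version="1.0"):
    """Return (index, code point) of the first character that may not appear
    literally under the given XML version, or None if every character is fine."""
    for i, ch in enumerate(text):
        cp = ord(ch)
        if version == "1.0":
            ok = is_xml10_char(cp)
        else:
            ok = is_xml11_char(cp) and not is_xml11_restricted(cp)
        if not ok:
            return i, cp
    return None
```

Under these productions a C0 control such as U+0001 is rejected by the XML 1.0
check, while a C1 code point such as U+0080 passes the XML 1.0 check but is
restricted to character references in XML 1.1. Stating that the same rule
applies to other charsets would amount to running a check like
first_disallowed() on the text after it has been mapped to code points.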
> >For C1, a number of coded character sets use this range, including many
> >Microsoft charsets, and ISO 10646.
> >
> >> Furthermore, allowing control characters reduces spare code-point
> >> redundancy which is useful for wrong encoding detection. XML 1.1 adopted
> >> this approach when, even though it allows NCRs from C1 range, it does
> >> not allow C1 characters directly expressed.
> >
> >What about CP1252 etc?
>
> Irrelevant.
>
> XML is based solely on Unicode. If an XML
> processor deigns to support the encoding=
> declaration in the XML declaration it is still
> obligated to deliver Unicode characters to its invoking application.
Why are we then even talking about charsets other than Unicode?
Anyway, the C1 code space (U+0080-U+009F) is also well defined for ISO 10646
via a normative reference to ISO 6429.
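To make the CP1252 point concrete, here is a minimal Python sketch, assuming
Python's built-in "cp1252" and "iso-8859-1" codecs as stand-ins for what a
processor honouring the encoding declaration would do. The helper name and the
sample bytes are hypothetical. It decodes the same bytes both ways and reports
which resulting code points land in the C1 range.

```python
# Sketch: the same bytes decoded under two different encoding declarations.
# c1_after_decoding() and the sample bytes are hypothetical illustrations.

C1_RANGE = range(0x80, 0xA0)  # U+0080 through U+009F

def c1_after_decoding(data, encoding):
    """Decode data with the given codec and return the code points
    of the decoded text that fall in the C1 range."""
    return [ord(ch) for ch in data.decode(encoding) if ord(ch) in C1_RANGE]

# Bytes 0x80, 0x93, 0x94 and 0x85 all sit in the 0x80-0x9F byte range.
sample = b"\x80 \x93quoted\x94 \x85"

cp1252_hits = c1_after_decoding(sample, "cp1252")      # graphic characters, no C1
latin1_hits = c1_after_decoding(sample, "iso-8859-1")  # bytes map straight to C1
```

With the CP1252 codec these bytes become graphic characters such as U+20AC, so
no C1 code points reach the application. With ISO 8859-1 the same bytes map
directly to U+0080-U+009F, which is where the C1 question actually arises.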
Best regards
keld
Received on Sat Nov 24 11:16:10 2007