[dsdl-discuss] Re: A revised draft for Part 7

From: Rick Jelliffe <rjelliffe@allette.com.au>
Date: Tue Nov 20 2007 - 21:44:26 UTC

On Tue, 2007-11-20 at 09:52 +0100, Keld Jørn Simonsen wrote:

> And anyway I don't see any harm in allowing such characters in markup,
> as long as we don't try to parse them.

A character in a wrong encoding is data corruption. There is no greater
harm for documents.

> Anyway, what would you do if a control character beyond the whitespaces
> shows up in the markup? Treat the whole markup as invalid?

Yes. That is what XML does, and what the W3C I18n WG, the Unicode
Consortium and the W3C XML WG do. It is based on an approach agreed on
by members of SC34 WG1 (me, James, etc).
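
To make the rule concrete, here is a minimal sketch of the check, assuming
the XML 1.0 Char production (tab, LF, CR, then U+0020 to U+D7FF, U+E000 to
U+FFFD and U+10000 to U+10FFFF). The function names are hypothetical, for
illustration only, and are not taken from any particular parser.

```python
from typing import Optional


def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)          # the only C0 controls allowed
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def first_illegal_char(text: str) -> Optional[int]:
    """Return the index of the first character outside Char, or None."""
    for i, ch in enumerate(text):
        if not is_xml10_char(ord(ch)):
            return i
    return None
```

A checker built this way would reject the whole input as soon as
first_illegal_char returns an index, rather than trying to skip over the
offending character.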

> For C1, a number of coded character sets use this range, including many
> Microsoft charsets, and ISO 10646.

This is exactly why they must be barred. XML insists on true labelling
of characters with their encoding. A document labelled ISO 8859-1 that
has some of the MS 1252 characters is not a well-formed XML document.
This is basic XML, finalized 10 years ago. Why are we even
discussing it?
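
As an illustration of the mislabelling problem, the sketch below decodes
bytes under the declared label and reports where C1 code points (U+0080 to
U+009F) turn up. The function name and the default label are assumptions
made for the example. It only reports positions; what a processor then does
with them is not shown.

```python
from typing import List


def c1_offsets(raw: bytes, declared: str = "iso-8859-1") -> List[int]:
    """Decode raw under the declared encoding and list C1 positions."""
    text = raw.decode(declared)
    return [i for i, ch in enumerate(text) if 0x80 <= ord(ch) <= 0x9F]


# Bytes 0x93 and 0x94 are curly quotes in Windows-1252,
# but C1 control characters when read as ISO 8859-1.
sample = b"<a>\x93hi\x94</a>"
as_labelled = c1_offsets(sample)             # read as ISO 8859-1
as_cp1252 = c1_offsets(sample, "cp1252")     # read as Windows-1252
```

The same bytes give C1 hits under one label and none under the other,
which is the sign that the label does not match the data.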

Cheers
Rick Jelliffe

Received on Tue Nov 20 22:39:15 2007