[dsdl-discuss] Re: Does Part 7 need built-in definitions for IANA sets?

From: MURATA Makoto <murata@hokkaido.email.ne.jp>
Date: Sat Nov 13 2004 - 02:17:50 UTC

> So Part 7 should have a way to define IANA character sets without
> requiring an explicit hull-and-kernel specification somewhere.

Unfortunately, IANA character sets are poorly defined. Until the IANA registry
is cleaned up, I do not think it is possible to use them as a basis
for Part 7.

> I.e. so that an implementation could
> 1) canonicalize the text
Are you talking about Unicode canonicalization?
> 2) round-trip it through its platform's transcoders for that character set
I do not understand. Please elaborate.
> 3) canonicalize it again, just to be sure
Again, Unicode canonicalization?
> 4) check that the input is the same as the output.
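
For concreteness, the four quoted steps might be sketched as below. This
is only an illustration under two assumptions that the thread does not
confirm: that "canonicalize" means Unicode normalization (NFC is used
here), and that Python's built-in codecs can stand in for the platform's
transcoders. The function name survives_round_trip is hypothetical.

```python
import unicodedata


def survives_round_trip(text: str, charset: str) -> bool:
    """Return True if text survives a round trip through charset.

    Assumption: "canonicalize" is taken to mean Unicode NFC normalization.
    Assumption: Python's codec for `charset` plays the platform transcoder.
    """
    # 1) canonicalize the text
    canonical = unicodedata.normalize("NFC", text)

    # 2) round-trip it through the transcoder for that character set
    try:
        round_tripped = canonical.encode(charset).decode(charset)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # A character that cannot be transcoded is not in the set.
        return False

    # 3) canonicalize it again, just to be sure
    result = unicodedata.normalize("NFC", round_tripped)

    # 4) check that the input is the same as the output
    return result == canonical


if __name__ == "__main__":
    for sample, charset in [("abc", "us-ascii"), ("\u00e9", "us-ascii")]:
        print(repr(sample), charset, survives_round_trip(sample, charset))
```

The answer depends entirely on which mapping table the local transcoder
uses, so two platforms can disagree about the same text and the same
charset name.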

snip.

> 2) A reserved naming convention for IANA sets, so that implementations can
> build in the alternative implementation approach above. (At one stage Dan
> Connolly of W3C made up URIs for the IANA names, I don't recall what they
> were: something like http://www.iana.org/charset/US-ASCII )

I think it is practically impossible. For example, what does
Extended_UNIX_Code_Packed_Format_for_Japanese mean? It is unclear
which version of JIS X 0208 is used and which mapping table from
JIS X 0208 to Unicode is used.

At present, different implementations do different things.
If Part 7 is based on IANA charsets, we cannot have interoperability.
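
A small illustration of the mapping-table problem, using Python's
standard codecs. The example is an assumption chosen for convenience: it
uses the Shift_JIS encoded form of a JIS X 0208 character rather than
EUC-JP, because Python ships two codecs ("shift_jis" and "cp932") that
decode the same bytes with different JIS X 0208 to Unicode tables. It is
not taken from any Part 7 draft.

```python
# JIS X 0208 row 1, cell 33, in its Shift_JIS encoded form (0x81 0x60).
# The same character is 0xA1 0xC1 in EUC-JP.
SJIS_BYTES = b"\x81\x60"


def decode_with(codec: str) -> str:
    """Decode the sample bytes with the named Python codec."""
    return SJIS_BYTES.decode(codec)


# Table derived from the JIS standard: WAVE DASH.
jis_mapping = decode_with("shift_jis")
# Microsoft's table for the same code point: FULLWIDTH TILDE.
ms_mapping = decode_with("cp932")

if __name__ == "__main__":
    print(f"shift_jis -> U+{ord(jis_mapping):04X}")
    print(f"cp932     -> U+{ord(ms_mapping):04X}")
    print("same character:", jis_mapping == ms_mapping)
```

Text that passes a round-trip check on one of these tables can therefore
fail on the other, which is the interoperability problem described above.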

Cheers,

-- 
MURATA Makoto <murata@hokkaido.email.ne.jp>
Received on Sat Nov 13 03:18:27 2004
