> So Part 7 should have a way to define IANA character sets without
> requiring an explicit hull-and-kernel specification somewhere.
Unfortunately, IANA character sets are poorly defined. Until the IANA registry
is cleaned up, I do not think it is possible to use them as a basis
for Part 7.
> I.e. so that an implementation could
> 1) canonicalize the text
Are you talking about Unicode canonicalization?
> 2) round-trip it through its platform's transcoders for that character set
I do not understand. Please elaborate.
> 3) canonicalize it again, just to be sure
Again, Unicode canonicalization?
> 4) check that the input is the same as the output.
snip.
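For readers following the thread, the four quoted steps could be sketched roughly as below. This is only an illustration under stated assumptions: "canonicalize" is read as Unicode normalization (NFC here, which is exactly the point being questioned above), Python's built-in codecs stand in for the "platform's transcoders", and the function name survives_round_trip is hypothetical.

```python
import unicodedata


def survives_round_trip(text, charset, form="NFC"):
    """Return True if `text` comes back unchanged after a trip through `charset`.

    Assumptions: "canonicalize" means Unicode normalization to `form`, and
    Python's codec registered under `charset` plays the platform transcoder.
    """
    # Step 1: canonicalize the text.
    canonical = unicodedata.normalize(form, text)

    # Step 2: round-trip it through the transcoder for that character set.
    try:
        encoded = canonical.encode(charset)
    except UnicodeEncodeError:
        # The character set cannot represent some character at all.
        return False
    decoded = encoded.decode(charset)

    # Step 3: canonicalize again, just to be sure.
    recanonical = unicodedata.normalize(form, decoded)

    # Step 4: check that the input is the same as the output.
    return recanonical == canonical


if __name__ == "__main__":
    print(survives_round_trip("abc", "US-ASCII"))
    print(survives_round_trip("caf\u00e9", "US-ASCII"))
    print(survives_round_trip("cafe\u0301", "ISO-8859-1"))
```

Note that the result depends entirely on which transcoder the platform supplies for a given name, so two implementations can disagree on the same input.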
> 2) A reserved naming convention for IANA sets, so that implementations can
> build in the alternative implementation approach above. (At one stage Dan
> Connolly of W3C made up URIs for the IANA names, I don't recall what they
> were: something like http://www.iana.org/charset/US-ASCII )
I think it is practically impossible. For example, what does
Extended_UNIX_Code_Packed_Format_for_Japanese mean? The registered name
does not say which version of JIS X 0208 is used, or which mapping table
from JIS X 0208 to Unicode is used.
At present, different implementations do different things.
If Part 7 is based on IANA charsets, we cannot have interoperability.
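A small illustration of the mapping-table problem, as a sketch rather than a survey of real implementations. It decodes one JIS X 0208 character (row 1, cell 33) with three codecs that ship with Python. The codec names euc_jp, shift_jis and cp932 are Python's own, and Shift_JIS versus Microsoft's cp932 is used here only as a stand-in for the divergence, because Python bundles a single EUC-JP table.

```python
# JIS X 0208 row 1, cell 33, in two different byte framings.
WAVE_DASH_EUC_JP = b"\xa1\xc1"     # EUC-JP framing
WAVE_DASH_SHIFT_JIS = b"\x81\x60"  # Shift_JIS framing of the same character


def code_point(raw, codec):
    """Decode a single character and format it as U+XXXX."""
    return "U+%04X" % ord(raw.decode(codec))


results = {
    "euc_jp": code_point(WAVE_DASH_EUC_JP, "euc_jp"),
    "shift_jis": code_point(WAVE_DASH_SHIFT_JIS, "shift_jis"),
    "cp932": code_point(WAVE_DASH_SHIFT_JIS, "cp932"),
}

if __name__ == "__main__":
    for codec, cp in results.items():
        print(codec, cp)
    # euc_jp U+301C
    # shift_jis U+301C
    # cp932 U+FF5E
```

The same JIS X 0208 character arrives as WAVE DASH under one table and as FULLWIDTH TILDE under another, so a round-trip check keyed only on a charset name can pass on one implementation and fail on the next.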
Cheers,
-- MURATA Makoto <murata@hokkaido.email.ne.jp>

Received on Sat Nov 13 03:18:27 2004