On Tue, 19 Aug 1997, Larry Masinter wrote:
Many thanks to Larry for the careful wording.
I wonder whether we shouldn't exchange (a) and (b), because it seems that the current (b) is the method of choice.
[(a) is %HH encoding with its difficulties, (b) is RFC 2047 encoding: =?us-ascii?Q?.....]
> (b) Use the encoding method for message headers described
> in RFC 2047, using either a charset value of "US-ASCII",
> or, if the URL contains octets outside of the 7-bit range,
> "UNKNOWN-8BIT" [RFC 1428], or "UTF8", as appropriate.
First, please note that it is "UTF-8", not "UTF8". Many thanks to Larry for bringing in UTF-8 for URLs here. I wish we already had all URLs (with such octets) in UTF-8, and could just write it that way. But UTF-8 URLs are not supposed to work by just labeling them as UTF-8.

If an HTML document, encoded say in KOI-8, contains 8-bit octets in a URL, then these octets have to be interpreted in KOI-8, and not in UTF-8. HTML transports characters, and these may undergo transcoding, e.g. in proxies or by cut-and-paste. So a URL extracted from an HTML document labeled as KOI-8 should itself be labeled as KOI-8.

Labeling as UTF-8 is only appropriate if the URL is indeed transcoded to UTF-8 (but then comparison between the URL inside the KOI-8 document and the URL encoded with UTF-8 will be quite difficult). So here the earlier text by Jacob should be used.
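To make the labeling point concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the URL is hypothetical, and KOI8-R stands in for the "KOI-8" mentioned above. It shows that the same characters give different octets under KOI-8 and UTF-8, that KOI-8 octets read as UTF-8 do not decode, and that an RFC 2047 encoded-word carrying the matching charset label round-trips.

```python
from email.header import Header, decode_header, make_header

# Hypothetical URL with a Cyrillic path segment (not from the original message).
url = "http://example.org/мир"

# Assumption: KOI8-R is used as a concrete stand-in for "KOI-8".
koi8_octets = url.encode("koi8_r")
utf8_octets = url.encode("utf-8")

# The same characters produce different octet sequences in each charset.
print(koi8_octets)
print(utf8_octets)

# Merely labeling the KOI-8 octets as UTF-8 does not work:
# they are not a valid UTF-8 sequence for this URL.
try:
    mislabeled = koi8_octets.decode("utf-8")
except UnicodeDecodeError:
    mislabeled = None

# RFC 2047 encoded-word labeled with the charset the octets are really in.
encoded_word = Header(url, charset="koi8-r").encode()
print(encoded_word)

# A receiver that honours the label recovers the original characters.
restored = str(make_header(decode_header(encoded_word)))
```

The label describes the octets as they are; switching the label to UTF-8 would only be correct after actually transcoding the octets.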
Regards, Martin.