At 09.13 -0700 97-08-22, Larry Masinter wrote:
> 1738 is being updated. In the real world, people are using all kinds of
> characters as "reserved", and the truth is you're really taking a risk
> if you encode something on your own that wasn't encoded.
>
> In the interest of safety, I think you're better off *not* recommending
> using %xx encoding as a way of making illegal URLs safer. But this
> is still just implementation advice.
>
> > I assume this means that any other character, if occurring in the value
> > submitted to a mailer for a Content-Location, must be encoded either
> > using the RFC 1738 encoding method or the RFC 2047 encoding method.
>
> It is misleading to talk about "encoding a character using the
> RFC 1738 encoding method", because the RFC 1738 encoding method
> is not a character-by-character encoding. That is, you have to
> look at the whole URL and the scheme and the context of the
> character. RFC 2047 encoding, on the other hand, can be decided
> character-by-character, because it is at a different layer.

Here is a new draft text, based on your suggestions; exclamation marks
in the margin mark changes from the previous draft text:

  Handling of URLs containing inappropriate characters

  Some URLs may contain characters that are inappropriate for an
  RFC 822 header, either because the URL itself has an incorrect
  syntax or the URL syntax has changed to allow characters not
  allowed in mail headers. To include such a URL in a mail header,
  an implementation can either (a) arrange so that the URL becomes
  correctly formatted or (b) encode the header using the encoding
  method described in RFC 2047.

  Method (a) MUST be applied to the URL both in Content-Location
  headers and in body text. It MUST NOT be reversed by receiving
  mailers before matching hyperlinks to body parts.
  Method (b) MUST NOT be applied to the URL in the HTML text and MUST
  be reversed by receiving clients before comparing hyperlinks in body
  text to URLs in Content-Location headers.

  Method (a) is not always easy. It may require cooperation with the
  user and the software which produced the faulty URL. The encoding
  method of RFC 1738 can make a correct URL faulty if not done the
  right way. Changing the URL of documents already available on the
  Internet or an intranet may invalidate existing links to those
  documents. Changing the HTML body may
! invalidate message integrity checks. For these reasons, this
! standard recommends method (b).
!
  With method (b), the charset US-ASCII can be used, or, if the URL
  contains octets outside of the 7-bit range, "UNKNOWN-8BIT"
  [RFC 1428] or "UTF-8" may be appropriate. Note that for MHTML
  processing (matching of URLs in body text to URLs in
  Content-Location headers) the choice of character encoding need
  not be the "correct" choice; it need only be a choice which, after
  reversal of the encoding by the receiving mailer, returns the same
  octet string as before the encoding.

------------------------------------------------------------------------
Jacob Palme (Stockholm University and KTH)
for more info see URL: http://www.dsv.su.se/~jpalme
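
As a rough illustration of method (b) in the draft text above, the sketch below uses Python's standard email.header module to wrap a URL containing 8-bit octets in RFC 2047 encoded-words and then to reverse the encoding before comparison. The function names encode_location and decode_location, the example.org URL, and the choice of ISO-8859-1 are assumptions made for this example only; ISO-8859-1 is picked simply because it maps every octet one-to-one, which fits the draft's point that the charset need not be "correct" as long as the octets survive the round trip.

```python
from email.header import Header, decode_header


def encode_location(url_octets: bytes, charset: str = "iso-8859-1") -> str:
    """Return an RFC 2047 encoded value for a Content-Location header.

    Hypothetical helper: the charset default is an assumption, chosen
    because ISO-8859-1 can represent any octet value.
    """
    return Header(url_octets, charset).encode()


def decode_location(header_value: str) -> bytes:
    """Reverse the RFC 2047 encoding and return the raw URL octets."""
    parts = []
    for chunk, _charset in decode_header(header_value):
        # decode_header returns str when the value holds no encoded-words
        # and bytes otherwise, so normalise everything to bytes.
        if isinstance(chunk, str):
            chunk = chunk.encode("ascii")
        parts.append(chunk)
    return b"".join(parts)


# A URL with octets outside the 7-bit range (illustrative only).
url = b"http://example.org/caf\xe9/men\xfc.html"
encoded = encode_location(url)

# The header value is now pure ASCII and safe for an RFC 822 header.
print("Content-Location:", encoded)

# A receiver reverses the encoding before matching against the
# hyperlink octets found in the HTML body text.
body_href = url
assert decode_location(encoded) == body_href
```

The receiver never interprets the charset label here; it only recovers the octet string, which is all that the matching of body hyperlinks against Content-Location headers needs.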