On Wed, 20 Aug 1997, Jacob Palme wrote:
> Here is a new draft text. It is based on the text proposed
> by Larry M and also tries to take into account the comments
> by Martin J D.
Many thanks for your proposal.
> Some changes in other places of RFC 2110 may also be
> necessary, I will check this when I update the draft.
> Handling of URLs containing inappropriate characters
> Some URLs may contain characters that are inappropriate
> for an RFC 822 header, either because the URL itself
> has an incorrect syntax or the URL syntax has changed to
> allow characters not allowed in mail headers. To include
> such a URL in a mail header, an implementation can either
> (a) arrange so that the URL becomes correctly formatted or
> (b) encode the header using the encoding method described
> in RFC 2047.
> Method (a) MUST be applied to the URL both in Content-
> Location headers and in body text. It MUST NOT be reversed
> by receiving mailers before matching hyperlinks to body
> Method (b) can be applied only to the URL in Content-
> Location headers and MUST be reversed by receiving clients
> before comparing hyperlinks in body text to URLs in
> Content-Location headers.
The "can" here seems somewhat ambiguous. Does it mean: It can be
applied only to the URL in Content-Location headers, but it
can also be applied to both, or does it mean: It can be applied
only to headers, but not to URLs inside a document?
I guess it would be better to say "MUST NOT be applied to
URLs inside the HTML text".
Also, I guess it should not only be Content-Location, but also
> Method (a) is not always easy. It may include cooperation
> with the user and the software which produced the faulty
> URL. The encoding method of RFC 1738 can make a correct
> URL faulty if not done the right way. Changing the URL of
> documents already available on the Internet or an Intranet
> may invalidate existing links to this document. Changing
> the HTML body may invalidate message integrity checks.
Do we really want to propose a method that has that many
ifs and whens? Do we assume implementors will all get all
these things right? I seriously doubt it.
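
To make the concern concrete, here is a minimal Python sketch of one way
method (a) can go wrong. It is only an illustration, not text from the
draft: the helper names escape_octets and escape_everything are
hypothetical, and the rule "escape only space, control and 8-bit octets"
is my assumption about what a careful implementation might do.

```python
from urllib.parse import quote


def escape_octets(url: bytes) -> str:
    """Escape only octets that cannot appear as-is in a mail header.

    Assumption: printable US-ASCII (0x21-0x7E) is left alone, so an
    existing "%XX" escape is not touched; everything else becomes %XX.
    """
    return "".join(
        chr(octet) if 0x21 <= octet <= 0x7E else "%{:02X}".format(octet)
        for octet in url
    )


def escape_everything(url: str) -> str:
    """A careless variant: escape every reserved character, "%" included."""
    return quote(url, safe="")


already_escaped = "http://example.com/a%20b"

# The careful version is stable when applied to an already-escaped URL.
careful = escape_octets(already_escaped.encode("ascii"))

# The careless version turns "%20" into "%2520" and mangles "://",
# so the result no longer names the same resource.
careless = escape_everything(already_escaped)

print(careful)
print(careless)
```

The second result is a different URL, so links in the body would no
longer match the Content-Location header unless both were rewritten in
exactly the same way.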
> If method (b) is used, the charset US-ASCII can be used,
> or, if the URL contains octets outside of the 7-bit range,
> "UNKNOWN-8BIT" [RFC 1428] or "UTF8" may be appropriate.
As I have written, "UTF8" should be "UTF-8", and is in fact
> Note that for MHTML processing (matching of URLs in body
> text to URL in Content-Location headers) the choice of
> character set need not be the "correct" set, it need only
> be a set which, after reversal of the encoding by the
> receiving mailer, returns the same octet string as before
> the encoding.
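
The round-trip property described in the quoted paragraph can be
sketched in a few lines of Python. This is an assumption-laden
illustration, not an implementation of RFC 2047: it handles a single
"B"-encoded word only, ignores the 75-character limit on encoded-words,
and the names encode_word and decode_word are hypothetical.

```python
import base64
import re

# Matches one RFC 2047 encoded-word that uses the "B" (base64) encoding.
ENCODED_WORD = re.compile(r"=\?([^?]+)\?[Bb]\?([^?]*)\?=")


def encode_word(octets: bytes, charset_label: str) -> str:
    """Wrap raw octets in a single B-encoded word with the given label."""
    payload = base64.b64encode(octets).decode("ascii")
    return "=?{}?B?{}?=".format(charset_label, payload)


def decode_word(word: str) -> bytes:
    """Reverse encode_word, returning the raw octets and ignoring the label."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError("not a B-encoded word")
    return base64.b64decode(match.group(2))


# A URL with one 8-bit octet (0xE9); its "real" charset is not known.
url_octets = b"http://example.com/caf\xe9.html"

# Whatever label is attached, the decoded octets are identical, which is
# all that matching against URLs in the body text needs.
labels = ["UNKNOWN-8BIT", "ISO-8859-1", "UTF-8"]
round_trips = [decode_word(encode_word(url_octets, label)) for label in labels]

print(all(result == url_octets for result in round_trips))
```

Note that the label only affects how a mail reader would display the
header to a human; the octet comparison does not depend on it.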
Please don't speak about "character set" when you mean character