At 14.30 -0700 97-08-24, Einar Stefferud wrote:
> Now, I have a new question -- Does this text belong in the standard or
> in the Information RFC? Does this go into the Standard as "How to be
> liberal in handling what is received"? It seems to be talking about
> how to prepare HTML for sending.
It belongs in the standard. This is advice which has to be adhered
to, otherwise the standard will not work in the cases we are discussing.
Part of the text might be moved to the informational document, but
it is better to keep everything in the same place: readability will
be reduced if we split it, especially since references from the
standard to the info document are not allowed.
At 00.26 -0700 97-08-25, Larry Masinter wrote:
> OK, here's my cut.
We are moving in circles. Your text is very good, but it does not
say that we primarily want to recommend method (b), as some
other people say we should.
Here is a new draft of the text. The text below is based on Larry's
latest text, with the following changes:
- Separate standards text (MUST, MUST NOT, SHOULD) from recommendations.
- Certain changes of wording to clarify what is standards text (MUST,
MUST NOT, SHOULD) and what is advice/recommendations.
- Removed one passage that said the same thing twice.
--- --- --- new proposed text --- --- ---
Handling of URLs containing inappropriate characters
Some documents may contain URLs with characters that are
inappropriate for an RFC 822 header, either because the URL
itself has an incorrect syntax or the URL syntax standard has
been changed to allow characters not previously allowed in
MIME headers. These URLs cannot be sent directly in a mail
header. There are two approaches that can be taken when
encountering such a URL as the text to be placed in a Content-
Location or Content-Base header:
a) In some situations, an implementation might be able to
replace the URL with one that can be sent directly. This might
be accomplished, for example, by using the encoding method of
RFC 1738 to replace inappropriate characters within the URL
with ones encoded using the %nn encoding. This replacement
MUST in that case be done both in the header and in the HTML
text containing the hyperlink that is to match the header.
Since the change is done in both places, a receiving mailer
need not decode it, and MUST NOT decode the RFC 1738 encoding
before matching hyperlinks to body parts.
b) The URL might be encoded using the method described in RFC
2047. This replacement MUST only be done in the header, not in
the HTML text. Receiving clients must decode the RFC 2047
encoding before comparing hyperlinks in body text to URLs in
the header.
With method (b), the charset parameter value "US-ASCII" can be
used, or, if the URL contains octets outside of the 7-bit
range, "UNKNOWN-8BIT" [RFC 1428] or "UTF-8" may be appropriate.
Note that for MHTML processing (matching URLs in body text to
URLs in Content-Location headers), the choice of character
encoding need not be the "correct" choice. It need only be a
choice which, after reversal of the encoding by the receiving
mailer, returns the same octet string as before the encoding.
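To make the two approaches above concrete, here is a minimal sketch
in Python. It is an illustration only, not part of the proposed
text. The function names (method_a, method_b, decode_method_b), the
RESERVED character set and the example.com URLs are all assumptions
made for this example. The sketch emits a single encoded-word and
does not split long URLs to respect the 75-character limit that
RFC 2047 places on an encoded-word.

```python
import base64
from email.header import decode_header
from urllib.parse import quote

# Assumption: characters left untouched by method (a). These are the URL
# reserved characters plus "%", so existing %nn escapes are not re-encoded.
RESERVED = ":/?#[]@!$&'()*+,;=%"


def method_a(url: str) -> str:
    """Method (a): replace inappropriate characters with %nn escapes.

    The same replacement would have to be made in the HTML text too.
    """
    return quote(url, safe=RESERVED)


def method_b(url_octets: bytes, charset: str = "UTF-8") -> str:
    """Method (b): wrap the URL octets in one RFC 2047 encoded-word.

    Uses the "B" (base64) encoding; only the header is changed.
    """
    payload = base64.b64encode(url_octets).decode("ascii")
    return f"=?{charset}?B?{payload}?="


def decode_method_b(header_value: str) -> bytes:
    """Receiver side: reverse the RFC 2047 encoding back to raw octets."""
    octets = b""
    for part, _charset in decode_header(header_value):
        # decode_header yields bytes for encoded-words and may yield str
        # for a header that contains no encoded-word at all.
        octets += part if isinstance(part, bytes) else part.encode("ascii")
    return octets


if __name__ == "__main__":
    print(method_a("http://www.example.com/my picture.gif"))
    raw = b"http://www.example.com/bild-\xe5.gif"  # one octet above 7-bit
    header = method_b(raw, "UNKNOWN-8BIT")
    print(header)
    print(decode_method_b(header) == raw)
```

The last line checks the round trip: the receiver never interprets
the charset label, it only needs the decoded octets to equal the
octets of the hyperlink in the body text.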
Caution should, however, be exercised when using method (a),
since, in general, this encoding cannot be applied safely to
characters that are used for reserved purposes within the URL
scheme. In addition, changing the HTML body which contains the
URL might invalidate a message integrity check. Because of
these problems, this method SHOULD only be used if it is
performed in cooperation with the author/owner of the
Jacob Palme (Stockholm University and KTH)
for more info see URL: http://www.dsv.su.se/~jpalme