I don't think standards should say (normatively) how senders should
figure out what to send; standards should say what the message _means_.
If you want to give implementation advice on how a sender might
create a valid message, it should be clearly labelled as
informational and not part of the normative spec.

Handling of URLs containing inappropriate characters

Some URLs may contain characters that are inappropriate
for an RFC 822 header, either because the URL itself
is syntactically incorrect or because the URL syntax has changed
to allow characters that are not allowed in mail headers. To
include such a URL in a mail header, an implementation might:
(a) Transform the URL, e.g., using the encoding method of
[RFC URL SYNTAX], in both the HTML text and the mail header.
This transformation must be done carefully, since
the %xx hex encoding cannot be applied directly to the
URL but rather must be applied to its component parts.
Note that transforming the HTML text has several difficulties,
including the fact that any message integrity checks computed
over the original text are no longer valid.
(b) Use the encoding method for message headers described
in RFC 2047, using a charset value of "US-ASCII" or, if the
URL contains octets outside of the 7-bit range,
"UNKNOWN-8BIT" [RFC 1428] or "UTF-8", as appropriate.
Note that for MHTML processing (URL matching), the charset
value is irrelevant, but it may be relevant for other operations
on the URL.
Note that recipients of messages must reverse the encoding
used in method (b) before matching URLs.
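
As an illustration only (implementation advice, not proposed spec
text), the two methods might be sketched in Python as below. The
function names, the example URLs and the sets of characters left
unescaped are my own assumptions, and "utf-8" is just one of the
charset choices allowed under (b).

```python
from email.header import Header, decode_header
from urllib.parse import quote, urlsplit, urlunsplit


def percent_encode_url(url):
    """Method (a): apply %xx encoding per component, not to the whole URL.

    The 'safe' sets are illustrative assumptions. "%" is kept safe so
    that escapes already present are not encoded a second time. The
    scheme and host are passed through unchanged.
    """
    parts = urlsplit(url)
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


def rfc2047_encode(url, charset="utf-8"):
    """Method (b): wrap the URL in RFC 2047 encoded-word(s)."""
    return Header(url, charset=charset).encode()


def rfc2047_decode(value):
    """Reverse method (b), as a recipient must before matching URLs."""
    chunks = []
    for data, charset in decode_header(value):
        if isinstance(data, bytes):
            # Encoded-words come back as bytes plus their charset label.
            data = data.decode(charset or "ascii")
        chunks.append(data)
    return "".join(chunks)


if __name__ == "__main__":
    print(percent_encode_url("http://example.com/a b/caf\u00e9.gif?q=x y"))
    header_value = rfc2047_encode("http://example.com/caf\u00e9.gif")
    print(header_value)
    print(rfc2047_decode(header_value))
```

Note that with method (a) the same transformed URL has to appear in
both the HTML text and the header, whereas with method (b) only the
header value changes and the recipient compares the decoded value
against the URL as it appears in the HTML.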
Larry
--
http://www.parc.xerox.com/masinter