I don't think standards should say (normatively) how senders should figure out what to send; standards should say what the message _means_.
If you want to give implementation advice on how a sender might create a valid message, it should be clearly labelled as informative only, not as part of the normative spec.
Handling of URLs containing inappropriate characters
Some URLs may contain characters that are inappropriate for an RFC 822 header, either because the URL itself has an incorrect syntax or the URL syntax has changed to allow characters not allowed in mail headers. To include such a URL in a mail header, an implementation might:
(a) Transform the URL, e.g., using the encoding method of [RFC URL SYNTAX], in both the HTML text and the mail header. This transformation must be done carefully, since the %xx hex encoding cannot be applied directly to the URL but rather must be applied to its component parts.
Note that transforming HTML text has several difficulties, including the fact that message integrity checks are no longer valid.
(b) Use the encoding method for message headers described in RFC 2047, with a charset value of "US-ASCII" or, if the URL contains octets outside the 7-bit range, "UNKNOWN-8BIT" [RFC 1428] or "UTF-8", as appropriate. Note that for MHTML processing (URL matching), the charset value is irrelevant, but it may be relevant for other operations on the URL.
Note that recipients of messages must reverse the encoding used in method (b) before matching URLs.
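As a purely informative illustration of the two methods above, here is a minimal Python sketch using only the standard library. The function names, the example URLs, and the "safe" character sets are my own assumptions, not anything taken from the drafts. It assumes UTF-8 as the basis for %xx encoding, and it decodes method (b) back to raw octets, since the charset value does not matter for URL matching.

```python
from email.header import Header, decode_header
from urllib.parse import quote, urlsplit, urlunsplit


def percent_encode_url(url: str) -> str:
    """Method (a): apply %xx encoding to each URL component separately."""
    parts = urlsplit(url)
    # The safe sets below are illustrative assumptions. '%' is kept safe so
    # that existing %xx escapes are not encoded a second time.
    path = quote(parts.path, safe="/%:@!$&'()*+,;=")
    query = quote(parts.query, safe="/?%:@!$&'()*+,;=")
    fragment = quote(parts.fragment, safe="/?%:@!$&'()*+,;=")
    # Scheme and host are left untouched in this sketch.
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


def encode_header_url(url: str, charset: str = "utf-8") -> str:
    """Method (b): encode the URL as RFC 2047 encoded-word(s)."""
    return Header(url, charset=charset).encode()


def decode_header_url(value: str) -> bytes:
    """Reverse method (b), returning the raw octets of the URL."""
    out = bytearray()
    for data, _charset in decode_header(value):
        # Encoded words come back as bytes; unencoded text comes back as
        # str and is assumed here to be 7-bit ASCII.
        out += data if isinstance(data, bytes) else data.encode("ascii")
    return bytes(out)


def urls_match(header_value: str, body_url: str) -> bool:
    """Compare octets after undoing the header encoding (UTF-8 body assumed)."""
    return decode_header_url(header_value) == body_url.encode("utf-8")
```

A recipient would call something like `urls_match(header_value, url_from_html)` rather than comparing the raw header text. Comparing octets sidesteps the charset label entirely, which is why a label such as "UNKNOWN-8BIT" is harmless for matching.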
Larry -- http://www.parc.xerox.com/masinter