MHTML Archives

Subject: Re: More on wrongly(?) formatted urls
From: Martin J. Dürst
Reply-To: IETF working group on HTML in e-mail
Date: Mon, 25 Aug 1997 13:51:39 +0200
Content-Type: TEXT/PLAIN

Many thanks to Einar, Jacob, and Larry for advancing this issue over
the weekend.

Unfortunately, one point I mentioned earlier has not yet
been integrated into the new text. Currently, there is a
paragraph reading

>       With method (b), the charset parameter value "US-ASCII" can be
>       used, or, if the URL contains octets outside of the 7-bit
>       range, "UNKNOWN-8BIT" [RFC 1428] or "UTF-8" may be appropriate.
>       Note that for the MHTML processing of (matching URLs in body
>       text to URL in) Content-Location headers the choice of
>       character encoding need not be the "correct" choice. It need
>       only be a choice which, after reversal of the encoding by the
>       receiving mailer, returns the same octet string as before the
>       encoding.

"Need not be the correct choice" is very dangerous. It's true for
MHTML, but it may very well not be true for other processing.
Labeling things wrongly is always a bad idea.
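To make the danger concrete, here is a minimal Python sketch (the URL is a made-up example, and the helper names are hypothetical). The same octets labeled "UTF-8" or "ISO-8859-1" survive an RFC 2047 round trip identically, which is all MHTML matching needs, yet any consumer that actually honours the label sees different characters.

```python
import base64

# Hypothetical URL with one non-ASCII character; these UTF-8 octets are
# what would appear in the HTML body.
url_octets = "http://example.com/caf\u00e9.html".encode("utf-8")

def encoded_word(octets: bytes, charset: str) -> str:
    """Build a single RFC 2047 'B' encoded-word carrying the given octets."""
    return "=?%s?B?%s?=" % (charset, base64.b64encode(octets).decode("ascii"))

def decode_octets(word: str) -> bytes:
    """Undo encoded_word(): return the raw octets, ignoring the label."""
    _, _label, _, payload, _ = word.split("?")
    return base64.b64decode(payload)

correct = encoded_word(url_octets, "UTF-8")
wrong = encoded_word(url_octets, "ISO-8859-1")

# For MHTML matching only the octets matter, and both labels round-trip them.
assert decode_octets(correct) == decode_octets(wrong) == url_octets

# A consumer that interprets the label gets different text.
print(url_octets.decode("utf-8"))       # http://example.com/café.html
print(url_octets.decode("iso-8859-1"))  # http://example.com/cafÃ©.html
```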

I therefore propose to change this paragraph into the following:

      With method (b), the charset parameter value "US-ASCII"
      SHOULD be used if the URL contains no octets outside of
      the 7-bit range. If such octets are present, the correct
      charset parameter value (derived e.g. from information
      about the HTML document the URL was found in) SHOULD be used.
      If this cannot be safely established, the value "UNKNOWN-8BIT"
      [RFC 1428] MUST be used.

      Note that for the MHTML processing of (matching URLs in body
      text to URLs in) Content-Location headers, the value of the
      charset parameter is irrelevant, but it may be relevant
      for other purposes, and incorrect labeling MUST therefore
      be avoided.
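The selection rule in this paragraph can be sketched in Python as below. This is only an illustration: `choose_charset` and its `document_charset` argument are hypothetical, and approximating "safely established" by "the octets decode cleanly under that charset" is my own assumption, not part of the proposed wording.

```python
from typing import Optional

def choose_charset(url_octets: bytes, document_charset: Optional[str] = None) -> str:
    """Pick the RFC 2047 charset label for a URL following the proposed rule.

    document_charset (hypothetical input) is the charset of the HTML
    document the URL was found in, if the sender knows it.
    """
    # No octets outside the 7-bit range: US-ASCII SHOULD be used.
    if all(b < 0x80 for b in url_octets):
        return "US-ASCII"
    # 8-bit octets present: use the document's charset if it can be
    # established safely. Assumption: "safely" is approximated here by the
    # octets decoding without error under that charset.
    if document_charset:
        try:
            url_octets.decode(document_charset)
            return document_charset
        except (LookupError, UnicodeDecodeError):
            pass
    # Otherwise the label MUST be UNKNOWN-8BIT (RFC 1428).
    return "UNKNOWN-8BIT"

print(choose_charset(b"http://example.com/a.html"))                  # US-ASCII
print(choose_charset("http://example.com/caf\u00e9".encode("utf-8"), "UTF-8"))  # UTF-8
print(choose_charset(b"http://example.com/caf\xe9", "UTF-8"))       # UNKNOWN-8BIT
```

The decode check is deliberately weak: a charset such as ISO-8859-1 accepts every octet string, so a real sender would rely on actual document metadata rather than on a successful decode.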


The full text then looks as follows:

--- --- --- new proposed text --- --- ---

   Handling of URLs containing inappropriate characters

   Some documents may contain URLs with characters that are
   inappropriate for an RFC 822 header, either because the URL
   itself has an incorrect syntax or the URL syntax standard has
   been changed to allow characters not previously allowed in
   MIME headers. These URLs cannot be sent directly in a mail
   header. There are two approaches that can be taken when
   encountering such a URL as the text to be placed in a
   Content-Location or Content-Base header:

   a) In some situations, an implementation might be able to
      replace the URL with one that can be sent directly. This might
      be accomplished, for example, by using the encoding method of
      RFC 1738 to replace inappropriate characters within the URL
      with ones encoded using the %nn encoding. This replacement
      MUST in that case be done both in the header and in the HTML
      text containing the hyperlink that is to match the header.
      Since the change is made in both places, a receiving mailer
      need not decode it, and MUST NOT decode the RFC 1738 encoding
      before matching hyperlinks to body parts.

   b) The URL might be encoded using the method described in RFC
      2047. This replacement MUST only be done in the header, not in
      the HTML text.  Receiving clients must decode the RFC 2047
      encoding before comparing hyperlinks in body text to URLs in
      Content-Location headers.

      With method (b), the charset parameter value "US-ASCII"
      SHOULD be used if the URL contains no octets outside of
      the 7-bit range. If such octets are present, the correct
      charset parameter value (derived e.g. from information
      about the HTML document the URL was found in) SHOULD be used.
      If this cannot be safely established, the value "UNKNOWN-8BIT"
      [RFC 1428] MUST be used.

      Note that for the MHTML processing of (matching URLs in body
      text to URLs in) Content-Location headers, the value of the
      charset parameter is irrelevant, but it may be relevant
      for other purposes, and incorrect labeling MUST therefore
      be avoided.

   Caution should, however, be taken in using method (a), since,
   in general, this encoding cannot be applied safely to
   characters that are used for reserved purposes within the URL
   scheme. In addition, changing the HTML body which contains the
   URL might invalidate a message integrity check. Because of
   these problems, this method SHOULD only be used if it is
   performed in cooperation with the author/owner of the
   document.

--- --- --- end new proposed text --- --- ---
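The two methods in the proposed text can be sketched in Python as follows. This is a hypothetical illustration, not part of the draft: the function names, the choice to escape only control octets, space and octets above 0x7E in method (a), and the restriction to a single whole-value "B" encoded-word in method (b) are all assumptions made for brevity.

```python
import base64
import re

def percent_encode(url_octets: bytes) -> bytes:
    """Method (a): replace octets that cannot appear in a header with %nn.

    Only control octets, space and octets above 0x7E are escaped; reserved
    characters such as '/', '?' and '%' are left untouched, because
    escaping them can change what the URL means.
    """
    out = bytearray()
    for b in url_octets:
        if b <= 0x20 or b >= 0x7F:
            out += b"%%%02X" % b
        else:
            out.append(b)
    return bytes(out)

def encode_header_value(url_octets: bytes, charset: str) -> str:
    """Method (b): wrap the raw octets in one RFC 2047 'B' encoded-word.

    Long values would have to be split into several encoded-words (RFC 2047
    limits each to 75 characters); that is not handled here.
    """
    return "=?%s?B?%s?=" % (charset, base64.b64encode(url_octets).decode("ascii"))

# Matches a header value consisting of exactly one 'B' encoded-word.
_ENCODED_WORD = re.compile(r"=\?([^?]+)\?[Bb]\?([A-Za-z0-9+/=]*)\?=")

def header_octets(value: str) -> bytes:
    """Receiver side of method (b): undo the encoded-word, returning octets."""
    m = _ENCODED_WORD.fullmatch(value.strip())
    if m:
        return base64.b64decode(m.group(2))
    # Not an encoded-word: the header carried the URL directly.
    return value.strip().encode("ascii")

def matches(header_value: str, hyperlink_octets: bytes) -> bool:
    """Compare a Content-Location value with a hyperlink from the HTML body.

    RFC 1738 %nn escapes are deliberately NOT decoded, as method (a) requires.
    """
    return header_octets(header_value) == hyperlink_octets

raw = "http://example.com/caf\u00e9 menu.html".encode("utf-8")

# (a): the same escaped form goes into both the header and the HTML body.
escaped = percent_encode(raw)  # b"http://example.com/caf%C3%A9%20menu.html"
assert matches(escaped.decode("ascii"), escaped)

# (b): only the header is encoded; the HTML body keeps the raw octets.
header = encode_header_value(raw, "UTF-8")
assert matches(header, raw)
```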

Regards,        Martin.

LISTSRV.NORDU.NET