MHTML Archives

Subject: Re: More on wrongly(?) formatted urls
From: Martin J. Dürst
Reply-To: IETF working group on HTML in e-mail
Date: Thu, 21 Aug 1997 14:02:08 +0200


On Wed, 20 Aug 1997, Jacob Palme wrote:

> Here is a new draft text. It is based on the text proposed
> by Larry M and also tries to take into account the comments
> by Martin J D.

Many thanks for your proposal.

> Some changes in other places of RFC 2110 may also be
> necessary, I will check this when I update the draft.
>      Handling of URLs containing inappropriate characters
>      Some URLs may contain characters that are inappropriate
>      for an RFC 822 header, either because the URL itself
>      has an incorrect syntax or the URL syntax has changed to
>      allow characters not allowed in mail headers. To include
>      such a URL in a mail header, an implementation can either
>      (a) arrange so that the URL becomes correctly formatted or
>      (b) encode the header using the encoding method described
>      in RFC 2047.
>      Method (a) MUST be applied to the URL both in Content-
>      Location headers and in body text. It MUST NOT be reversed
>      by receiving mailers before matching hyperlinks to body
>      parts.
>      Method (b) can be applied only to the URL in Content-
>      Location headers and MUST be reversed by receiving clients
>      before comparing hyperlinks in body text to URLs in
>      Content-Location headers.

The "can" here seems somewhat ambiguous. Does it mean that it can be
applied only to the URL in Content-Location headers but can also be
applied to both, or that it can be applied only to headers and not
to URLs inside a document?
I guess it would be better to say "MUST NOT be applied to
URLs inside the HTML text".
Also, I guess it should not only be Content-Location, but also

>      Method (a) is not always easy. It may include cooperation
>      with the user and the software which produced the faulty
>      URL. The encoding method of RFC 1738 can make a correct
>      URL faulty if not done the right way. Changing the URL of
>      documents already available on the Internet or an Intranet
>      may invalidate existing links to this document. Changing
>      the HTML body may invalidate message integrity checks.

Do we really want to propose a method that has that many
ifs and whens? Do we assume implementors will all get all
these things right? I seriously doubt it.
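
To make one of these pitfalls concrete, here is a minimal sketch in Python of how percent-encoding in the style of RFC 1738 can turn an already correct URL into a faulty one. The example URL and the choice of "safe" characters are assumptions made for illustration only; they are not taken from the draft.

```python
from urllib.parse import quote

# Hypothetical URL containing a space, which is not allowed unescaped.
raw = "http://example.com/a b.gif"

# First pass: the space becomes %20 and the URL is now well formed.
once = quote(raw, safe=":/")

# Naive second pass: '%' is not in the safe set, so the existing
# escape is escaped again and the URL now names a different resource.
twice = quote(once, safe=":/")

# A more careful pass keeps '%' safe so existing escapes survive.
# The trade-off is that a stray literal '%' would also be left alone.
careful = quote(once, safe=":/%")

print(once)
print(twice)
print(careful)
```

An implementation that applies method (a) therefore has to know whether its input has already been escaped, which is one more thing to get right.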

>      If method (b) is used, the charset US-ASCII can be used,
>      or, if the URL contains octets outside of the 7-bit range,
>      "UNKNOWN-8BIT" [RFC 1428] or "UTF8" may be appropriate.

As I have written, "UTF8" should be "UTF-8", and is in fact
inappropriate here.
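
For reference, method (b) in the quoted draft could be sketched roughly as follows, using Python's standard email.header module. The function names, the example URL, and the "iso-8859-1" label are assumptions chosen for illustration; the label is used only because it maps every octet to a character and back, not because it is the correct charset for the URL.

```python
from email.header import Header, decode_header


def encode_location(url_octets: bytes, charset: str = "iso-8859-1") -> str:
    """Sender side: wrap the raw URL octets in RFC 2047 encoded-words."""
    return Header(url_octets, charset).encode()


def decode_location(header_value: str) -> bytes:
    """Receiver side: reverse the RFC 2047 encoding, returning raw octets."""
    parts = []
    for chunk, _charset in decode_header(header_value):
        # decode_header returns str when the header has no encoded-words.
        parts.append(chunk.encode("ascii") if isinstance(chunk, str) else chunk)
    return b"".join(parts)


def matches(body_url_octets: bytes, header_value: str) -> bool:
    """Compare a URL from the body text with a Content-Location value."""
    return decode_location(header_value) == body_url_octets


# Hypothetical URL with one octet outside the 7-bit range.
url = b"http://example.com/caf\xe9.gif"
header_value = encode_location(url)
print(header_value)
print(matches(url, header_value))
```

The comparison is done on octets after decoding, so the charset label only needs to round-trip; it is never used to interpret the URL.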

>      Note that for MHTML processing (matching of URLs in body
>      text to URLs in Content-Location headers) the choice of
>      character set need not be the "correct" set, it need only
>      be a set which, after reversal of the encoding by the
>      receiving mailer, returns the same octet string as before
>      the encoding.

Please don't speak about "character set" when you mean character

Regards,        Martin.
