MHTML Archives

Re: More on wrongly(?) formatted urls


Jacob Palme <[log in to unmask]>


IETF working group on HTML in e-mail <[log in to unmask]>


Tue, 19 Aug 1997 19:44:30 +0200






I wrote:

> (a) Use the encoding scheme described in RFC 1738 [URL].
> If this method is used, the corresponding URL in the HTML
> text must also be changed with the same encoding. This has
> the disadvantage that a URL which could be used for direct
> network retrieval will not work any more, that the HTML
> text may no longer agree with the corresponding document
> on the net, and that electronic seals may not work any more.
> Warning: RFC 1738 encoding may change the meaning of a
> URL. For example: "one/two%2ethree" is not the same
> URL as "one%2etwo%2ethree".
> (b) If the URL is illegal, inform the user and ask the
> user to correct it (in both the HTML text and the URL of
> the object it refers to).
> (c) Use the encoding method for message headers described
> in RFC 2047. As long as there are no 8-bit octets, the
> charset value "US-ASCII" MUST be used. For URLs containing
> 8-bit octets, the original character encoding (charset)
> SHOULD be used if it is known without doubt. Otherwise,
> the charset value "UNKNOWN-8BIT" (RFC 1428, MIBenum 2079)
> MUST be used.
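
The charset rule in method (c) can be sketched in Python roughly as
follows. This is only an illustration, not text from the draft: the
function names choose_charset and q_encode are hypothetical, the set of
characters left unencoded is a conservative assumption, and the sketch
ignores the 75-character limit that RFC 2047 places on an encoded-word.

```python
import string
from typing import Optional

# Assumption: only these octets are left as-is inside the encoded-word;
# everything else is written as =XX. This is deliberately conservative.
_Q_SAFE = (string.ascii_letters + string.digits + "!*+-/").encode("ascii")


def choose_charset(raw: bytes, known_charset: Optional[str] = None) -> str:
    """Pick the charset label for an RFC 2047 encoded URL per method (c)."""
    if all(octet < 0x80 for octet in raw):
        return "US-ASCII"          # no 8-bit octets present
    if known_charset is not None:
        return known_charset       # original charset known without doubt
    return "UNKNOWN-8BIT"          # 8-bit octets, charset not known


def q_encode(raw: bytes, charset: str) -> str:
    """Wrap the octets in a single RFC 2047 "Q" encoded-word."""
    parts = []
    for octet in raw:
        if octet == 0x20:
            parts.append("_")      # "Q" encoding represents SPACE as "_"
        elif octet in _Q_SAFE:
            parts.append(chr(octet))
        else:
            parts.append("=%02X" % octet)
    return "=?%s?Q?%s?=" % (charset, "".join(parts))


url = b"my file.html"
header_value = q_encode(url, choose_charset(url))
```

With these assumptions, a URL containing only a SPACE as its illegal
character is labelled US-ASCII, while a URL with an 8-bit octet and no
reliable charset information is labelled UNKNOWN-8BIT.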

At 12.14 +0200 97-08-19, Martin J. Dürst replied:
> Why do we need three methods? I have not heard from anyone
> at the meeting that all are needed. I guess we can keep
> implementations simpler if we define only one method.

First note that all this discussion is only about how to handle
illegal URLs. URLs which have the permitted URL syntax according
to RFC 1738 will never need any further encoding.
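
As a rough illustration of what "permitted URL syntax" means here, the
following Python sketch checks a URL against one reading of the RFC 1738
character rules and percent-encodes whatever falls outside them, which is
the kind of automatic correction method (a) describes. The names is_legal
and percent_encode_illegal are hypothetical, the character set (including
keeping "#" for fragments) is an assumption, and so is the use of UTF-8
for non-ASCII characters.

```python
import re
import string

# Assumption: characters that may appear unencoded in a URL, based on one
# reading of RFC 1738 (unreserved plus reserved characters, plus "#").
_PLAIN = set(string.ascii_letters + string.digits + "$-_.+!*'(),;/?:@&=#")
_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def is_legal(url: str) -> bool:
    """True if the URL uses only permitted characters and %XX escapes."""
    without_escapes = _ESCAPE.sub("", url)
    return all(ch in _PLAIN for ch in without_escapes)


def percent_encode_illegal(url: str, charset: str = "utf-8") -> str:
    """Replace each character outside the permitted set with %XX escapes."""
    out = []
    for ch in url:
        if ch in _PLAIN or ch == "%":
            out.append(ch)         # keep legal characters and existing escapes
        else:
            out.extend("%%%02X" % octet for octet in ch.encode(charset))
    return "".join(out)


fixed = percent_encode_illegal("http://host/my file.html")
```

Here the unencoded SPACE makes the input illegal, and the corrected form
carries %20 in its place. The same change would also have to be made in
the HTML text.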

Method (c) means that you allow illegal URLs, such as URLs containing
the space character, which is not allowed according to RFC 1738.
My feeling is that we should not recommend, as the only method
to use, a method which means you send illegal URLs, when there
are two methods, (a) and (b), which mean you send correctly
formatted URLs.

If only one of methods (a) and (b) is to be recommended, I
would prefer method (b), since a user who has produced faulty
URLs may prefer to be warned about this. However, method
(a) has the advantage, in those cases where it does not
corrupt the URL, that it is automatic (no trouble for the
user) and produces legal URLs.

Methods (a) and (b) are actually on a different layer than
method (c). Methods (a) and (b) are ways of ensuring that
the URLs used are correct according to RFC 1738, while method
(c) is a way of coping with URLs which are not correct and
cannot be made correct for some reason.

Here is a new draft text, which more clearly shows that this
is a matter of several layers:

     URLs in Content-Location and Content-Base headers SHOULD
     have the permitted syntax for URLs according to RFC 1738.
     In particular, this means that many characters, for example
     SPACE, SHOULD be encoded using the % method specified in
     RFC 1738.

     In certain cases, a mailer may be provided with HTML text
     containing wrongly formatted URLs, for example containing
     unencoded SPACE characters, in hyperlinks to other body
     parts. In such cases, the mailer has several options.
     One option is to correct the URL in both the Content-Location
     and the HTML text. Such a correction can sometimes be done
     automatically, but may in some cases require communication
     with the user. In some cases this option is not suitable,
     because it requires rewriting of the original HTML text
     and it may cause the Content-Location value to be different
     from the location where an object is actually available
     on the Internet. In such cases, an alternative option is
     to keep the HTML text as it is, and encode the Content-
     Location header using the encoding methods from RFC 2047.
     RFC 2047 encoding must therefore be reversed by recipients
     before comparing URLs in Content-Location headers with
     URLs in HTML text bodies. RFC 1738 encoding, however, should
     not be reversed before this comparison.
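
The comparison rule at the end of the draft text can be sketched in
Python as follows. This is an illustration under assumptions, not part of
the draft: the names decode_rfc2047 and same_url are hypothetical, and
falling back to Latin-1 for an unrecognised charset label (such as
UNKNOWN-8BIT) is a choice made here so that each octet still maps to
exactly one character.

```python
from email.header import decode_header


def decode_rfc2047(value: str) -> str:
    """Reverse any RFC 2047 encoded-words in a header value."""
    parts = []
    for data, charset in decode_header(value):
        if isinstance(data, bytes):
            try:
                parts.append(data.decode(charset or "ascii"))
            except (LookupError, UnicodeDecodeError):
                # Assumption: unknown charset, keep one character per octet.
                parts.append(data.decode("latin-1"))
        else:
            parts.append(data)     # plain value without encoded-words
    return "".join(parts)


def same_url(content_location: str, html_url: str) -> bool:
    """Compare a Content-Location value with a URL taken from HTML text.

    RFC 2047 encoding is reversed first; %XX escapes are left untouched.
    """
    return decode_rfc2047(content_location) == html_url


match = same_url("=?US-ASCII?Q?my_file.html?=", "my file.html")
```

Because the %XX escapes are never decoded, "my%20file.html" in a
Content-Location header matches only "my%20file.html" in the HTML text,
not "my file.html".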

The text I have written above is certainly not easier to
understand, but it is more correct and more clearly pinpoints
the difference between the two encoding methods.

Jacob Palme <[log in to unmask]> (Stockholm University and KTH)
for more info see URL:
