MHTML Archives


Subject: Re: More on wrongly(?) formatted urls
From: Jacob Palme <[log in to unmask]>
Reply-To: IETF working group on HTML in e-mail <[log in to unmask]>
Date: Tue, 19 Aug 1997 08:46:54 +0200
Content-Type: text/plain


At 20.53 +0200 97-08-18, Martin J. Dürst wrote:
> I therefore propose the following text for the new version of
> the MHTML spec (adapted from Ed above):
>
>    URLs that contain characters or octets inappropriate for an
>    822 header, such as SPACE, CTLs, double quotes, backslashes,
>    and so on, and 8-bit octets MUST be encoded using the method
>    for message headers described in RFC 2047. As long as there
>    are no 8-bit octets, the charset value "US-ASCII" MUST be used.
>    For URLs containing 8-bit octets, the original character encoding
>    (charset) SHOULD be used if it is known without doubt. Otherwise,
>    the charset value "UNKNOWN-8BIT" (RFC 1428, MIBenum 2079) MUST
>    be used.
>
> NOTE: For MHTML processing (URL matching), the charset value is
>         irrelevant, but it may be relevant for other operations
>         on the URL.

Perhaps the text should be as follows:

     Handling of URLs containing inappropriate characters

     If a URL that is to be put into a Content-Location or
     Content-Base header contains characters that are inappropriate
     for an 822 header, such as SPACE, CTLs, double quotes,
     backslashes, and so on, one of the following methods is used.
     (Note: Such URLs may be illegal according to the rules for
     URLs, but a mailer may still be given such URLs by its user
     in text to be sent.)

     (a) Use the encoding scheme described in RFC 1738 [URL].
     If this method is used, the corresponding URL in the HTML
     text must also be changed with the same encoding. This has
     the disadvantages that a URL which could be used for direct
     network retrieval will no longer work, that the HTML text
     may no longer agree with the corresponding document on the
     net, and that electronic seals may no longer work.
     Warning: RFC 1738 encoding may change the meaning of a
     URL. For example: "one/two%2ethree" is not the same
     URL as "one%2etwo%2ethree".

     (b) If the URL is illegal, inform the user and ask the
     user to correct it (in both the HTML text and the URL of
     the object it refers to).

     (c) Use the encoding method for message headers described
     in RFC 2047. As long as there are no 8-bit octets, the
     charset value "US-ASCII" MUST be used. For URLs containing
     8-bit octets, the original character encoding (charset)
     SHOULD be used if it is known without doubt. Otherwise,
     the charset value "UNKNOWN-8BIT" (RFC 1428, MIBenum 2079)
     MUST be used.

     NOTE: For MHTML processing (URL matching), the charset value is
     irrelevant, but it may be relevant for other operations
     on the URL.
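
The charset rule in method (c) can be illustrated with a minimal Python sketch. It is not part of the proposed text. The function names, the exact set of characters treated as inappropriate (the text only says "and so on"), and the choice of the "B" (base64) form of RFC 2047 are assumptions made for illustration. The sketch emits a single encoded-word and does not split long URLs to respect the 75-character limit of RFC 2047.

```python
import base64


def has_8bit(url: bytes) -> bool:
    """True if the URL contains any octet with the high bit set."""
    return any(b >= 0x80 for b in url)


def needs_encoding(url: bytes) -> bool:
    """True if the URL contains SPACE, CTLs, double quotes, backslashes,
    or 8-bit octets. The exact set is an assumption for this sketch."""
    return any(
        b <= 0x20 or b == 0x7F or b >= 0x80 or b in b'"\\'
        for b in url
    )


def pick_charset(url: bytes, known_charset=None) -> str:
    """Apply the charset rule of method (c)."""
    if not has_8bit(url):
        return "US-ASCII"          # no 8-bit octets: MUST be US-ASCII
    if known_charset is not None:
        return known_charset       # known without doubt: SHOULD be used
    return "UNKNOWN-8BIT"          # otherwise: MUST be UNKNOWN-8BIT


def encode_url_for_header(url: bytes, known_charset=None) -> str:
    """Return the URL as header text, wrapped in one RFC 2047
    encoded-word only when it contains inappropriate characters."""
    if not needs_encoding(url):
        return url.decode("ascii")
    charset = pick_charset(url, known_charset)
    payload = base64.b64encode(url).decode("ascii")
    return f"=?{charset}?B?{payload}?="
```

With these hypothetical helpers, a URL such as b"http://example.com/my file.gif" would be wrapped with the charset "US-ASCII" because of the SPACE, while a URL with 8-bit octets of unknown origin would get "UNKNOWN-8BIT".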

     When receiving a message, encoding of type (c) is
     reversed before comparing URLs in headers with URLs
     in hyperlinks in the HTML text, but encoding of type (a)
     is not reversed.
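
The receiving rule above can be sketched in Python as follows. This is an illustration only, not part of the proposed text. The helper names `header_url_octets` and `same_url` are hypothetical, and the comparison is a plain octet-for-octet match, which is an assumption; the sketch uses the standard library's `email.header.decode_header` to reverse RFC 2047 encoding and deliberately leaves RFC 1738 "%xx" escapes untouched.

```python
import base64
from email.header import decode_header


def header_url_octets(value: str) -> bytes:
    """Reverse type (c) (RFC 2047) encoding and return the raw URL octets.
    The charset label is ignored, since it is irrelevant for URL matching."""
    parts = []
    for chunk, _charset in decode_header(value):
        # decode_header returns str when nothing was encoded, bytes otherwise.
        if isinstance(chunk, bytes):
            parts.append(chunk)
        else:
            parts.append(chunk.encode("ascii"))
    return b"".join(parts)


def same_url(header_value: str, html_url: bytes) -> bool:
    """Compare a header URL with a URL from an HTML hyperlink.
    Type (a) (RFC 1738 %xx) escapes are NOT reversed on either side."""
    return header_url_octets(header_value) == html_url


# Example header value carrying a URL with a SPACE, encoded per method (c).
example_header = (
    "=?US-ASCII?B?"
    + base64.b64encode(b"http://example.com/my file.gif").decode("ascii")
    + "?="
)
```

Here `same_url(example_header, b"http://example.com/my file.gif")` matches because the encoded-word is decoded first, while `same_url("one/two%2ethree", b"one/two.three")` does not match because the "%2e" escape is left as it is.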

------------------------------------------------------------------------
Jacob Palme <[log in to unmask]> (Stockholm University and KTH)
for more info see URL: http://www.dsv.su.se/~jpalme
