MHTML Archives


Subject: Re: More on wrongly(?) formatted urls
From: Jacob Palme <[log in to unmask]>
Reply-To: IETF working group on HTML in e-mail <[log in to unmask]>
Date: Sat, 23 Aug 1997 11:23:42 +0200


At 09.13 -0700 97-08-22, Larry Masinter wrote:
> 1738 is being updated. In the real world, people are using all kinds of
> characters as "reserved", and the truth is you're really taking a risk
> if you encode something on your own that wasn't encoded.
> In the interest of safety, I think you're better off *not* recommending
> using %xx encoding as a way of making illegal URLs safer. But this
> is still just implementation advice.
> > I assume this means that any other character, if occurring in the value
> > submitted to a mailer for a Content-Location, must be encoded either
> > using the RFC 1738 encoding method or the RFC 2047 encoding method.
> It is misleading to talk about "encoding a character using the
> RFC 1738 encoding method", because the RFC 1738 encoding method
> is not a character-by-character encoding. That is, you have to
> look at the whole URL and the scheme and the context of the
> character. RFC 2047 encoding, on the other hand, can be decided
> character-by-character, because it is at a different layer.
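
To make the point about RFC 1738 encoding concrete, here is a minimal
sketch in Python. The example URL is made up, and the choice of "safe"
characters per component is my own assumption for illustration, not
something taken from RFC 1738 or from the draft.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Hypothetical URL with spaces that are not allowed in a mail header.
url = "http://example.com/dir/my file.html?q=a b"

# Character-by-character encoding ignores each character's role in the
# URL, so the scheme and path delimiters get encoded as well.
naive = quote(url, safe="")

# Context-aware encoding: split the URL first, then encode each
# component with a safe set suited to that component (assumed here).
parts = urlsplit(url)
fixed = urlunsplit((
    parts.scheme,
    parts.netloc,
    quote(parts.path, safe="/"),
    quote(parts.query, safe="=&"),
    parts.fragment,
))

# Encoding something that was already encoded changes its meaning:
# the "%" of "%20" is itself turned into "%25".
twice = quote("my%20file.html")

print(naive)
print(fixed)
print(twice)
```

The first result is no longer usable as a URL, the second keeps its
structure, and the third shows how a correct URL can be made faulty by
encoding it a second time.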

Here is a new draft text, based on your suggestions; exclamation
marks in the margin mark changes from the previous draft text:

     Handling of URLs containing inappropriate characters

     Some URLs may contain characters that are inappropriate for an
     RFC 822 header, either because the URL itself has an incorrect
     syntax or the URL syntax has changed to allow characters not
     allowed in mail headers. To include such a URL in a mail
     header, an implementation can either (a) arrange so that the
     URL becomes correctly formatted or (b) encode the header using
     the encoding method described in RFC 2047.

     Method (a) MUST be applied to the URL both in Content-
     Location headers and in body text. It MUST NOT be reversed by
     receiving mailers before matching hyperlinks to body parts.

     Method (b) MUST NOT be applied to the URL in the HTML text and
     MUST be reversed by receiving clients before comparing
     hyperlinks in body text to URLs in Content-Location headers.

     Method (a) is not always easy. It may include cooperation with
     the user and the software which produced the faulty URL. The
     encoding method of RFC 1738 can make a correct URL faulty if
     not done the right way. Changing the URL of documents already
     available on the Internet or an Intranet may invalidate
     existing links to this document. Changing the HTML body may
!    invalidate message integrity checks. For these reasons, this
!    standard recommends method (b).

!    With method (b), the charset US-ASCII can be used, or, if the
     URL contains octets outside of the 7-bit range, "UNKNOWN-8BIT"
     [RFC 1428] or "UTF-8" may be appropriate. Note that for MHTML
     processing (matching of URLs in body text to URLs in Content-
     Location headers) the choice of character encoding need not be
     the "correct" choice; it need only be a choice which, after
     reversal of the encoding by the receiving mailer, returns the
     same octet string as before the encoding.
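
As an illustration of method (b), the following Python sketch builds an
RFC 2047 encoded-word for a Content-Location value and reverses it
before comparing with the URL from the body text. The function names
(encode_location, decode_location, same_resource) and the example URL
are hypothetical. The sketch always uses the "B" encoding and assumes
the URL is short enough to fit in a single encoded-word (at most 75
characters), so it does not handle splitting.

```python
import base64
from email.header import decode_header


def encode_location(url_octets: bytes, charset: str) -> str:
    """Wrap the raw URL octets in one RFC 2047 "B" encoded-word."""
    b64 = base64.b64encode(url_octets).decode("ascii")
    return f"=?{charset}?B?{b64}?="


def decode_location(header_value: str) -> bytes:
    """Reverse the RFC 2047 encoding and return the original octets."""
    chunks = []
    for text, _charset in decode_header(header_value):
        # decode_header returns str for a header without encoded-words
        # and bytes otherwise; normalise to bytes.
        chunks.append(text if isinstance(text, bytes) else text.encode("ascii"))
    return b"".join(chunks)


def same_resource(header_value: str, body_url: bytes) -> bool:
    """Compare octets after reversing the header encoding."""
    return decode_location(header_value) == body_url


# Hypothetical URL containing one octet outside the 7-bit range.
raw = b"http://example.com/caf\xe9.html"

# "UTF-8" is not the correct label for these octets, yet the octet
# string still survives the round trip, which is all matching needs.
for label in ("UNKNOWN-8BIT", "UTF-8"):
    header = encode_location(raw, label)
    print(label, header, same_resource(header, raw))
```

The charset label is carried along but never used to interpret the
octets, so the comparison works on the octet string alone.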

Jacob Palme <[log in to unmask]> (Stockholm University and KTH)
for more info see URL:
