MHTML Archives


Subject: Re: More on wrongly(?) formatted urls
From: Nick Shelness <[log in to unmask]>
Reply-To: IETF working group on HTML in e-mail <[log in to unmask]>
Date: Tue, 26 Aug 1997 11:25:47 +0100
Content-Type: text/plain


Jacob, You write (> ) in response to my previous note (>> ):-

> Below a discussion of only one sentence in your text:
>
>> A text/html root object may contain absolute or relative URLs that cannot
>> be employed directly in MIME Content-base or Content location headers. This
>> is because their direct employment would violate RFC 822 header syntax.
>
> RFC 822 header syntax? Do you mean some kind of general header syntax,
> which is valid for all RFC 822 headers? Mainly, in RFC 822 each header
> has its own syntax definition, so we can define them to be whatever
> we want.

I agree! I was trying to express the reason that a URL could not be
employed directly. The only specific reason that I can think of is that it
contains octets outwith the range %d32-126 (printable US-ASCII, including
space). Even multiple spaces are not a problem if the sending system does
not fold the header.
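
To make the %d32-126 test concrete, here is a minimal sketch in Python. The
function names are hypothetical, and the choice of UTF-8 for turning the URL
text into octets is an assumption; the discussion above does not fix a
character set.

```python
def octets_outside_range(value: str, encoding: str = "utf-8") -> list[int]:
    """Return the octets of `value` that fall outside %d32-126.

    The encoding used to obtain octets is an assumption (UTF-8 by default).
    """
    return [b for b in value.encode(encoding) if not 32 <= b <= 126]


def usable_directly(url: str) -> bool:
    """True if the URL contains no octets outside %d32-126."""
    return not octets_outside_range(url)
```

Under this check, a URL with embedded spaces still passes, which matches the
point that spaces only matter if the sending system folds the header.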

> The reason why certain characters are not allowed in URLs is not only
> the problem of transporting them in RFC 822 headers. RFC 1738 says
>
>    Characters can be unsafe for a number of reasons.  The space
>    character is unsafe because significant spaces may disappear and
>    insignificant spaces may be introduced when URLs are transcribed or
>    typeset or subjected to the treatment of word-processing programs.
>    The characters "<" and ">" are unsafe because they are used as the
>    delimiters around URLs in free text; the quote mark (""") is used to
>    delimit URLs in some systems.  The character "#" is unsafe and should
>    always be encoded because it is used in World Wide Web and in other
>    systems to delimit a URL from a fragment/anchor identifier that might
>    follow it.  The character "%" is unsafe because it is used for
>    encodings of other characters.  Other characters are unsafe because
>    gateways and other transport agents are known to sometimes modify
>    such characters. These characters are "{", "}", "|", "\", "^", "~",
>    "[", "]", and "`".
>
>    All unsafe characters must always be encoded within a URL.
>
> Have we a different definition of "unsafe" than RFC 1738, allowing
> characters in URLs which RFC 1738 does not allow?

RFC 1738 describes how to encode URLs. Clearly, if the only URLs we
encounter in text/html objects are RFC 1738 compliant, we have no problem,
as they will also be RFC 822 header syntax compliant. The problem arises
when we encounter URLs that are not RFC 1738 compliant and which, therefore,
may not be RFC 822 header syntax compliant. This, then, is the case, and the
only case, in which RFC 2047 encoding is required.
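
The rule above can be sketched in Python with the standard email.header
module. This is only an illustration under stated assumptions: the function
names are hypothetical, UTF-8 is an assumed charset, and whether a receiving
agent decodes RFC 2047 encoded-words in these headers is exactly what is
under discussion here.

```python
from email.header import Header, decode_header


def header_value_for(url: str, charset: str = "utf-8") -> str:
    """Return a header-safe value for `url` (hypothetical helper).

    URLs made only of octets in %d32-126 are passed through unchanged;
    anything else is wrapped as an RFC 2047 encoded-word. The charset
    is an assumption. Header.encode() may fold long values.
    """
    if all(32 <= ord(c) <= 126 for c in url):
        return url
    return Header(url, charset=charset).encode()


def decode_value(value: str) -> str:
    """Reverse header_value_for(), decoding any encoded-words."""
    parts = []
    for raw, charset in decode_header(value):
        if isinstance(raw, bytes):
            parts.append(raw.decode(charset or "ascii"))
        else:
            parts.append(raw)
    return "".join(parts)
```

A compliant URL such as "http://example.com/a.gif" comes back untouched,
while one containing, say, a raw "é" is emitted as an encoded-word that
decode_value() turns back into the original text.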

Nick


LISTSRV.NORDU.NET
