At 12.55 +0200 97-08-20, Martin J. Dürst wrote:
> > First note that all this discussion is only on how to handle
> > illegal URLs. URLs which have the permitted URL syntax according
> > to RFC 1738 will never need any further encoding.
> Yes and no. URL syntax may change in the future.
Perhaps. But it would surprise me if the URL syntax changed in ways
that allow unusual encoded characters. The present syntax is very
carefully designed to allow only a common subset of ASCII, so as to ensure
that URLs can be rendered anywhere in the world and can easily be
copied by hand. If the URL syntax is extended, for example to allow
national characters, these will probably be encoded in the URL string,
so that no further encoding is needed.
It is not common practice in the IETF to make standards for what might
be needed, perhaps, some time in the future. That is the OSI method
of standards development, which has proven to be unsuccessful. The
IETF view is to standardise only what we need and understand just
> > Method (c) means that you allow illegal URLs, like URLs containing
> > the space character which is not allowed according to RFC 1738.
> > My feeling is that we should not recommend as the only method
> > to use, a method which means you send illegal URLs, when there
> > are two methods, (a) and (b), which mean you send correctly
> > formatted URLs.
> They are correctly formatted. But maybe their semantics got
> wrong. See below for why that could happen.
No, RFC 1738 clearly specifies that, for example, SPACE characters
are not allowed in URLs. A URL which contains a SPACE that is
not encoded as %20 is thus not in agreement with RFC 1738.
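
As a minimal sketch of the rule above (the helper names and the example
URL are hypothetical, and this handles only the SPACE case, not every
character RFC 1738 disallows), a mailer could detect and escape a raw
SPACE like this in Python:

```python
def has_raw_space(url):
    """Return True if the URL contains an unencoded SPACE character."""
    return " " in url


def encode_spaces(url):
    """Replace each raw SPACE with its %20 escape; everything else is untouched."""
    return url.replace(" ", "%20")


# Hypothetical example URL, used only for illustration.
url = "http://www.example.com/my file.html"
print(has_raw_space(url))
print(encode_spaces(url))
```

Only the illegal character is rewritten; characters that are already
legal, including existing %HH escapes, pass through unchanged.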
> Method (b) might be what the user perceives
> in a tightly integrated package of an HTML editor and a mail
> UA, but we shouldn't prescribe it because it only covers part
> of our usage scenarios and, for the protocol we are concerned
> with, is a user interface issue.
It was not my intention to prescribe. My intention was to list
three different methods, and allow implementors to choose any
or all of them.
> Now back to method (a). It produces syntactically legal
> URLs. But syntactic legality is only half of the job.
> The URL can get corrupted in that process, and that's why
> method (a) should go away. There are two cases that can
> 1) Reserved characters: The distinction between %2F and /
> is in many cases crucial.
"/" is not a forbidden character in a URL. It is allowed when
properly used, and the mailer thus need not encode it further.
I cannot see any reason for a mailer which encounters a "/"
in a URL to assume that the "/" is there for any purpose other
than the one defined in the URL syntax.
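
A small sketch of the distinction under discussion, assuming Python's
standard urllib.parse module and a hypothetical helper name: a literal
"/" separates path segments, while "%2F" is data inside one segment,
so a mailer that leaves both untouched preserves the meaning.

```python
from urllib.parse import unquote


def path_segments(path):
    """Split on literal "/" first, then decode %HH escapes in each segment."""
    return [unquote(segment) for segment in path.split("/")]


# "%2F" stays inside a single segment; "/" separates segments.
print(path_segments("a%2Fb/c"))
print(path_segments("a/b/c"))

# Decoding before splitting would lose the distinction.
print(unquote("a%2Fb/c").split("/"))
```

The first call yields two segments and the second yields three, which
is why neither character should be converted into the other in transit.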
> 2) 8-bit octets: The assumed practice of taking the 8-bit
> octets as they appear in the HTML text and converting
> them to %HH may work in some limited cases, but is
> actually severely broken. If you store a Cyrillic
> page in KOI-8 and later convert it to iso-8859-5,
> the above assumed practice will produce two different
> URLs for the same resource.
Your argument is an argument for changing the encoding
method specified in RFC 1738. It cannot be the task of mailers
to correct deficiencies in the encoding method of RFC 1738;
those should be corrected by modifying RFC 1738. Perhaps you
can submit a proposal on this to the IETF?
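
The charset dependence described in point 2 can be illustrated with a
short sketch. It assumes Python's standard urllib.parse module and the
"koi8-r" and "iso8859-5" codecs; the helper name and the sample word
are hypothetical and chosen only for illustration.

```python
from urllib.parse import quote, unquote


def escape_octets(text, charset):
    """Encode text in the given charset, then write each 8-bit octet as %HH."""
    return quote(text.encode(charset))


word = "привет"  # hypothetical Cyrillic path component

koi8 = escape_octets(word, "koi8-r")
latin_cyrillic = escape_octets(word, "iso8859-5")

print(koi8)
print(latin_cyrillic)

# The two escaped forms differ although the text is the same,
# and each decodes correctly only with the charset that produced it.
print(koi8 != latin_cyrillic)
print(unquote(koi8, encoding="koi8-r") == word)
```

The escaped string carries no indication of which charset produced it,
which is the ambiguity in the %HH method that is being discussed.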
Jacob Palme (Stockholm University and KTH)
for more info see URL: http://www.dsv.su.se/~jpalme