Report of revision of draft-ietf-mhtml-spec-01 into
The revised text is available at URL
and also included as an attachment to this message.
A revised version of the informational accompanying document is
available at URL:
As usual, all the relevant documents can be found from the home page
for our work at URL:
Making the document less HTML dependent
I have implemented most of the changes proposed by Steve Zilles, whose
aim was that the standard should also be applicable to link-containing
formats other than HTML, such as PDF or VRML. In a few cases, however, I
have used somewhat weaker wording than Zilles proposed.
The abstract of draft-ietf-mhtml-spec-02 contains the following phrase,
which was not part of Zilles' proposal: "Only HTML objects with such links
were fully considered in developing this standard, but the standard may
still be applicable also to other link-containing object types than HTML".
And the introduction contains the following phrase, which was not part of
Zilles' proposal:
This version of this standard was based on full consideration only of the
needs for objects with links in the Text/HTML media type (as defined in
RFC 1866 [HTML2]), but the standard may still be applicable also to other
formats for sets of interlinked objects, linked by URIs. There is no
conformance requirement that implementations claiming conformance to this
standard are able to handle URI-s in other document formats than HTML.
Zilles wanted to remove the following paragraph:
The Text/HTML body MAY contain links to MIME body parts outside of the
Multipart/Related or in other messages, but such usage is discouraged.
Implementors are warned that many receiving mailers may not be able to
resolve such links.
Zilles' argument for removing this paragraph was:
> I do not believe this paragraph is well defined. First there may be
> multiple body parts of Content-Type: Text/HTML (because the referred to
> body parts may be of that Content-Type). Secondly, section 8.1 only seems
> to define how reference will work within a Multipart/Related body part so
> it would seem that any other kind of reference is undefined.
I do not fully agree with Zilles on this, but I have reworded the text of the
paragraph in question to read as follows:
This standard does not cover the case where a multipart/related contains
links to MIME body parts outside of the current multipart/related or in
other MIME messages, even if methods similar to those described in this
standard are used. Implementors who provide such links are warned that
mailers implementing this standard may not be able to resolve such links.
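As a rough illustration of the warning above, the following sketch checks
which links in a Text/HTML part can be resolved inside the same
multipart/related. It assumes, for illustration only, that links are
"cid:" URLs matched against Content-ID headers; the function name
unresolved_cid_links and the example.invalid identifiers are hypothetical
and not taken from the draft.

```python
import re
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Build a small multipart/related: one HTML part, one referenced body part.
related = MIMEMultipart("related")
related.attach(MIMEText(
    '<img src="cid:logo@example.invalid">'
    '<img src="cid:other@example.invalid">',
    "html",
))
image = MIMEApplication(b"GIF89a", "octet-stream")
image["Content-ID"] = "<logo@example.invalid>"  # hypothetical identifier
related.attach(image)


def unresolved_cid_links(related_msg):
    """Return cid: links in the HTML parts that match no Content-ID
    of a body part inside the same multipart/related."""
    known_ids = {
        part["Content-ID"].strip("<>")
        for part in related_msg.walk()
        if part["Content-ID"]
    }
    links = set()
    for part in related_msg.walk():
        if part.get_content_type() == "text/html":
            charset = part.get_content_charset() or "us-ascii"
            body = part.get_payload(decode=True).decode(charset)
            links.update(re.findall(r"cid:([^\"'\s>]+)", body))
    return links - known_ids


dangling = unresolved_cid_links(related)
print(dangling)
```

A receiving mailer that follows only the rules for links within the
multipart/related would have no defined way to resolve the link left in
"dangling".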
I have removed the figure from section 11, as ordered by the working group
chairman, Einar Stefferud. I still personally believe this figure would
make the text more understandable.
The value of the charset parameter
Some people claim that we need not say anything about the value of the
charset parameter, since this is already specified in MIME. I do not agree
with this. MIME is rather ambiguous. Here is a quote from MIME (RFC 1521):
The specification for any future subtypes of "text" must specify
whether or not they will also utilize a "charset" parameter, and may
possibly restrict its values as well. When used with a particular
body, the semantics of the "charset" parameter should be identical to
those specified here for "text/plain", i.e., the body consists
entirely of characters in the given charset. In particular, definers
of future text subtypes should pay close attention to the
implications of multibyte character sets for their subtype
definitions.
This RFC specifies the definition of the charset parameter for the
purposes of MIME to be a unique mapping of a byte stream to glyphs, a
mapping which does not require external profiling information.
To me, this text is not clear.
(a) The text says 'The specification for any future subtypes of "text"
must specify whether or not they will also utilize a "charset" parameter,
and may possibly restrict its values as well'. This suggests to me that
the interpretation of the charset parameter is content-type dependent,
and that we must therefore specify how this is done for our content-type.
(b) The MIME text says "the body consists entirely of characters in the
given charset". It is not clear to me whether "character" in this text
refers to "octet in the HTML markup" or "character as displayed to the
user".
Thus, we must make this clear, by choosing one of the two alternatives:
Alternative 1: The charset is the charset of the HTML markup, rather than
the charset of the displayed text. Thus, for example, the string "&auml;"
has the charset US-ASCII and not the charset ISO 8859-1.
Alternative 2: The charset is the charset of the displayed text. In that
case, "&auml;" to my mind is neither US-ASCII nor ISO 8859-1 but rather a
third charset, since ISO 8859-1 specifies that the glyph denoted by &auml;
be denoted by a single octet, not by a series of octets.
All of this would be much clearer with the figure which I have been forced
to remove from the text, since that figure clearly shows the difference
between "HTML markup" and "displayed text".
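The difference between the two alternatives can be sketched in a few
lines of Python. This is only an illustration, using the standard
library's html.unescape as a stand-in for what a web browser does when it
turns markup into displayed text; the variable names are hypothetical.

```python
import html

markup = "&auml;"                  # the HTML markup as sent in the body
displayed = html.unescape(markup)  # the displayed text: one character

# Alternative 1: the markup consists of six octets, all within US-ASCII.
markup_octets = markup.encode("us-ascii")

# Alternative 2: the displayed character is a single octet in ISO 8859-1
# and cannot be expressed in US-ASCII at all.
displayed_octets = displayed.encode("iso-8859-1")
try:
    displayed.encode("us-ascii")
    displayed_is_ascii = True
except UnicodeEncodeError:
    displayed_is_ascii = False

print(len(markup_octets), len(displayed_octets), displayed_is_ascii)
```

The same body is thus pure US-ASCII when seen as markup, while its
displayed text needs a charset such as ISO 8859-1.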
"Solving" the problem by not writing anything about it is a bad
solution, since some implementors may then implement Alternative 1
and others Alternative 2, which might cause interoperability problems.
The handling of line breaks
This is obviously a very controversial issue. It is always tempting in
such cases to leave out all text about the controversial issue. This is,
however, NOT a good way of resolving controversial issues in standards
development, since different implementors will then make different
assumptions and their systems may then not be able to interoperate.
We all agree that all line breaks in the content-transfer-encoded text
must be CRLF. The point of contention is whether the HTML text before
content-transfer-encoding may contain bare LFs or bare CRs. Arguments for
this are:
(a) it is very common in HTTP and it may be difficult to get implementors
to deviate from HTTP conventions on this
(b) keeping the original HTML text intact allows integrity checks with
Arguments against this are:
MIME allows line breaks other than CRLF, but only in binary data; for
textual data, MIME requires line breaks to be CRLF. And the text
of RFC 1521 seems to indicate that this is valid both before and after
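The trade-off behind the arguments above can be sketched as follows. This
is a minimal illustration, not text from the draft: the helper to_crlf is
hypothetical, and SHA-256 is used only as an assumed example of an
integrity check.

```python
import hashlib
import re


def to_crlf(data: bytes) -> bytes:
    """Rewrite bare CR and bare LF as CRLF; existing CRLF is left alone."""
    # CRLF is listed first in the pattern so it is matched as one unit.
    return re.sub(rb"\r\n|\r|\n", b"\r\n", data)


# HTML text as it might arrive over HTTP, with mixed line breaks.
original = b"<p>one\n<p>two\r<p>three\r\n"
canonical = to_crlf(original)

# A checksum over the original octets no longer matches once the
# line breaks have been rewritten.
digest_changed = (
    hashlib.sha256(original).digest() != hashlib.sha256(canonical).digest()
)
print(canonical, digest_changed)
```

Converting to CRLF satisfies the MIME rule for textual data, but any
integrity check computed over the original HTML text then fails, which is
argument (b) in a nutshell.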
The definitions section
The following new items have been added to chapter 2.2 Other terminology:
Displayed text      The text shown to the user reading a document with
                    a web browser. This may be different from the HTML
                    markup, see the definition of HTML markup below.

HTML markup         A file containing HTML encodings as specified in
                    [HTML] which may be different from the displayed
                    text which a person using a web browser sees. For
                    example, the HTML markup may contain "&lt;" where
                    the displayed text contains the character "<".

PDF                 Portable Document Format, see [PDF].

VRML                Virtual Reality Markup Language.
Jacob Palme (Stockholm University and KTH)
for more info see URL: http://www.dsv.su.se/~jpalme