Replying to mail from Einar Stefferud and Jacob Palme:
- I am honored that Einar has proposed that I draft the part of the
mhtml spec about "charset" and related issues. However, I would
prefer to give Jacob another chance. There are still some
misunderstandings that I think he can easily clear up (see below).
Because of my current workload with other things, I would
prefer not to have to step in. If it really turns out to be unavoidable,
I'll do it, but let's try once more.
- I am in no way qualified to do anything about the line-ending part.
That would have to stay as it is or be taken over by somebody else.
>I have removed the figure from section 11, ordered so by the working group
>chairman, Einar Stefferud. I still personally believe this figure would
>make the text more understandable.
>The value of the charset parameter
>Some people claim that we need not say anything about the value of the
>charset parameter, since this is already specified in MIME. I do not agree
>with this. MIME is rather ambiguous. Here is a quote from MIME (RFC 1521):
The quote is indeed rather ambiguous, and in some places (e.g. its use of
the word "glyph", where a charset actually maps octets to characters) even
plain wrong. But this issue has since been worked on and clarified.
RFC 1521 has undergone a lot of changes, which appear as
draft-ietf-822ext-mime-????-??.txt until they make it into an RFC.
So the solution here is not to explain things in more detail, but to give
the right reference. I expect some people in this group are more up to
date than I am on the current state of this set of Internet-Drafts.
> The specification for any future subtypes of "text" must specify
> whether or not they will also utilize a "charset" parameter, and may
> possibly restrict its values as well. When used with a particular
> body, the semantics of the "charset" parameter should be identical to
> those specified here for "text/plain", i.e., the body consists
> entirely of characters in the given charset. In particular, definers
> of future text subtypes should pay close attention to the
> implications of multibyte character sets for their subtype definitions.
> This RFC specifies the definition of the charset parameter for the
> purposes of MIME to be a unique mapping of a byte stream to glyphs, a
> mapping which does not require external profiling information.
>To me, this text is not clear.
>(a) The text says 'The specification for any future subtypes of "text"
>must specify whether or not they will also utilize a "charset" parameter,
>and may possibly restrict its values as well'. This tends to indicate to
>me that the interpretation of the charset parameter is content-type
>dependent, and that thus we must specify how this is done for our content-type.
There was quite a lot of discussion about this in a group working on
Chinese some months ago, and the outcome may have changed in the
meantime. In any case, it makes sense to define, for each text subtype,
which "charset"s to allow. For example, HTML COULD in theory say that
the SGML means of including characters (character references) is
enough, and that the only "charset" accepted is iso-8859-1. For HTML,
this is only theory, but for things such as PDF, it may be reality.
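To make the "SGML means" concrete, here is a minimal sketch in Python. It assumes (my assumption, not something settled in this thread) that the receiving HTML processor accepts numeric character references for any ISO 10646 character, as the html-i18n work proposes; strict HTML 2.0 may not. The helper name to_latin1_html is hypothetical. It shows how text with characters outside ISO 8859-1 could still travel in a body labeled charset=iso-8859-1:

```python
# Minimal sketch, assuming the receiver accepts numeric character
# references for any ISO 10646 character (html-i18n style).
import html


def to_latin1_html(text: str) -> bytes:
    """Hypothetical helper: encode text as ISO 8859-1 octets, replacing every
    character outside ISO 8859-1 with an SGML numeric character reference."""
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


body = to_latin1_html("Zürich 北京")
print(body)  # b'Z\xfcrich &#21271;&#20140;'

# A receiver decodes the octets with the declared charset, then resolves
# the references to recover the original characters.
print(html.unescape(body.decode("iso-8859-1")))  # Zürich 北京
```

The "charset" label stays iso-8859-1 throughout; the extra characters are carried by the markup, not by the charset.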
Anyway, the type text/html is NOT something that is specified in the
mhtml spec. It is specified by RFC 1866, on which we can rely. Also,
the interpretation of "charset" is not supposed to be type-dependent;
only the set of allowable "charset" parameter values may depend on
the type.
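A minimal sketch of that split, using Python's standard email package to read the Content-Type header. ALLOWED_CHARSETS and text/x-latin1-only are hypothetical illustrations, not registered values or anything the mhtml spec defines. The per-type table only restricts which labels are accepted; the decoding step is the same for every type:

```python
# Minimal sketch: which "charset" values are allowed may depend on the media
# type, but what a given charset means does not.
from email.message import Message

# Hypothetical policy table; None means "any charset". Not taken from any RFC.
ALLOWED_CHARSETS = {
    "text/plain": None,
    "text/html": None,
    "text/x-latin1-only": {"iso-8859-1"},  # hypothetical restrictive subtype
}


def decode_body(content_type: str, body: bytes) -> str:
    msg = Message()
    msg["Content-Type"] = content_type
    ctype = msg.get_content_type()                  # lower-cased, e.g. "text/html"
    charset = msg.get_content_charset("us-ascii")  # MIME default for text/*
    allowed = ALLOWED_CHARSETS.get(ctype)           # unknown types: no restriction
    if allowed is not None and charset not in allowed:
        raise ValueError(f"{ctype} does not permit charset {charset!r}")
    # Type-independent step: the charset alone maps the octets to characters.
    return body.decode(charset)


print(decode_body('text/html; charset="ISO-8859-1"', b"K\xe4se"))  # Käse
```

With this split, a body labeled charset=iso-8859-1 decodes to the same characters whether it is text/plain or text/html.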
>(b) The MIME text says "the body consists entirely of characters in the
>given charset". It is not clear to me whether "character" in this text
>refers to "octet in the HTML markup" or "character as displayed to the user".
>Thus, we must make this clear, by choosing one of the two alternatives:
>Alternative 1: The charset is the charset of the HTML markup, rather than
>the charset of the displayed text. Thus, for example, the string "&auml;"
>has the charset US-ASCII and not the charset ISO 8859-1.
>Alternative 2: The charset is the charset of the displayed text. In that
>case, "&auml;" to my mind is neither US-ASCII nor ISO 8859-1 but rather a
>third charset, since ISO 8859-1 specifies that the glyph denoted by &auml;
>be denoted by a single octet, not by a series of octets.
Again, it is not the mhtml group that can choose here. First, although this
might not be specified very explicitly, it is clear to everybody working in
the field of "charset"s that Alternative 1 is the right one; Alternative 2
leads to problems, as you show. Second, specifying one or the other
alternative, even if there really were a choice, is not our task. If we
think that this is not specified precisely enough, we should have the
appropriate documents (http-1.1, html-i18n, ...) changed. It would be
absurd if "charset" meant e.g. Alternative 1 with HTTP, but Alternative 2
with email.
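Alternative 1 as a minimal Python sketch, using only the standard library (html.unescape). It assumes a browser-like order of processing in which the charset is applied to the raw octets before entity references such as &auml; are resolved; the sample markup is made up for illustration:

```python
# Minimal sketch of Alternative 1: "charset" describes the octets of the
# markup, and entity references are resolved only after decoding.
import html

markup = b"<p>K&auml;se</p>"  # made-up example; every octet is 7-bit ASCII

# Step 1: apply the charset to the raw octets. This succeeds with US-ASCII,
# even though the reader will eventually see a non-ASCII character.
text = markup.decode("us-ascii")

# Step 2: resolve SGML entity references in the decoded markup.
displayed = html.unescape(text)
print(displayed)  # <p>Käse</p>

# The same displayed text sent as ISO 8859-1 markup: here the character is
# the single octet 0xE4 instead of the six octets "&auml;".
latin1_markup = displayed.encode("iso-8859-1")
print(latin1_markup)  # b'<p>K\xe4se</p>'
```

Both bodies display the same text, but only the label that matches the octets (US-ASCII for the first, ISO 8859-1 for the second) is correct, which is exactly what Alternative 1 says.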
Hope this helps. Regards, Martin.