Two students at my university are writing a Master's thesis
in which they are developing MHTML-compliant software. They are
wondering about the following text in the MHTML standard:
" The charset parameter value "US-ASCII" SHOULD be used if the URI
contains no octets outside of the 7-bit range. If such octets are
present, the correct charset parameter value (derived e.g. from
information about the HTML document the URI was found in) SHOULD be
used. If this cannot be safely established, the value "UNKNOWN-8BIT"
[RFC 1428] MUST be used. "
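One way to read the three cases in the quoted clause is as a small decision function. The sketch below is only an illustration of that reading, not text from the standard: the name `charset_for_uri` and the `document_charset` argument (standing in for "information about the HTML document the URI was found in") are assumptions made for the example.

```python
from typing import Optional


def charset_for_uri(uri_octets: bytes,
                    document_charset: Optional[str] = None) -> str:
    """Pick a charset parameter value for a URI, following the quoted clause.

    `document_charset` is a hypothetical input: whatever charset the caller
    has been able to establish safely for the enclosing HTML document.
    """
    # Case 1: no octets outside the 7-bit range.
    if all(octet < 0x80 for octet in uri_octets):
        return "US-ASCII"
    # Case 2: 8-bit octets are present and the charset is known.
    if document_charset:
        return document_charset
    # Case 3: 8-bit octets are present and the charset cannot be established.
    return "UNKNOWN-8BIT"
```

For example, a pure 7-bit URI yields "US-ASCII" whatever the document charset is, while a URI containing an 8-bit octet yields either the supplied charset or "UNKNOWN-8BIT".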
My understanding is that the above clause applies
if you have received a document via some transport
mechanism which does not tell you the charset.
However, if you have received a document via ordinary HTTP
download, and there is no charset indication in the HTTP
header, then the default charset is "ISO-8859-1" and not
"US-ASCII" or "UNKNOWN-8BIT". So the rule quoted above does
not apply to documents downloaded via HTTP before being e-mailed.
Of course, you might check if all the characters are 7-bit,
and then e-mail it as "US-ASCII" instead of "ISO-8859-1",
but this should not be required.
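The HTTP case described above could be sketched as follows. This is a hedged illustration of my reading, not a definitive implementation: the function name `charset_for_http_body` and the `downgrade_to_ascii` flag are invented for the example, and the ISO-8859-1 default is taken from the argument in this message rather than checked against any particular HTTP implementation. The Content-Type value is parsed with Python's standard `email.message.Message`.

```python
from email.message import Message
from typing import Optional

# Default assumed in this message when HTTP gives no charset indication.
HTTP_DEFAULT_CHARSET = "ISO-8859-1"


def charset_for_http_body(body: bytes,
                          content_type: Optional[str],
                          downgrade_to_ascii: bool = False) -> str:
    """Choose the charset label for a document fetched via HTTP.

    `content_type` is the raw Content-Type header value, or None if the
    response had none. `downgrade_to_ascii` models the optional 7-bit check.
    """
    declared = None
    if content_type:
        header = Message()
        header["Content-Type"] = content_type
        # Returns the charset parameter in lower case, or None if absent.
        declared = header.get_content_charset()
    charset = declared.upper() if declared else HTTP_DEFAULT_CHARSET
    # Optional step: relabel as US-ASCII when every octet is 7-bit.
    if downgrade_to_ascii and all(octet < 0x80 for octet in body):
        return "US-ASCII"
    return charset
```

With the flag left at its default, a document without a charset indication is labelled "ISO-8859-1" even if it happens to contain only 7-bit characters, which is the behaviour I am arguing should be allowed.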
Is this right? If not, say so!
Jacob Palme (Stockholm University and KTH)
for more info see URL: http://www.dsv.su.se/jpalme/