Two students at my university are writing a Master's thesis in which they develop MHTML-compliant software. They have wondered about the following text in the MHTML standard (RFC 2557):
" The charset parameter value "US-ASCII" SHOULD be used if the URI contains no octets outside of the 7-bit range. If such octets are present, the correct charset parameter value (derived e.g. from information about the HTML document the URI was found in) SHOULD be used. If this cannot be safely established, the value "UNKNOWN-8BIT" [RFC 1428] MUST be used. "
My understanding is that this clause applies when you have received a document via some transport mechanism which does not tell you the charset.
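Read literally, the quoted clause amounts to a three-way choice. Here is a minimal sketch of it in Python, for illustration only; the function name uri_charset and the known_charset parameter are my own, hypothetical names, not anything defined in RFC 2557:

```python
from typing import Optional


def uri_charset(uri_octets: bytes, known_charset: Optional[str] = None) -> str:
    """Pick a charset parameter value for a URI, following the quoted clause.

    known_charset is a hypothetical input: whatever charset the caller could
    safely establish, e.g. from the HTML document the URI was found in.
    """
    # No octets outside the 7-bit range: "US-ASCII" SHOULD be used.
    if all(octet < 0x80 for octet in uri_octets):
        return "US-ASCII"
    # 8-bit octets present and the charset is known: use the correct value.
    if known_charset:
        return known_charset
    # 8-bit octets present, charset not safely established.
    return "UNKNOWN-8BIT"
```

For example, uri_charset(b"http://example.com/a") gives "US-ASCII", while a URI containing the octet 0xE5 with no known charset gives "UNKNOWN-8BIT".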
However, if you have received a document via ordinary HTTP download, and there is no charset indication in the HTTP header, then the default charset is "ISO-8859-1" and not "US-ASCII" or "UNKNOWN-8BIT". So the rule quoted above does not apply to documents downloaded via HTTP before being mailed.
Of course, you might check if all the characters are 7-bit, and then e-mail it as "US-ASCII" instead of "ISO-8859-1", but this should not be required.
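To make the HTTP case concrete, here is a rough sketch of the behaviour I have in mind, assuming my reading above is correct that a missing charset in the HTTP header means "ISO-8859-1". The function names and the downgrade flag are hypothetical; only the standard-library email.message.Message class is real:

```python
from email.message import Message
from typing import Optional


def http_charset(content_type: Optional[str]) -> str:
    """Charset declared in an HTTP Content-Type value, else the assumed default."""
    if content_type:
        header = Message()
        header["Content-Type"] = content_type
        declared = header.get_content_charset()  # lower-cased, or None if absent
        if declared:
            return declared.upper()
    # Assumption from the text above: no charset indication means ISO-8859-1.
    return "ISO-8859-1"


def mail_charset(body: bytes, content_type: Optional[str],
                 downgrade: bool = False) -> str:
    """Charset to label the document with when it is mailed.

    downgrade is the optional check: relabel as US-ASCII when the charset
    was only the default and every octet of the body is 7-bit.
    """
    charset = http_charset(content_type)
    if downgrade and charset == "ISO-8859-1" and all(b < 0x80 for b in body):
        return "US-ASCII"
    return charset
```

With downgrade left off, a document fetched without a charset indication is always mailed as "ISO-8859-1"; the 7-bit check only matters if the sender opts in.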
Is this right? If not, say so!
-- Jacob Palme (Stockholm University and KTH) for more info see URL: http://www.dsv.su.se/jpalme/