MHTML Archives


Subject: Re: Summary of decisions at the Montreal MHTML IETF meeting
From: Einar Stefferud <[log in to unmask]>
Reply-To:[log in to unmask]
Date:Thu, 11 Jul 1996 10:20:41 -0700
Content-Type:text/plain


Hello Martin, et al -- Below is the message you sent on 3 July
recommending some revised text.  I do not consider this a proper
replacement of Jacob's original text because it only tries to revise
what Jacob said, and what we need is a complete and very much
simplified replacement.  As it turns out, your text still suffers from
trying to discuss many unnecessary aspects of HTML and MIME character
encoding issues, just because Jacob discussed them.

What we need is to replace all this confusing text with a simple
statement about how HTML specifications determine what can or cannot
be placed in an HTML document, and that MIME specifications determine
how any HTML object must be encoded for transit via Internet EMail in
a Multipart/related MIME envelope.  Within MHTML, the Text/HTML MIME
Content-type is referenced for specification of how HTML must be
enclosed in MIME envelopes, and that is where any serious discussion
of character set encodings and transfer encodings should be placed.
Our MHTML Proposed Standard is the wrong place to lodge these
specifications.
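
As a rough illustration of that division of labour, the sketch below uses
Python's standard email package to wrap an HTML body in a Multipart/related
envelope. The HTML text, the Content-ID and the image bytes are hypothetical
placeholders, and the charset and transfer encoding are just one possible
choice; none of this is taken from the MHTML draft.

```python
from email.message import EmailMessage

# Hypothetical content: the body text, Content-ID and image bytes are
# placeholders chosen for illustration only.
html_body = '<p>Smörgåsbord <img src="cid:logo@example.invalid"></p>'

msg = EmailMessage()
# MIME decides how the HTML travels: charset and transfer encoding live here,
# not in the HTML itself.
msg.set_content(html_body, subtype="html",
                charset="iso-8859-1", cte="quoted-printable")
# Adding a related part converts the message to multipart/related and moves
# the HTML into the first body part.
msg.add_related(b"GIF89a", maintype="image", subtype="gif",
                cid="<logo@example.invalid>")

html_part = next(msg.iter_parts())
print(msg.get_content_type())
print(html_part.get_content_type(), html_part.get_content_charset(),
      html_part["Content-Transfer-Encoding"])
```

The HTML string is untouched by the wrapping; everything about charset and
transfer encoding is carried in the Text/HTML part's MIME headers.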

If there are any special issues with carriage of HTML in MIME, then
those issues should be addressed with specifications for how to handle
any special cases when placing HTML objects in MIME envelopes.  I do
not know of any such special cases, issues or problems.

As I understand things, HTML, as a markup language, regards all CR,
LF, CRLF, or LFCR strings as soft <newline> directives which HTML
will ignore when interpreting the markup language.  All HTML newline
indicators are explicit in the language, so HTML interpreters must be
able to cope with CR, LF, CRLF, or LFCR by ignoring them in processing
the HTML text stream.
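
A minimal sketch of that behaviour, assuming ordinary running text (it does
not model preformatted elements such as PRE, where line breaks are
significant). The function name is hypothetical; it only mimics how a
renderer might collapse any run of line-break characters into a single
space.

```python
import re

def collapse_whitespace(text: str) -> str:
    """Collapse runs of spaces, tabs, CR and LF into one space (hypothetical helper)."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip()

# The same two words separated by CR, LF, CRLF and LFCR.
variants = ["Hello\rworld", "Hello\nworld", "Hello\r\nworld", "Hello\n\rworld"]
rendered = {collapse_whitespace(v) for v in variants}
print(rendered)
```

All four line-ending conventions collapse to the same text, which is why the
line-ending canonicalisation done by MIME need not disturb the HTML.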

The same kind of thing must hold true for all the other HTML character
set issues.  HTML specifies a way to quote special characters into the
HTML text stream, and these are totally independent of any MIME
encoding, and must be understood by any legitimate HTML interpreter.

Therefore, all this discussion in Jacob's draft needs to be removed
and replaced by something that clearly describes what MHTML
implementations MUST, SHOULD, or MAY do.

I am looking for a specific replacement text, not suggestions about
how it might be changed.  We need proposals for new text.

Cheers...\Stef

From your message Wed, 3 Jul 1996 17:16:38 +0200 (MET DST):
}
}Hello everybody - I have had a look at the section 11.1, Character
}set issues. Here are my proposals for improvement {comments in
}brackets}. I took into consideration the recent comments of Albert Lunde
}(about the term "document character set", right on target) and to some
}extent the comment of Harald Alvestrand (see below for some comment).
}
}>--- cut here ---
}>11. Encoding Considerations for HTML bodies
}>
}>11.1 Character set issues
}>
}>A mail user agent that wishes to send a content-type of HTML can just do
}>so, so long as the normal data encoding issues are taken care of as
}>specified in RFC 1521 [MIME1].
}{leave as is}
}
}>However at a basic level there are some
}>differences between HTML being transferred by HTTP and HTML being
}>transferred through Internet email. When transferred through HTTP, HTML by
}>default uses the document character set ISO-8859-1 [HTML2]. Within
}>electronic mail, the default character set is US-ASCII [MIME1].
}
}However, there are some differences as to the default character encoding,
}specified by the MIME "charset" parameter, if this parameter is omitted.
}When transferred through HTTP, the default is [HTML2]:
}        content-type: text/html; charset=iso-8859-1
}when transferred with electronic mail, the default is [MIME1]:
}        content-type: text/html; charset=us-ascii
}
}>The sending of HTML messages via MIME e-mail can be seen as two layers of
}>encoding:
}
}When sending HTML via MIME email, three layers of encoding are relevant:
}
}>Displayed text            Displayed text
}>(e.g. with a              (e.g. with a HTML viewer
}>HTML editor)              or Web browser)
}>     |                         |
}>HTML markup               HTML markup
}>     |                         |
}>MIME encoding--transport--MIME encoding
}>
}>                Figure 1
}
}Displayed text                Displayed text
}(e.g. with a                  (e.g. with a HTML viewer
}HTML editor)                  or Web browser)
}     |                             |
}HTML markup                   HTML markup
}     |                             |
}character encoding            character encoding
}                              (denoted by MIME "charset" parameter)
}     |                             |
}content-transfer-encoding     content-transfer-encoding
}                              (quoted-printable or base64)
}     |---------------transport-----|
}
}                Figure 1
}
}
}
}
}>If the displayed text contains non-ascii characters, these characters
}>might have to be rewritten if the transport (as is common in e-mail) is
}>set to handle only 7-bit characters.
}
}If the text in question contains non-ascii characters, encoding on at
}least one level is necessary.
}
}>This rewriting can be done either at the HTML layer (using "&" entity
}>references or numeric character references as defined in [HTML2] section
}>3.2.1) or at the MIME layer (using Content-Transfer-Encoding as defined in
}>[MIME1] section 5).
}
}Encoding can be done at one or more of the following layers, in the
}following sequence:
}- At the HTML layer, using the SGML techniques of character entities
}        (of the form &mnemonic;) or numeric character
}        references as defined in [HTML2] section 3.2.1.
}- Using an appropriate character encoding and declaring it by using
}        the MIME "charset" parameter with an appropriate value.
}        (Some character encodings, for example us-ascii and iso-2022-jp,
}        are by themselves 7-bit, while others are 8-bit.)
}- Using an appropriate Content-Transfer-Encoding mechanism in the case
}        an 8-bit character encoding is chosen and the data has to be
}        transferred over a 7-bit connection (very frequent in the
}        case of email). The Content-Transfer-Encoding mechanisms
}        "quoted-printable" and "base64" can be used to reduce
}        8-bit data to 7-bit.
}
}
}>In sending a message containing non-ascii characters, both these rewriting
}>methods for non-ascii characters MAY be used, and any mixture of them MAY
}>occur when sending the document via e-mail. Receiving mailers MUST be
}>capable of both decoding at the MIME layer and mapping at the HTML layer.
}>MIME decoding MUST take place before mapping at the HTML layer.
}
}In sending a message containing non-ascii characters, all these encoding
}methods for non-ascii characters MAY be used, and any mixture of them MAY
}occur when sending the document via e-mail. Receiving mailers MUST be
}capable of handling any combination of these encodings.
}
}
}
}>The charset attribute of the Content-Type attribute should be us-ascii if
}>and only if the html markup contains only us-ascii characters (even if the
}>displayed text contains non-ascii characters).
}
}The charset attribute of the Content-Type attribute should be us-ascii if
}and only if the html markup contains only us-ascii characters (even if the
}displayed text contains non-ascii characters).
}
}The value of the charset parameter of the Content-Type header field
}should be us-ascii if and only if the HTML markup contains only us-ascii
}characters (even if the displayed text contains non-ascii characters).
}
}-----------------------------------------------
}
}I hope that we can work on from the changes I have made above.
}
}
}Harald Alvestrand proposed the following modification:
}
}>[log in to unmask] said:
}>> This rewriting can be done either at the HTML layer (using "&" entity
}>> references or numeric character references as defined in [HTML2]
}>> section 3.2.1) or at the MIME layer (using Content-Transfer-Encoding
}>> as defined in [MIME1] section 5).
}>
}>Suggested alternate text for the paragraph:
}>
}> The entity generating the HTML MAY choose to send non-ascii characters
}> as themselves, in which case the document will use a MIME
}> content-transfer-encoding, or it MAY choose to represent non-ascii
}> characters using entity references or numeric character references
}> as defined in [HTML2], in which case the document can be sent in the
}> default "7bit" content transfer encoding.
}>
}>This avoids giving the impression that the "MIME layer" and the
}>"HTML layer" are the "same kind of thing", I think.
}
}The main thing is that we have three layers. Sending "characters as
}themselves" is very unclear.
}
}
}Regards,        Martin.
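
The three layers in Martin's quoted proposal can be sketched in Python as
follows. This is only an illustration under stated assumptions: the sample
text is hypothetical, iso-8859-1 and quoted-printable are one possible
choice at each layer, and the standard-library html and quopri modules stand
in for whatever a real mail user agent would use.

```python
import html
import quopri

displayed = "Smörgåsbord"  # hypothetical displayed text with non-ASCII characters

# Layer 1, HTML: numeric character references keep the markup pure US-ASCII,
# so charset=us-ascii and a 7bit transfer encoding would suffice.
markup_ascii = "<p>" + "".join(
    c if ord(c) < 128 else f"&#{ord(c)};" for c in displayed
) + "</p>"

# Layer 2, character encoding: keep the characters as they are and declare
# an 8-bit charset (here iso-8859-1) in the MIME "charset" parameter.
markup_raw = f"<p>{displayed}</p>"
octets_8bit = markup_raw.encode("iso-8859-1")

# Layer 3, Content-Transfer-Encoding: reduce the 8-bit octets to 7-bit data.
octets_qp = quopri.encodestring(octets_8bit)

# Receiving side: undo the layers in reverse order.
decoded = quopri.decodestring(octets_qp).decode("iso-8859-1")
print(markup_ascii)
print(octets_qp)
print(decoded == html.unescape(markup_ascii))
```

Either route yields the same markup at the receiving end, which is the sense
in which a receiver has to cope with any combination of the encodings.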
