Subject: Re: Summary of decisions at the Montreal MHTML IETF meeting
From: Martin J Duerst <[log in to unmask]>
Reply-To: IETF working group on HTML in e-mail <[log in to unmask]>
Date: Wed, 3 Jul 1996 17:16:38 +0200
Content-Type: text/plain

Hello everybody - I have had a look at section 11.1, Character
set issues. Here are my proposals for improvement {comments in
braces}. I took into consideration the recent comments of Albert Lunde
(about the term "document character set", right on target) and, to some
extent, the comment of Harald Alvestrand (see below for my response).

>--- cut here ---
>11. Encoding Considerations for HTML bodies
>
>11.1 Character set issues
>
>A mail user agent that wishes to send a content-type of HTML can just do
>so, so long as the normal data encoding issues are taken care of as
>specified in RFC 1521 [MIME1].
{leave as is}

>However at a basic level there are some
>differences between HTML being transferred by HTTP and HTML being
>transferred through Internet email. When transferred through HTTP, HTML by
>default uses the document character set ISO-8859-1 [HTML2]. Within
>electronic mail, the default character set is US-ASCII [MIME1].

However, there are some differences as to the default character encoding,
specified by the MIME "charset" parameter, if this parameter is omitted.
When transferred through HTTP, the default is [HTML2]:
        content-type: text/html; charset=iso-8859-1
when transferred with electronic mail, the default is [MIME1]:
        content-type: text/html; charset=us-ascii

>The sending of HTML messages via MIME e-mail can be seen as two layers of
>encoding:

When sending HTML via MIME email, three layers of encoding are relevant:

>Displayed text            Displayed text
>(e.g. with a              (e.g. with a HTML viewer
>HTML editor)              or Web browser)
>     |                         |
>HTML markup               HTML markup
>     |                         |
>MIME encoding--transport--MIME encoding
>
>                Figure 1

Displayed text                Displayed text
(e.g. with a                  (e.g. with a HTML viewer
HTML editor)                  or Web browser)
     |                             |
HTML markup                   HTML markup
     |                             |
character encoding            character encoding (denoted by MIME "charset" parameter)
     |                             |
content-transfer-encoding     content-transfer-encoding (quoted-printable or base64)
     |---------------transport-----|

                Figure 1




>If the displayed text contains non-ascii characters, these characters
>might have to be rewritten if the transport (as is common in e-mail) is
>set to handle only 7-bit characters.

If the text in question contains non-ascii characters, encoding on at
least one level is necessary.

>This rewriting can be done either at the HTML layer (using "&" entity
>references or numeric character references as defined in [HTML2] section
>3.2.1) or at the MIME layer (using Content-Transfer-Encoding as defined in
>[MIME1] section 5).

Encoding can be done at one or more of the following layers, in the
following sequence:
- At the HTML layer, using the SGML techniques of character entities
        (of the form &mnemonic;) or numeric character
        references as defined in [HTML2] section 3.2.1.
- Using an appropriate character encoding and declaring it by using
        the MIME "charset" parameter with an appropriate value.
        (Some character encodings, for example us-ascii and iso-2022-jp,
        are by themselves 7-bit, while others are 8-bit.)
- Using an appropriate Content-Transfer-Encoding mechanism in case
        an 8-bit character encoding is chosen and the data has to be
        transferred over a 7-bit connection (very frequent in the
        case of email). The Content-Transfer-Encoding mechanisms
        "quoted-printable" and "base64" can be used to reduce
        8-bit data to 7-bit.
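
The three layers listed above can be sketched in Python as follows. This is
only an illustration using standard-library codecs; the sample strings and
variable names are assumptions made for the example, not part of the draft.

```python
import base64
import quopri

text = "D\u00fcrst"  # "Durst" with u-umlaut (U+00FC), a non-ascii character

# Layer 1 (HTML): replace the non-ascii character by a numeric character
# reference, so the markup itself is pure us-ascii.
ncr = text.encode("ascii", errors="xmlcharrefreplace")

# Layer 2 (character encoding): keep the character and pick an 8-bit
# encoding, to be declared as charset=iso-8859-1.
latin1 = text.encode("iso-8859-1")

# Layer 3 (content-transfer-encoding): reduce the 8-bit data to 7-bit.
qp = quopri.encodestring(latin1)
b64 = base64.b64encode(latin1)

# Some character encodings are 7-bit by themselves, e.g. iso-2022-jp,
# so no content-transfer-encoding is needed for them.
jp = "\u65e5\u672c\u8a9e".encode("iso-2022-jp")

if __name__ == "__main__":
    for name, data in [("ncr", ncr), ("latin1", latin1), ("qp", qp),
                       ("b64", b64), ("iso-2022-jp", jp)]:
        print(name, data, "7-bit" if max(data) < 128 else "8-bit")
```

Only the iso-8859-1 bytes are 8-bit here; every other form can travel over a
7-bit connection unchanged.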


>In sending a message containing non-ascii characters, both these rewriting
>methods for non-ascii characters MAY be used, and any mixture of them MAY
>occur when sending the document via e-mail. Receiving mailers MUST be
>capable of both decoding at the MIME layer and mapping at the HTML layer.
>MIME decoding MUST take place before mapping at the HTML layer.

In sending a message containing non-ascii characters, all these encoding
methods for non-ascii characters MAY be used, and any mixture of them MAY
occur when sending the document via e-mail. Receiving mailers MUST be
capable of handling any combination of these encodings.
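
On the receiving side, handling any combination amounts to undoing the
layers in reverse order. The following Python sketch assumes a hypothetical
helper decode_html_body and treats the body as a plain text fragment; a real
mailer would hand the decoded markup to an HTML parser instead of calling
html.unescape on the whole string.

```python
import base64
import html
import quopri


def decode_html_body(payload: bytes, charset: str, cte: str) -> str:
    """Undo content-transfer-encoding, then charset, then HTML references."""
    cte = cte.lower()
    if cte == "quoted-printable":
        raw = quopri.decodestring(payload)
    elif cte == "base64":
        raw = base64.b64decode(payload)
    else:
        # "7bit", "8bit" and "binary" leave the data unchanged.
        raw = payload
    markup = raw.decode(charset)
    # Simplification: resolves entity and numeric character references
    # in a text fragment without parsing the markup.
    return html.unescape(markup)


if __name__ == "__main__":
    print(decode_html_body(b"D=FCrst &amp; caf&eacute;",
                           "iso-8859-1", "quoted-printable"))
```

The example input mixes all three layers: an iso-8859-1 character sent as
quoted-printable next to characters written as entity references.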



>The charset attribute of the Content-Type attribute should be us-ascii if
>and only if the html markup contains only us-ascii characters (even if the
>displayed text contains non-ascii characters).

The value of the charset parameter of the Content-Type header field
should be us-ascii if and only if the HTML markup contains only us-ascii
characters (even if the displayed text contains non-ascii characters).
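
As a rough Python sketch of this rule (the function name charset_label and
the iso-8859-1 fallback are assumptions for illustration only):

```python
def charset_label(markup: str, fallback: str = "iso-8859-1") -> str:
    """Label the markup us-ascii if and only if it contains only us-ascii characters."""
    # The decision looks at the markup, not at the text it will display.
    return "us-ascii" if markup.isascii() else fallback


if __name__ == "__main__":
    # Displayed text is non-ascii, but the markup is pure us-ascii.
    print(charset_label("<p>caf&eacute;</p>"))
    # The markup itself contains a non-ascii character.
    print(charset_label("<p>caf\u00e9</p>"))
```

Both inputs display the same text, yet only the first may be labelled
us-ascii.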

-----------------------------------------------

I hope that we can continue working from the changes I have made above.


Harald Alvestrand proposed the following modification:

>[log in to unmask] said:
>> This rewriting can be done either at the HTML layer (using "&" entity
>> references or numeric character references as defined in [HTML2]
>> section 3.2.1) or at the MIME layer (using Content-Transfer-Encoding
>> as defined in [MIME1] section 5).
>
>Suggested alternate text for the paragraph:
>
> The entity generating the HTML MAY choose to send non-ascii characters
> as themselves, in which case the document will use a MIME
> content-transfer-encoding, or it MAY choose to represent non-ascii
> characters using entity references or numeric character references
> as defined in [HTML2], in which case the document can be sent in the
> default "7bit" content transfer encoding.
>
>This avoids giving the impression that the "MIME layer" and the
>"HTML layer" are the "same kind of thing", I think.

The main thing is that we have three layers. Sending "characters as
themselves" is very unclear.


Regards,        Martin.
