MHTML Archives

Subject: Re: New IETF draft for MHTML now ready!
From: Martin J Duerst <[log in to unmask]>
Reply-To: IETF working group on HTML in e-mail <[log in to unmask]>
Date: Thu, 18 Jul 1996 11:47:02 +0200
Content-Type: text/plain

Jacob Palme wrote:

>It was not possible for me to include all suggestions for changes
>from all readers, since some suggestions from different readers
>were contradictory. For example, at the Montreal meeting, I was
>strictly instructed *not* to use the word "encoding" for the
>HTML method of representing special characters with &-elements.
>But several of the suggested new texts after the meeting use
>exactly this word.

I wasn't at that meeting, but I see no problem in using that
word in that context, if it is done correctly.

>The two most controversial issues in the new draft is

>(b) Non-US-ASCII-character rewriting
>
>I include below in this e-mail message in full the new text
>on these issues.

Here is the text from Jacob, with some comments where I have
to clearly disagree, mostly because the layer of the MIME
"charset" parameter is ignored in several places.
After that follows a complete new proposal that is shorter.

>(b) Non-US-ASCII-character rewriting
>------------------------------------
>
>This item has been discussed in the list a lot, in spite of the
>fact that we all agree fully on the functionality. All the disagreement
>has been on which words are best used to express this functionality
>clearly and correctly.
>
>Here is the new text I have produced, it is a composite of portions
>of suggested new wording from Larry Masinter and Martin J. Duerst:
>
>! 11. Encoding Considerations for HTML bodies
>!
>! 11.1 Character set issues
>!
>! A mail user agent that is composing a message using HTML has a choice
>! in how to represent and subsequently encode characters for the
>! transmission of the mail message.
>!
>! However, there are some differences as to the default character
>! encoding, specified by the MIME "charset" parameter. If this parameter
>! is omitted: When transferred through HTTP, the default is [HTTP]:
>!           content-type: Text/HTML; charset=ISO-8859-1
>! When transferred via e-mail, the default is [MIME1]:
>!           content-type: Text/HTML; charset=US-ASCII
>!
>! To avoid confusion, the MIME Content-Type parameter for Text/HTML
>! SHOULD always include a charset value, and not rely on the MIME e-mail
>! default of US-ASCII if no charset value is specified.
>!
>! When sending HTML via MIME e-mail, three layers of encoding are
>! relevant as shown in Figure 1:

This is absolutely misleading. The step from characters to octets and
back, represented by the MIME "charset" parameter, is missing.
Please see my text from some weeks ago.

>! Displayed text                       Displayed text
>!       |                                    ^
>!       V                                    |
>! +-------------+                      +----------------+
>! | HTML editor |                      | HTML viewer    |
>! |             |                      | or Web browser |
>! +-------------+                      +----------------+
>!       |                                    ^
>!       V                                    |
>! HTML markup                          HTML markup
>!       |                                    ^
>!       V                                    |
>! +------------------+                 +-------------------+
>! | MIME content-    |                 | MIME content-     |
>! | transfer-encoder |                 | transfer-encoder  |
>! +------------------+                 +-------------------+
>!       |                                    ^
>!       V              +-----------+         |
>! transfer-encoding--->| Transport |-->transfer encoding
>!                      +-----------+
>!
>!                         Figure 1
>!
>!
>! Definitions (see Figure 1):
>!
>! Displayed text   A visual representation of the intended text.
>!
>! HTML markup      A sequence of characters formatted according to the
>!                  HTML specification [HTML2].
>!
>! MIME encoding    A sequence of octets physically forwarded via e-mail,
>!                  may include MIME content-transfer-encoding as
>!                  specified in [MIME1].
>!
>! HTML editor      Software used to produce HTML markup.
>!
>! MIME content-    Software used to encode  and decode non-US-ASCII
>! transfer-encoder characters according to the MIME standard.
>!
>! HTML viewer      Software used to display HTML documents to
>!                  recipients.
>!
>! Note: Real implementations need not split functions into different
>! modules as described above. The figure above is a logical model in
>! order to explain how rewriting and transport is done.
>!
>! If the displayed text contains non-US-ASCII characters, these
>! characters might have to be rewritten if the transport (as is common
>! in e-mail) is set to handle only 7-bit characters.

Absolutely wrong here. It's not a question of characters, it's a
question of octets. If you have charset="iso-2022-jp", you may
have tons of non-ASCII characters, but it's all 7 bits, and no
additional content-transfer-encoding is necessary.
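
To make the characters-versus-octets point concrete, here is a minimal
sketch in Python. The helper name is_7bit and the sample strings are
my own choices for illustration and come from no draft. It only checks
whether every octet produced by a given "charset" fits in 7 bits.

```python
def is_7bit(octets: bytes) -> bool:
    """Return True if every octet has its high bit clear."""
    return all(b < 0x80 for b in octets)

# Three Japanese characters, none of them in US-ASCII.
japanese = "\u65e5\u672c\u8a9e"
# ISO-2022-JP switches character sets with escape sequences and
# uses only 7-bit octets on the wire.
jis_octets = japanese.encode("iso2022_jp")

# A single "latin small letter a with acute" in ISO-8859-1 is one
# octet with the high bit set.
latin_octets = "\u00e1".encode("iso-8859-1")

print(is_7bit(jis_octets))    # non-ASCII characters, 7-bit octets
print(is_7bit(latin_octets))  # one character, one 8-bit octet
```

So whether a content-transfer-encoding is needed depends on the octets
that the "charset" step produces, not on which characters the text
contains.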

>! HTML markup allows some characters at the displayed text level to be
>! represented using either entity references or numeric character
>! references (as defined in [HTML2] section 3.2.1).  For example, a
>! "small a, acute accent" may be represented by the entity reference
>! "&aacute;" or the numeric character reference "&#255;". Alternatively,
                255 is wrong; it should be 225. Also, it is better to use the
                official ISO name, which is "latin small letter a with acute".
>! the same character might appear directly in the HTML document, but for
>! transmission through MIME 7-bit-systems, the entire HTML document is
>! encoded using a Content-Transfer-Encoding (as defined in [MIME1]
>! section 5).

Again, the "charset" layer is missing!
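
The correction to the numeric character reference noted above can be
checked mechanically. This is a small sketch using Python's standard
html and unicodedata modules; the variable names are only illustrative.

```python
import html
import unicodedata

# Resolve the entity reference and both numeric character references.
from_entity = html.unescape("&aacute;")
from_225 = html.unescape("&#225;")
from_255 = html.unescape("&#255;")

# Look up the official character names.
name_225 = unicodedata.name(from_225)
name_255 = unicodedata.name(from_255)

print(from_entity == from_225)  # both denote the same character
print(name_225)
print(name_255)
```

Reference 225 resolves to the same character as "&aacute;", while 255
resolves to a different letter (y with diaeresis).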

>! In sending a message containing non US-ASCII characters, both these
>! rewriting methods MAY be used, and any mixture of them MAY occur when
>! sending the document via e-mail. Receiving mailers (together with the
>! Web browser they may use to display the document) MUST be capable of
>! handling any combinations of these rewriting methods.
>!
>! The value of the charset attribute of the Content-Type header field
>! should be US-ASCII if and only if the HTML markup contains only US-
>! ASCII characters (even if the displayed text contains non-US-ASCII
>! characters).
>!
>! Example of non-US-ASCII characters in HTML: See section 9.3 above.

>! 11.2 Line break characters

[none of my business, sorry]

==================
From here on follows my new proposal:
(please put in the appropriate references)

11. Encoding Considerations for HTML bodies

11.1 Character set issues

HTML [???] as an application of SGML [???] allows characters to be denoted
by character entities as well as by numeric character references
(e.g. "latin small letter a with acute" may be represented by
"&aacute;" or "&#225;").
HTML documents, in common with other documents of MIME content-type
text, can use various kinds of character encodings which are indicated
by the value of the "charset" parameter in the MIME content-type
header field. For the exact meaning and use of the "charset" parameter,
please see [RFC1521???].
Any documents, including HTML documents, that contain octet values
outside the 7-bit range need a content-transfer-encoding applied
before transmission over certain transport protocols [RFC1521???, p.13].

The above mechanisms are well defined and documented, and therefore
not further explained here. In sending a message, all the above-mentioned
mechanisms MAY be used, and any mixture of them MAY occur when
sending the document via e-mail. Receiving mailers (together with the
Web browser they may use to display the document) MUST be capable of
handling any combination of these mechanisms.

The present document does not make any further specifications except
for the default MIME "charset" parameter:
If this parameter is omitted, when transferred through HTTP, the default
is [HTTP]:
           content-type: Text/HTML; charset=ISO-8859-1
but when transferred via e-mail, the default is [MIME1]:
           content-type: Text/HTML; charset=US-ASCII
To avoid confusion, the MIME Content-Type parameter for Text/HTML
SHOULD always include an appropriate charset value, and not rely on
defaults.
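
As an illustration of why relying on the default is risky, the sketch
below decodes the same octets under the two defaults named above. The
function decode_with_default is a hypothetical helper of mine, not part
of any specification or library.

```python
def decode_with_default(octets: bytes, default_charset: str) -> str:
    """Decode a body whose Content-Type carried no charset parameter."""
    return octets.decode(default_charset)

# HTML body containing one octet outside the 7-bit range (0xE1).
body = b"<p>\xe1</p>"

# Under the HTTP default the octet is a valid ISO-8859-1 character.
http_view = decode_with_default(body, "iso-8859-1")

# Under the e-mail default the same octet is not valid US-ASCII.
try:
    mail_view = decode_with_default(body, "us-ascii")
except UnicodeDecodeError:
    mail_view = None

print(http_view)
print(mail_view)
```

The identical body is readable under one default and undecodable under
the other, which is the confusion an explicit charset value avoids.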


Comments:
- The three layers (HTML &...;, "charset", content-transfer-encoding)
        are clearly mentioned.
- "characters" and "octets" are clearly separated.
- The text refers to other documents where these things are described,
        and explicitly says that it does not add anything.
- The whole thing is less than half as long as the current text.
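
For readers who prefer code to prose, the three layers can be sketched
roughly as follows in Python. The function names encode_for_mail and
decode_from_mail are assumptions of mine, and quoted-printable is
picked only as one possible content-transfer-encoding; this is a
logical model and not a mailer implementation.

```python
import html
import quopri

def encode_for_mail(markup: str, charset: str) -> bytes:
    """Sender side: characters -> octets -> 7-bit transfer form."""
    octets = markup.encode(charset)       # "charset" layer
    return quopri.encodestring(octets)    # content-transfer-encoding layer

def decode_from_mail(wire: bytes, charset: str) -> str:
    """Receiver side: undo the layers in reverse order."""
    octets = quopri.decodestring(wire)    # content-transfer-encoding layer
    markup = octets.decode(charset)       # "charset" layer
    return html.unescape(markup)          # HTML &...; layer

# Markup mixing an entity reference with a directly written character.
markup = "<p>&aacute; and \u00e1</p>"
wire = encode_for_mail(markup, "iso-8859-1")
text = decode_from_mail(wire, "iso-8859-1")

print(wire)
print(text)
```

A receiver that handles each layer independently copes with any
mixture of the mechanisms, which is all the proposed text requires.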

Hope this helps,        Martin.
