Subject: Re: Revised text for draft-ietf-mhtml-spec
From: Martin J Duerst <[log in to unmask]>
Reply-To: IETF working group on HTML in e-mail <[log in to unmask]>
Date: Thu, 25 Jul 1996 11:02:00 +0200


Jacob Palme wrote:

>JP: I have now revised the text according to proposals by Martin J Duerst,
>Jay Levitt, Larry Masinter and Einar Stefferud. Here is the full text of
>those sections of the document which have been changed:

Here are my comments. Hope this helps.

>New text:
>-------- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
>11. Encoding Considerations for HTML bodies
>11.1 Encoding layers

Larry has pointed out some problems with the diagram.
Here are some others:
        - There is no introductory text, and no explanation of what
                the diagram is for.
        - The diagram is purely informational, not specifying anything
                with respect to standardization. The whole section could
                be left out or moved to the informational document. This
                would save space and reduce this chapter to what is really
                necessary, as was suggested by various discussants.
        - The diagram is not in alignment with what is discussed in
                the next section. It may therefore be rather disturbing than
                helpful.
Conclusion: Leave out 11.1. [Alternative: Fix problems.]

>          Displayed text                        Displayed text
>               |                                     ^
>               V                                     |
>         +-------------+                       +----------------+
>         | HTML editor |                       | HTML viewer    |
>         |             |                       | or Web browser |
>         +-------------+                       +----------------+
>             |                                       ^
>             V                                       |
>         HTML markup                             HTML markup
>             |                                       ^
>             V                                       |
>  +---------+ +---------------+           +---------+ +---------------+
>  | MIME    | | MIME content- |           | MIME    | | MIME content- |
>  | encap-  | | transfer-     |           | encap-  | | transfer-     |
>  | sulator | | encoder       |           | sulator | | encoder       |
>  +---------+ +---------------+           +---------+ +---------------+
>    |              |                            ^              ^
>    V              V         +-----------+      |              |
>MIME heading + MIME content->| Transport |->MIME heading + MIME content
>                             +-----------+
>                               Figure 1
>Definitions (see Figure 1):
>Displayed text   A visual representation of the intended text.
>HTML markup      A sequence of octets formatted according to the
>                 HTML specification [HTML2].
>MIME content     A sequence of octets physically forwarded via e-mail,
>                 may use MIME content-transfer-encoding as specified
>                 in [MIME1].
>HTML editor      Software used to produce HTML markup.
>MIME content-    Software used to encode and decode non-US-ASCII
>transfer-encoder characters as specified in [MIME1].
>HTML viewer      Software used to display HTML documents to recipients.
>11.2 Character set issues
The title should preferably be something such as "character encoding issues".
The term "character set" correctly appears nowhere below.

This section needs an introductory sentence, such as:
"For the encoding of characters in an HTML document into a MIME-compatible
octet stream, the following three mechanisms are relevant:"

>- HTML [HTML2] as an application of SGML [SGML] allows characters to be
>  denoted by character entities as well as by numeric character references
>  (e.g. "latin small letter a with acute" may be represented by "&aacute;"
>  or "&#225;") in the HTML markup (see Figure 1).

What do we see in Figure 1? Where in Figure 1 should we look?
This will be understood even without Figure 1! The same applies
to the references to Figure 1 below.
>- HTML documents, in common with other documents of MIME content-type
>  text, can use various kinds of character encodings which are indicated
>  by the value of the "charset" parameter in the MIME content-type header
>  (MIME heading in Figure 1). For the exact meaning and use of the
>  "charset" parameter, please see [MIME1 section 7.1.1]. Note that the
>  "charset" parameter refers to the charset in the HTML markup (see Figure
>  1), not to the charset in the displayed text (see Figure 1). Thus, if
>  the HTML markup contains only US-ASCII characters, then the value of the
>  charset parameter should be US-ASCII, even if the HTML markup contains
>  entities which cause the displayed text to contain non-US-ASCII-
>  characters.
>- Any documents including HTML documents that contain octet values outside
>  the 7-bit range or that contain bare CRs or bare LFs need a content-
>  transfer-encoding applied before transmission over certain transport
>  protocols [MIME1, chapter 5] (MIME content in Figure 1).
>The above three mechanisms are well defined and documented, and therefore
>not further explained here. In sending a message, all the abovementioned
>mechanisms MAY be used, and any mixture of them MAY occur when sending the
>document via e-mail. Receiving mailers (together with the Web browser they
>may use to display the document) MUST be capable of handling any
>combinations of these mechanisms.
>Some transport mechanisms may specify a default "charset" parameter if
>none is supplied [HTTP, MIME1]. Because the default differs for different
>mechanisms, when HTML is transferred through mail, the charset parameter
>SHOULD be included, rather than relying on the default.
>Example of non-US-ASCII characters in HTML: See section 9.3 above.

This looks strange. What does section 9.3 speak about? Why is there
another place where the subject of HTML and characters has to pop up?
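
The three mechanisms quoted in 11.2 can be seen side by side in a small
sketch. It is only an illustration, not part of the draft: it uses Python's
standard html and email modules, and the sample markup, the iso-8859-1
charset and the quoted-printable encoding are choices made for this example.

```python
import html
from email.message import EmailMessage

# Mechanism 1: a character entity and a numeric character reference.
# The markup is pure US-ASCII; only the displayed text is not.
markup_ascii = "<p>&aacute; and &#225;</p>"
displayed = html.unescape(markup_ascii)   # roughly what a viewer would show
markup_ascii.encode("us-ascii")           # succeeds, so charset=us-ascii is correct

# Mechanisms 2 and 3: the same character written as a raw 8-bit octet needs
# a "charset" label and, for 7-bit transports, a content-transfer-encoding.
markup_latin1 = "<p>\u00e1</p>"
msg = EmailMessage()
msg.set_content(markup_latin1, subtype="html",
                charset="iso-8859-1", cte="quoted-printable")

print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])
print(msg.get_payload())
```

In the first case the charset parameter describes the markup, not the
displayed text, which is the point made in the second bullet. In the second
case the receiver has to undo the content-transfer-encoding and then apply
the charset before the HTML viewer sees the markup.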

>11.2 Line break characters
Should be 11.3 (unless, as I hope, 11.1 is canceled).

>The MIME standard [MIME1] specifies that line breaks in the MIME content
>(see Figure 1) MUST be CRLF. The HTTP standard [HTTP] specifies that line
>breaks in transported HTML markup (see Figure 1) may be either bare CRs,
>bare LFs or CRLFs. To allow data integrity checks through checksums, MIME
>content-transfer-encoding of line breaks SHOULD, if necessary, be used so
>that after decoding, the line break representation of the original HTML
>markup is returned.
>Note that the mail content-MD5 is defined on a canonical form with
>all line breaks converted to CRLF, while the HTTP content-MD5 is defined
>to apply to the transmitted form. This means that the Content-MD5 HTTP
>header may not be correct for Text/HTML that is retrieved from a HTTP
>server and then sent via mail.
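
The Content-MD5 remark above can be checked with a short sketch. It assumes
a made-up HTML body served with bare LFs, and the helper names content_md5
and to_crlf are hypothetical, chosen only for this illustration.

```python
import base64
import hashlib
import re


def content_md5(body: bytes) -> str:
    """Base64 of the MD5 digest, the form carried in a Content-MD5 header."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def to_crlf(body: bytes) -> bytes:
    """Convert CRLF, bare CR and bare LF line breaks to CRLF."""
    return re.sub(rb"\r\n|\r|\n", b"\r\n", body)


# Body as an HTTP server might transmit it: bare LF line breaks.
as_served = b"<html>\n<body>Hello</body>\n</html>\n"
# Canonical form over which the mail Content-MD5 is computed.
canonical = to_crlf(as_served)

print(content_md5(as_served) == content_md5(canonical))
```

The two digests differ, so a Content-MD5 value copied from the HTTP response
would not verify against the canonical form used in mail unless the original
line break representation is restored first.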
>--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
>JP: I am a little frightened by the fact that no one has commented on the
>decision at the editing meeting in Montreal to allow non-resolvable
>relative URIs in Content-Location headers. See the text below. Even if
>this is OK, I would like to hear that you think it is OK, so that I can be
>sure that you have just not read or considered this new text:
>New text:
>-------- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
>8.2 Use of the Content-Location header
>If there is a Content-Base header, then the recipient MUST employ relative
>to absolute resolution as defined in RFC 1808 [RELURL] of relative URIs in
>both the HTML markup and the Content-Location header before matching a
>hyperlink in the HTML markup to a Content-Location header. The same
>applies if the Content-Location contains an absolute URL, and the HTML
>markup contains a BASE element so that relative URL-s in the HTML markup
>can be resolved.
>If there is NO Content-Base header, and the Content-Location header
>contains a relative URL, then NO relative to absolute resolution SHOULD be
>performed (even if there is a BASE element in the HTML markup), and exact
>textual match of the relative URL-s in the Content-Location and the HTML
>markup is performed instead (after removal of LWSP introduced as described
>in section 4.4 above).
>The URI in the Content-Location header need not refer to an object which
>is actually available globally for retrieval using this URI (after
>resolution of relative URIs). However, URI-s in Content-Location headers
>(if absolute, or resolvable to absolute URIs) SHOULD still be globally
>Jacob Palme <[log in to unmask]> (Stockholm University and KTH)
>for more info see URL:
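
The matching rules in the quoted section 8.2 can be sketched roughly as
follows. This is one reading of the draft text, not a definitive
implementation: the function names and example URLs are hypothetical,
urljoin from the Python standard library stands in for RFC 1808 resolution,
LWSP removal is simplified to deleting all whitespace, and the cases the
quoted text does not cover fall back to exact textual match.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit


def strip_lwsp(uri: str) -> str:
    """Simplification: drop all whitespace introduced by header folding."""
    return "".join(uri.split())


def link_matches(href: str, content_location: str,
                 content_base: Optional[str] = None,
                 html_base: Optional[str] = None) -> bool:
    """Decide whether a hyperlink refers to the part with this Content-Location."""
    href = strip_lwsp(href)
    content_location = strip_lwsp(content_location)
    if content_base is not None:
        # Content-Base present: resolve both sides before comparing.
        return (urljoin(content_base, href)
                == urljoin(content_base, content_location))
    if urlsplit(content_location).scheme and html_base is not None:
        # Absolute Content-Location and a BASE element in the markup.
        return urljoin(html_base, href) == content_location
    # No Content-Base and a relative Content-Location: exact textual
    # match, even if the markup has a BASE element.
    return href == content_location


print(link_matches("./images/logo.gif", "images/logo.gif"))
print(link_matches("./images/logo.gif", "images/logo.gif",
                   content_base="http://www.example.com/docs/"))
```

The first call compares the strings as they stand, the second resolves both
against the Content-Base first, which is the difference between the two
quoted paragraphs.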

----  Martin J. Du"rst                            ' , . p y f g c R l / =
Institut fu"r Informatik                             a o e U i D h T n S -
der Universita"t Zu"rich                              ; q j k x b m w v z
Winterthurerstrasse  190                             (the Dvorak keyboard)
CH-8057   Zu"rich-Irchel   Tel: +41 1 257 43 16
 S w i t z e r l a n d     Fax: +41 1 363 00 35   Email: [log in to unmask]
 テュールスト・マーティン・ヤコブ（チューリッヒ大学情報科学科）
