Multipurpose Internet Mail Extensions
Multipurpose Internet Mail Extensions (MIME) are a set of conventions or specifications for exchanging all types of files (text, audio, video, etc.) over the Internet in a way that is transparent to the user. A significant part of MIME is dedicated to improving the transferability of text in different languages and alphabets. In general terms, the MIME extensions are intended to support:
- Text in character sets other than US-ASCII;
- Non-text attachments;
- Message bodies with multiple parts (multipart);
- Header information in character sets other than ASCII.
Virtually all human-written email messages on the Internet, and a significant proportion of automatically generated messages, are transmitted in MIME format over SMTP. Internet email messages are so closely associated with SMTP and MIME that they are often called SMTP/MIME messages.
In 1991 the IETF began to develop this standard and since 1994 all MIME extensions are specified in detail in various official documents available on the Internet.
MIME is specified in six Requests for Comments (RFCs): RFC 2045, RFC 2046, RFC 2047, RFC 4288, RFC 4289, and RFC 2049.
The content types defined by the MIME standard are also important outside of email, for example in network protocols such as HTTP on the Web. HTTP requires that data be transmitted in the context of email-like messages, even though the data may not be an actual email message.
Today, no email program or Internet browser can be considered complete if it does not accept MIME in its different facets (text and file formats).
Introduction
The Internet's basic electronic message transmission protocol supports only 7-bit ASCII characters (see also 8BITMIME). This limits email messages, since 7-bit ASCII includes only enough characters to write in a small number of languages, mainly English. Other languages based on the Latin alphabet use diacritics that 7-bit ASCII cannot represent. MIME is additionally a fundamental component of communication protocols such as HTTP, which require that data be transmitted in the form of an email-like message even though the data may not be an actual email message. Mail clients and mail servers automatically convert to and from MIME format when they send or receive (SMTP/MIME) email.
Type Naming
MIME assigns a name to each type of data. Names have the following format:
type/subtype (type and subtype are character strings)
The type defines the general category of the data and the subtype defines a more specific type of that data. The type can contain the following values:
- text: Indicates that the content is textual. Examples of subtypes: html, xml
- multipart: Indicates that the content has multiple independent data parts. Examples of subtypes: form-data, digest
- message: Encapsulates an existing message, for example when replying to an email message and including the original message. Examples of subtypes: partial, rfc822
- image: Indicates that the content is an image. Examples of subtypes: png, gif
- audio: Indicates that the content is audio. Examples of subtypes: mp3, 32kadpcm
- video: Indicates that the content is video. Examples of subtypes: mpeg, avi
- application: Indicates that the content is application data, which may be binary. Examples of subtypes: json, pdf
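The type/subtype naming scheme can be illustrated with a minimal Python sketch. The function name split_media_type is hypothetical, and the sketch assumes the name carries no parameters (such as a charset); it relies only on the fact that type and subtype names are case-insensitive.

```python
def split_media_type(name):
    """Split a 'type/subtype' name into a lowercase (type, subtype) pair.

    Hypothetical helper for illustration; it does not handle parameters
    such as '; charset=utf-8'.
    """
    main_type, sep, subtype = name.strip().partition("/")
    if not sep or not main_type or not subtype:
        raise ValueError(f"not a type/subtype name: {name!r}")
    # Type and subtype names are matched case-insensitively.
    return main_type.lower(), subtype.lower()


if __name__ == "__main__":
    print(split_media_type("text/html"))
    print(split_media_type("Application/JSON"))
```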
MIME headers
MIME-Version
The presence of this header indicates that the message uses the MIME format. Its value is typically equal to "1.0" so this header appears as:
MIME-Version: 1.0
It should be noted that implementers have attempted to change the version number in the past and the change has had unforeseen results. At an IETF meeting held in July 2007 it was decided to keep the version number at "1.0" although many updates have been made to the MIME version.
Content-Type
This header indicates the media type of the message content. It consists of a type and a subtype, for example:
Content-Type: text/plain
Through the multipart type, MIME makes it possible to create messages whose parts and subparts are organized in a tree structure, in which the leaf nodes can be of any non-multipart content type and the non-leaf nodes can be of any of the multipart types. This mechanism supports:
- plain text messages using text/plain (this is the implicit value for the "Content-type:" header)
- text plus attached files (multipart/mixed with a text/plain part and other non-text parts, for example application/pdf for PDF documents or application/vnd.oasis.opendocument.text for OpenDocument text). A MIME message that includes an attachment usually indicates the original file name with a "Content-disposition:" header or with the name parameter of Content-Type, so the file format is indicated both by the MIME content type and by the file extension (whose meaning usually depends on the OS).
Content-Type: application/vnd.oasis.opendocument.text; name="Carta.odt"
Content-Disposition: inline; filename="Carta.odt"
- forwarding with the original message attached (multipart/mixed with a text/plain part and the original message as a message/rfc822 part)
- alternative content, that is, a message containing the same text both as plain text and in another format, usually HTML (multipart/alternative with the same content as text/plain and text/html)
- many other message constructions
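As an illustration of the "text plus attached files" case, the following sketch uses the EmailMessage class from Python's standard email package. The function name build_message_with_attachment and the sample bytes are assumptions made for the example; only the file name and media type are taken from the headers shown above.

```python
from email.message import EmailMessage


def build_message_with_attachment(body_text, data, filename):
    """Return a multipart/mixed message: a text/plain body plus one attachment.

    Hypothetical helper; the OpenDocument media type is hard-coded for brevity.
    """
    msg = EmailMessage()
    msg.set_content(body_text)  # becomes the text/plain part
    # Adding an attachment converts the message to multipart/mixed.
    msg.add_attachment(
        data,
        maintype="application",
        subtype="vnd.oasis.opendocument.text",
        filename=filename,
    )
    return msg


if __name__ == "__main__":
    message = build_message_with_attachment("See attached.", b"dummy bytes", "Carta.odt")
    for part in message.iter_parts():
        print(part.get_content_type(), part.get_filename())
```

The library chooses the boundary string and the transfer encoding of each part when the message is serialized.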
Content-Transfer-Encoding
MIME (originally RFC 1341 of June 1992, since obsoleted by RFC 2045) defines a set of methods for representing binary data using ASCII text. The Content-Transfer-Encoding MIME header indicates which method has been used. The RFC and the IANA list define the following values, which are not case sensitive:
- Suitable for use with SMTP:
  - 7bit: up to 998 octets per line, with octet values in the range 1..127; CR and LF (codes 13 and 10 respectively) may appear only as part of a CRLF line ending. This is the implicit value for this header.
  - quoted-printable: used to encode arbitrary octet sequences in a form that satisfies the rules of 7bit. It was designed to be efficient and mostly human-readable when used with text data that consists primarily of US-ASCII characters but also contains some bytes with values outside that range.
  - base64: used to encode arbitrary octet sequences in a form that satisfies the rules of 7bit. It has a fixed overhead and is intended for non-text data or for text that contains few values within the ASCII range.
- Suitable for use with SMTP servers that support the 8BITMIME SMTP extension:
  - 8bit: up to 998 octets per line, with octet values in the range 1..255; CR and LF (codes 13 and 10 respectively) may appear only as part of a CRLF line ending.
- Suitable only for use with SMTP servers that support the BINARYMIME SMTP extension (RFC 3030):
  - binary: any sequence of octets.
There is no explicitly defined encoding for sending arbitrary binary data over an SMTP transport with the 8BITMIME extensions. So base64 or quoted-printable (with their associated inefficiencies) still have to be used. These restrictions do not apply to other uses of MIME such as Web Services with MIME or MTOM attachments.
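The two 7bit-safe encodings can be compared with a short sketch based on Python's standard base64 and quopri modules. The function name encode_for_7bit and the text_like flag are hypothetical; choosing between the encodings by a simple flag is an assumption for illustration, not something the standard prescribes.

```python
import base64
import quopri


def encode_for_7bit(data, text_like):
    """Return (encoding name, encoded bytes) safe for a 7bit transport.

    Hypothetical helper: quoted-printable for mostly-ASCII text,
    base64 for everything else.
    """
    if text_like:
        # Mostly readable; only bytes outside the safe range become =XX.
        return "quoted-printable", quopri.encodestring(data)
    # Fixed overhead: every 3 input octets become 4 ASCII characters.
    return "base64", base64.encodebytes(data)


if __name__ == "__main__":
    print(encode_for_7bit("señor".encode("utf-8"), text_like=True))
    print(encode_for_7bit(bytes(range(6)), text_like=False))
```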
Encoded-Word
Since RFC 2822, the names and values of MIME message headers are always ASCII characters; values containing other characters must use the encoded-word syntax (RFC 2047) instead of literal text. This syntax uses an ASCII character string indicating both the original character set (the "charset") and the content-transfer-encoding used to map the bytes of that charset to ASCII characters.
Its general form is:
=?charset?encoding?encoded text?=
- charset can be any character set registered with IANA. Typically it matches the charset of the message body.
- encoding can be either "Q", denoting Q-encoding, which is similar to quoted-printable encoding, or "B", denoting base64 encoding.
- encoded text is the text encoded with Q-encoding or base64.
Differences between Q-encoding and quoted-printable
The ASCII codes for the question mark (?) and the equals sign (=) cannot be represented directly, since they are used as encoded-word delimiters. The ASCII space character cannot be represented directly either, because it could cause older parsers to split the encoded-word unintentionally. To make the encoding smaller and easier to read, the underscore (_) is used in place of the space, with the side effect that the underscore itself cannot be represented directly. The use of encoded-words in certain parts of the headers imposes further restrictions on which characters can be represented directly.
For example:
Subject: =?utf-8?Q?=C2=A1Hola,_se=C3=B1or!?=
is interpreted as:
Subject: ¡Hola, señor!
The encoded-word format is not used for header names (e.g. Subject). These header names are always in English. When the message is read with a mail client in a language other than English, the header names are translated by the client.
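Decoding an encoded-word can be sketched with the decode_header and make_header functions from Python's standard email.header module. The wrapper name decode_header_value is hypothetical; the sample value is the Subject header shown above.

```python
from email.header import decode_header, make_header


def decode_header_value(raw_value):
    """Decode any RFC 2047 encoded-words in a header value to a str.

    Hypothetical convenience wrapper around the standard library calls.
    """
    # decode_header splits the value into (bytes or str, charset) chunks;
    # make_header reassembles them into a Header that converts to text.
    return str(make_header(decode_header(raw_value)))


if __name__ == "__main__":
    print(decode_header_value("=?utf-8?Q?=C2=A1Hola,_se=C3=B1or!?="))
```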
Multipart messages
A multipart MIME message declares a boundary in the "Content-type:" header. This boundary, which must not appear inside any of the parts, is placed between the parts and at the beginning and end of the message body, as shown below:
MIME-Version: 1.0
Content-type: multipart/mixed; boundary="frontier"

This is a multi-part message in MIME format.
--frontier
Content-type: text/plain

This is the body of the message
--frontier
Content-type: application/octet-stream
Content-transfer-encoding: base64

PGh0bWw+CiAgPGhlYWQ+CiAgPC9oZWFkPgogIDxib2R5PgogICAgPHA+RXN0ZSBlcyBCBjdWVy
cG8gZGVsIG1lbnNhamU8L3A+CiAgPC9ib2R5Pgo8L2h0bWw+Cic=
--frontier--
Each part consists of its own content header (zero or more Content- header fields) and a body. Multipart content can be nested. The content-transfer-encoding header of a multipart type must always be "7bit", "8bit" or "binary" to avoid the complications imposed by the presence of multiple levels of decoding. The multipart block, as a whole, has no specification about the character set (charset); non-ASCII characters in part headers are handled via Encoded-Word, and part bodies can have specified character sets if applicable for their content type.
Notes:
- Before the first boundary there is an area that is ignored by email clients that support MIME. This area is usually used for a note addressed to older clients that do not support MIME.
- The mail client does not choose the boundary string until the message is sent, which allows it to pick a string that does not occur in the body of any of the parts. This is typically implemented by generating a long random string.
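How a receiver splits a message at the boundary can be sketched with Python's standard email package. The function name summarize_parts, the sample message RAW, and its short base64 payload are assumptions made for the example; they are not the sample message shown above.

```python
from email import message_from_string, policy

# Hypothetical sample message; the base64 payload decodes to b"Hello, MIME!".
RAW = (
    "MIME-Version: 1.0\r\n"
    'Content-Type: multipart/mixed; boundary="frontier"\r\n'
    "\r\n"
    "This is a multi-part message in MIME format.\r\n"
    "--frontier\r\n"
    "Content-Type: text/plain\r\n"
    "\r\n"
    "This is the body of the message\r\n"
    "--frontier\r\n"
    "Content-Type: application/octet-stream\r\n"
    "Content-Transfer-Encoding: base64\r\n"
    "\r\n"
    "SGVsbG8sIE1JTUUh\r\n"
    "--frontier--\r\n"
)


def summarize_parts(raw):
    """Return a list of (content type, decoded content) for each part."""
    msg = message_from_string(raw, policy=policy.default)
    # The text before the first boundary (the preamble) is not a part.
    return [(part.get_content_type(), part.get_content()) for part in msg.iter_parts()]


if __name__ == "__main__":
    for content_type, content in summarize_parts(RAW):
        print(content_type, repr(content))
```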
Multipart Subtypes
The MIME standard defines several subtypes for multipart messages, which specify the nature of the message parts and their relationship to one another. The subtype is specified in the "Content-Type" header for the entire message. For example, a multipart MIME message using the digest subtype will have a "Content-Type" of "multipart/digest".
The RFC initially defines 4 subtypes: mixed, digest, alternative, and parallel. An application that is minimally compliant with the standard must support at least mixed and digest; the rest of the subtypes are optional. Other RFCs define additional subtypes such as signed and form-data.
The following is a list of the most commonly used subtypes:
Mixed
Multipart/mixed is used to send messages or files with different "Content-Type" headers, either inline or as attachments. If images or other easily readable files are sent, most email clients will display them as part of the message (unless the "Content-disposition" header specifies otherwise). Otherwise they will be offered as attachments. The implicit content-type for each part is "text/plain".
Defined in RFC 2046, Section 5.1.3
Message
A message/rfc822 part contains an email message, including any headers. Rfc822 is a misnomer, since the message may be a full MIME message. It is also used for digests and for forwarded messages.
Defined in RFC 2046.
Digest
Multipart/digest is a simple way to send multiple text messages. The implicit content-type for each part is "message/rfc822".
Defined in RFC 2046, Section 5.1.5.
Alternative
The multipart/alternative subtype indicates that each part is an "alternative" version of the same (or similar) content, each in a different format denoted by its "Content-Type" header. The formats are ordered by how faithful they are to the original, with the least faithful first. Systems can choose the "best" representation that they are capable of processing; in general this will be the last part that the system understands, unless other factors affect this behavior.
Since a client is unlikely to want to send a version that is less faithful than the plain text version, this structure places the plain text version (if present) first. This makes messages easier to read for users of clients that do not understand multipart messages.
What most commonly occurs is to use multipart/alternative for messages with two parts, one as plain text (text/plain) and one as HTML (text/html). The plain text part provides compatibility with older clients that are not able to understand other formats, while the HTML part allows using text formatting and links. Many email clients offer the user the ability to prefer plain text over HTML; this is an example of how local factors can affect how an application selects the "best" part of the message to display.
Although each part is intended to represent the same content, this is not required. Some spam filters examined only the text/plain part of a message, because it is easier to analyze than the text/html part. Spammers who noticed this started to create messages with an innocuous-looking text/plain part and to put the advertising copy in the text/html part. The maintainers of anti-spam programs have since modified their filters to penalize multipart/alternative messages whose parts contain very different text.
Defined in RFC 2046, Section 5.1.4
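The ordering rule (least faithful first) and the selection of a "best" part can be sketched with Python's standard EmailMessage class. The function names build_alternative and pick_body and the prefer_plain flag are hypothetical; the flag stands in for the kind of local user preference described above.

```python
from email.message import EmailMessage


def build_alternative(plain_text, html_text):
    """Return a multipart/alternative message: text/plain first, text/html last."""
    msg = EmailMessage()
    msg.set_content(plain_text)                      # least faithful version first
    msg.add_alternative(html_text, subtype="html")   # richer version last
    return msg


def pick_body(msg, prefer_plain=False):
    """Choose the part to display, honoring a hypothetical user preference."""
    order = ("plain", "html") if prefer_plain else ("html", "plain")
    return msg.get_body(preferencelist=order)


if __name__ == "__main__":
    message = build_alternative("Hello", "<p><b>Hello</b></p>")
    print(pick_body(message).get_content_type())
    print(pick_body(message, prefer_plain=True).get_content_type())
```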
Related
The multipart/related subtype is used to indicate that the message parts should not be considered individually but as components of an aggregate whole. The message consists of a root part (implicitly the first) that references other parts, which in turn can reference further parts. Parts are commonly referenced through the "Content-ID" header. The syntax of the reference is not specified by this subtype; it is dictated by the encoding or protocol used in the part containing the reference.
A common use of this subtype is to send entire web pages with images in a single message. The root part would contain the HTML document, which would use HTML image tags to refer to images stored in subsequent parts.
Defined in RFC 2387
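A web page with an embedded image can be sketched with Python's standard EmailMessage class. The function name build_related, the Content-ID value, and the placeholder image bytes are assumptions; referencing the image from HTML through a "cid:" URL is a common convention, not something multipart/related itself mandates.

```python
from email.message import EmailMessage


def build_related(html_text, image_bytes, content_id):
    """Return a multipart/related message: an HTML root part plus one image."""
    msg = EmailMessage()
    msg.set_content(html_text, subtype="html")  # root part (implicitly the first)
    # The image part carries the Content-ID that the HTML refers to.
    msg.add_related(image_bytes, maintype="image", subtype="png", cid=content_id)
    return msg


if __name__ == "__main__":
    html = '<p>Logo: <img src="cid:logo@example.invalid"></p>'
    message = build_related(html, b"placeholder image bytes", "<logo@example.invalid>")
    for part in message.iter_parts():
        print(part.get_content_type(), part["Content-ID"])
```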
Report
Multipart/report is a type of message that contains data formatted for interpretation by a mail server. It is split between a text/plain part (or some other easily readable content type) and a message/delivery-status part.
Defined in RFC 3462
Signed
The multipart/signed subtype is used to attach a digital signature to the message. This has two parts, a body part and a signature part. The entire body part, including the MIME headers, is used to create the signature part. There are many types of signatures, such as application/pgp-signature and application/x-pkcs7-signature.
Defined in RFC 1847, Section 2.1
Encrypted
A multipart/encrypted message has two parts. The first contains control information that is necessary to decrypt the second part, of type: application/octet-stream.
Defined in RFC 1847, Section 2.2
Form Data
As its name implies, multipart/form-data is used to express values submitted via a form. Originally defined as part of HTML 4.0, it is mostly used to send files via HTTP.
Defined in RFC 2388
Mixed-Replace (Experimental)
The multipart/x-mixed-replace content type was developed as part of a technology to emulate server push and streaming over HTTP.
All parts of a mixed-replace message have the same semantic meaning. However, each part overrides ("replaces") the previous part as soon as it is fully received. Clients should process each part upon arrival and should not wait for the entire message to finish.
Originally developed by Netscape, it is still supported by Mozilla, Firefox, Safari (but not Safari for iPhone) and Opera, but traditionally ignored by Microsoft.