E-mail has evolved over time, just like HTML. But while there were only a small number of browsers and their vendors banded together to try to standardize what was going on, e-mail has never had that. Instead there are more clients, more buggy implementations, more corners cut in development, and apparent trouble prioritizing fixes.
The low-level message format is standardized, but almost everything else is by convention, and those conventions are frequently violated. E-mail clients just try to work with what they've got.
Mail messages
E-mail messages are structured in a hierarchy. Each piece of the message is a 'part'. Each part has a type, such as:
- text/plain: Just text! No fancy formatting. Encoding can affect line-wrapping (search for "format=flowed").
- text/html: HTML! Clients vary widely in terms of what they support either because of layout engine limitations or intentional limitations. The lowest common denominator usually involves using tables and setting explicit "style" attributes all over the place because "style" elements are frequently not honored.
- multipart/alternative: Used by clients that send a text/html part but worry that the receiving client won't know how to show it. A text/plain part and a text/html part are enclosed together, and the receiver displays the richest one it supports.
- multipart/related: Lets a text/html part refer to other parts using "cid:" URLs that reference the "content-id" of those parts. This is how embedded images are done.
- multipart/mixed: There is a display part and some attachments.
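To make the hierarchy concrete, here is a minimal sketch of a message with an HTML body, an embedded image, and an attachment, modeled as nested plain objects. The object shape and the `pickBodyPart` helper are made up for illustration; they are not how our back-end represents messages.

```javascript
// Hypothetical in-memory shape for a parsed message (illustration only).
const message = {
  type: 'multipart/mixed',
  parts: [
    {
      type: 'multipart/related',
      parts: [
        {
          type: 'multipart/alternative',
          parts: [
            { type: 'text/plain', body: 'Hello' },
            { type: 'text/html', body: '<p>Hello <img src="cid:logo"></p>' },
          ],
        },
        { type: 'image/png', contentId: 'logo' },
      ],
    },
    { type: 'application/pdf', filename: 'invoice.pdf' },
  ],
};

// Walk the tree depth-first and return the first leaf we know how to display.
// Inside multipart/alternative the richest version comes last, so we try
// those children in reverse order.
function pickBodyPart(part, displayable) {
  if (!part.type.startsWith('multipart/')) {
    return displayable.includes(part.type) ? part : null;
  }
  const children = part.type === 'multipart/alternative'
    ? [...part.parts].reverse()
    : part.parts;
  for (const child of children) {
    const found = pickBodyPart(child, displayable);
    if (found) return found;
  }
  return null;
}
```

Calling `pickBodyPart(message, ['text/plain', 'text/html'])` returns the text/html part, while passing only `['text/plain']` falls back to the plain-text alternative.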
There is a very nice wikipedia article at
- RFC 5322: Internet Message Format - Obsoletes RFC 2822, which in turn obsoleted RFC 822.
- RFC 2045: MIME Part One: Format of Internet Message Bodies - Encoding non-ASCII stuff: content-transfer-encoding header, content-type header parameters, quoted printable encoding, base64 encoding, content-id header
- RFC 2046: MIME Part Two: Media Types - leaf content types (text/*, image/*, audio/*, video/*, application/*), composite types (multipart/*, message/*), content-type charset parameter, boundary delimiters, multipart/mixed, multipart/alternative, multipart/digest, multipart/parallel (deadish), message/rfc822, message/partial (deadish), message/external-body (deadish)
- RFC 2047: MIME Part Three: Message Header Extensions for Non-ASCII Text - MIME Encoded Words. The things that look like =?CHARSET?Q?gibberish?= or =?CHARSET?B?base64gibberish?= (Q is a quoted-printable variant, B is base64).
- RFC 2231 (http://tools.ietf.org/html/rfc2231): MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations - Parameter encoding of long-values with character set encoding and language-expressing features (continuations).
- RFC 2387: The MIME Multipart/Related Content-type - content-type start parameter, content-type start-info parameter, content-type type parameter (redundant weirdness), content-disposition header
- RFC 1847: Security Multiparts for MIME: Multipart/Signed and Multipart/Encrypted - S/MIME, multipart/signed, multipart/encrypted
- RFC 3156: MIME Security with OpenPGP - Hate S/MIME? Try PGP!
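Since RFC 2047 encoded words show up in almost every non-English subject line, a small decoder may make them less abstract. This is only a sketch: it assumes Node.js (for `Buffer`) and a `TextDecoder` that knows the charset, `decodeEncodedWords` is a hypothetical name, and real-world headers have more edge cases than this handles.

```javascript
// Decode RFC 2047 encoded words such as =?UTF-8?Q?caf=C3=A9?= in a header.
function decodeEncodedWords(header) {
  // Whitespace between two adjacent encoded words is not significant.
  const joined = header.replace(/(\?=)\s+(=\?)/g, '$1$2');
  return joined.replace(
    /=\?([^?]+)\?([QqBb])\?([^?]*)\?=/g,
    (whole, charset, encoding, text) => {
      let bytes;
      if (encoding.toUpperCase() === 'B') {
        bytes = Buffer.from(text, 'base64');
      } else {
        // "Q" encoding: "_" stands for a space, "=XX" is one hex-encoded byte.
        const binary = text
          .replace(/_/g, ' ')
          .replace(/=([0-9A-Fa-f]{2})/g, (m, hex) =>
            String.fromCharCode(parseInt(hex, 16)));
        bytes = Buffer.from(binary, 'latin1');
      }
      try {
        return new TextDecoder(charset).decode(bytes);
      } catch (e) {
        return whole; // Unknown charset label: leave the raw text alone.
      }
    });
}
```

For example, `decodeEncodedWords('=?UTF-8?B?SGVsbG8=?=')` yields `Hello`. Falling back to the raw text on an unknown charset is a design choice of this sketch; a real client would more likely guess at a charset.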
In text/plain messages, the convention is to use greater-than symbols (with or without separating whitespace between them) to indent the message being replied to. The quoted text is usually preceded by a line giving the name of the person being replied to and the timestamp of their message. That line is usually localized and can vary wildly from client to client, making it hard to reliably distinguish from the surrounding text.
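As a minimal sketch of reading the greater-than convention, the function below splits a line into its quote depth and the remaining text. `parseQuoteDepth` is a hypothetical helper, not the routine our parser uses, and it makes no attempt to recognize the localized "so-and-so wrote:" lead-in line.

```javascript
// Split a line into its quote depth and the text after the quote markers.
// Accepts both ">>" and "> >" styles.
function parseQuoteDepth(line) {
  // One or more ">" characters, each optionally followed by one space or tab.
  const prefix = (/^(?:>[ \t]?)+/.exec(line) || [''])[0];
  const depth = (prefix.match(/>/g) || []).length;
  return { depth, text: line.slice(prefix.length) };
}
```

Both `parseQuoteDepth('> > hi')` and `parseQuoteDepth('>>hi')` report a depth of 2 with the text `hi`, which is the kind of normalization needed before lines from different clients can be compared.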
In HTML messages there is much less convention. Ideally, quoted messages are placed in "blockquote" tags, but sniffing and inference may need to be done based on CSS classes or styles applied to nodes. Matters are frequently complicated by the fact that the quoting pass usually generates the HTML and then just hands it off to an HTML editor widget. If the user replies inline, their comments can end up inside a blockquote and may not be delimited with an explicit class.
In general, we have not done a very deep investigation of all the possibilities here. The best resource we have is the Thunderbird Conversations extension, which has done some work here that we can reuse.
HTML Message Types
There are, broadly, several types of HTML messages out there:
- Messages written by humans for day-to-day correspondence.
- Messages written by average humans that are trying to make a pretty newsletter.
- Automated HTML messages, like the invoice a web store sends to you when you buy something.
- Professional advertising. This can take the form of a big graphic that just gets sliced up like an old-school image map or a catalog.
Advertisers and others are very interested in knowing if you read an e-mail, if you click on a link in an e-mail, et cetera. They can know if you are reading an e-mail if we open any network connection as part of the process of displaying the message.
The original tracking approach was to put a 1x1 "web bug" style image tag in an HTML mail, with a URL that included some unique id for the user. However, if any remote resource is loaded, it's very straightforward to encode identifying information about the user into the URL. Whether out of laziness or because the arms race hasn't gotten that far, many HTML mails will host external images without identifying information and then use separate images for tracking purposes. (This does make it easier to use caching services.) In professional advertising, links will almost always include tracking information, and human-made newsletters will at least use link shorteners.
The only real defenses against this type of information leakage are to:
- Not display external images by default, and only white-list specific senders. Because tracking domains may differ from the image-serving domains, it may be possible to white-list the image domains without white-listing the trackers.
- Fetch and cache images automatically. This can improve the user's experience (at a bandwidth and storage cost) because the images will already be local. The upside for the user is that the tracker loses the ability to know when the user is actually looking at the message, although the tracker would likely misread every pre-fetch as a read until such a strategy gains widespread adoption.
- Pierce link-shortened URLs. Similar to pre-fetching images, URLs could be pre-resolved, but only when they are believed to be safe. ('Remove me' links would be a bad thing to automatically trigger.)
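The first defense can be sketched as a small policy check. Everything here is hypothetical (the `shouldLoadImage` name, keying the white-list on the sender's address, and treating "cid:" references as always safe); it only illustrates the shape of the decision, not a policy we have settled on.

```javascript
// Decide whether an <img> source may be loaded when displaying a message.
// trustedSenders is a Set of lower-cased sender addresses (an assumption).
function shouldLoadImage(src, sender, trustedSenders) {
  // Embedded parts are already local, so showing them leaks nothing.
  if (src.toLowerCase().startsWith('cid:')) return true;
  let url;
  try {
    url = new URL(src);
  } catch (e) {
    return false; // Relative or malformed URL: nothing sensible to fetch.
  }
  // Only plain web fetches are ever considered.
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;
  return trustedSenders.has(sender.toLowerCase());
}
```

A per-domain white-list would slot into the same place as the sender check, which is what would allow image-serving domains to be trusted separately from tracking domains.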
An e-mail message should never be able to take over your e-mail app or your computer. For traditional native e-mail clients, the original fear was just that a specially crafted e-mail could cause a buffer overflow, allowing the exploitation of the e-mail app and any privileges it might have. Now that mail apps are written using HTML and JS, there is the added risk of a mail message being able to gain access to those privileges. While the compromise of just the e-mail app is not as dramatic as complete compromise of the system it runs on, access to a user's private e-mail, passwords, etc. still constitutes a very serious breach.
The primary risk from our perspective is inserting attacker-controlled HTML into our DOM, such as via innerHTML. The usual approaches for dealing with untrusted HTML are to place it in a sandbox where its privileges are limited, to use some type of white-list based sanitization that only lets through things we know are safe, or both. Capability-based privilege limitation is the preferred form of protection. That's how Thunderbird does it by default; it provides an nsIContentPolicy implementation that prevents messages from fetching remote resources (this default can be disabled, or specific senders white-listed) and forbids execution of JS code. Thunderbird also has some sanitizer logic, but it is only used when the "View..." "Message Body As..." "Simple HTML" option is selected.
Text Message Display
We have a reasonably sophisticated parsing routine that goes through a message and tries to figure out what is quoted and how deeply, what was actually written by the author of the message, what is a signature or legal boilerplate, and so on. One nicety is that we attempt to normalize whitespace between lines of different types. Some people put a blank line between the message they're replying to and their own comment, and some do not; our quoting analysis generates the same output either way, so the presentation details are left up to our CSS rather than to how obsessive the author of the message is about quoting. You can see the various constants and comments in quotechew.js.
When rendering the message to the screen, we create separate div blocks for each run of lines of the same type and assign distinct CSS classes to those blocks.
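The grouping and whitespace normalization can be sketched roughly as follows. This is not the code in quotechew.js; `groupRuns`, the `{ type, text }` line shape, and the type names are assumptions made for illustration, and the input is expected to have been classified already.

```javascript
// Group classified lines into runs of the same type. Blank lines that sit
// between two runs of different types are dropped so that spacing between
// blocks is decided by CSS; blank lines inside a run are kept.
function groupRuns(lines) {
  const runs = [];
  let pendingBlanks = 0;
  for (const { type, text } of lines) {
    if (text.trim() === '') {
      pendingBlanks++; // Decide what to do once we see the next real line.
      continue;
    }
    const last = runs[runs.length - 1];
    if (last && last.type === type) {
      // Same run continues, so the blanks were real paragraph breaks.
      for (let i = 0; i < pendingBlanks; i++) last.lines.push('');
      last.lines.push(text);
    } else {
      // Type changed: discard boundary blanks and start a new run.
      runs.push({ type, lines: [text] });
    }
    pendingBlanks = 0;
  }
  return runs;
}
```

Each returned run would then become one div whose CSS class is derived from `run.type`. Feeding in a reply with or without a blank line between the author's comment and the quote produces identical runs.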
HTML Message Display
The main problem we face is that most HTML mail where the author is trying to make it look pretty is written assuming a display width of at least 600 (CSS) pixels. Our phone devices are tiny and narrow and we only display messages in portrait. So the decision we are faced with when we display a message is whether the message needs to be wider than our display, or whether we can resize it to make it narrower.
As such, we have two modes: newsletter mode and non-newsletter mode. When we display a message, we currently create an iframe without size constraints and see how wide the iframe wants to be. If it wants to be wider than our screen, we put it in newsletter mode, which enables zooming and panning and starts zoomed out so that the entire width of the message is visible.
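The mode decision boils down to a width comparison and a scale factor. The sketch below assumes the content width has already been measured (for example from the unconstrained iframe); `chooseDisplayMode` and its return shape are hypothetical names for illustration.

```javascript
// Pick a display mode from the measured content width and the viewport
// width, both in CSS pixels.
function chooseDisplayMode(contentWidth, viewportWidth) {
  if (contentWidth <= viewportWidth) {
    // Fits as-is: no zooming or panning needed.
    return { newsletter: false, scale: 1, transform: 'none' };
  }
  // Too wide: start zoomed out so the whole width is visible.
  const scale = viewportWidth / contentWidth;
  return { newsletter: true, scale, transform: `scale(${scale})` };
}
```

A 640px-wide newsletter on a 320px-wide screen would start in newsletter mode at half scale.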
Note that we are currently unable to treat the message like a web page. You may notice that the web browser has fantastic asynchronous pan and zoom and can even reflow on double-tap to try to make text more visible. We can't do that right now for platform reasons. We're stuck using the CSS 'transform' property to do panning and zooming ourselves, and this can have serious performance ramifications when the message is very long.
A complicating factor with HTML message display is that the authors of some messages may not include sufficient sizing information on images or their enclosing tables. The result is that when we eventually load embedded or external images, the dimensions of the HTML mail may change and a message that was not previously categorized as a newsletter may now want to be a newsletter. There's currently a bug about us failing to make the transition correctly.
Attachments are saved to DeviceStorage when you download them. If we have trouble saving the file with the name it came with, we will append a timestamp to the file and try again. If that fails, we give up.
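The save-then-retry behavior might look like the sketch below. The `save` callback is a hypothetical stand-in for whatever actually writes to DeviceStorage, and placing the timestamp before the file extension is an assumption of this example rather than a statement of what the app does.

```javascript
// Build a fallback name by appending a timestamp, keeping the extension.
function timestampedName(filename, now = Date.now()) {
  const dot = filename.lastIndexOf('.');
  // No extension (or a leading-dot name): just append to the end.
  if (dot <= 0) return `${filename}-${now}`;
  return `${filename.slice(0, dot)}-${now}${filename.slice(dot)}`;
}

// Try the original name first, then a timestamped name, then give up.
// `save(name, blob)` is a hypothetical async function that rejects on error.
async function saveWithFallback(save, filename, blob, now = Date.now()) {
  try {
    await save(filename, blob);
    return filename;
  } catch (e) {
    const retryName = timestampedName(filename, now);
    await save(retryName, blob); // A second failure propagates to the caller.
    return retryName;
  }
}
```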
We never delete attachments from DeviceStorage. When the message that owns them is removed from our offline store/cache, the attachments stick around. If we re-download the message in the future and the user decides to download the attachment again from our UI, not realizing that it has already been downloaded, we will re-download the attachment and end up appending a timestamp to its name.
We could probably handle all of this better if we gave it some thought. Because servers don't provide us with pre-computed hashes of content, network traffic might still have to happen, but there is potential for us to at least avoid redundantly storing files. Or we could implement a sandboxed work-flow where downloaded files are stored in an IndexedDB cache area until they are explicitly promoted to be globally visible on the device via DeviceStorage.
Embedded images are saved to IndexedDB Blob storage rather than DeviceStorage. Embedded images are deleted when the message they are associated with is removed from our local offline store/cache.
Attached images are a special case, and not one that we really handle well right now. What the user probably wants is to see a thumbnail of the image and then also be able to view the image at full size. But without the participation of the server, the only way we can get a thumbnail is to download the entire image and make one ourselves. (Well, we might be able to download just part of the image if it's stored in a progressive fashion or has an embedded thumbnail...) But if whatever is sending the message is trying to force us to show the image inline, we may end up displaying the image at its full resolution and invoking newsletter mode. That's bad for performance, and because we have not yet implemented saving an embedded image to disk, the image gets stuck in the e-mail app.
This will require better heuristic handling on our part and the implementation of being able to save embedded images.
We use an HTML sanitizer with a white-list to only allow 'safe' HTML through, and we are backstopped by the HTML "iframe" tag's sandbox attribute and our app's CSP (Content Security Policy). Unfortunately, there is currently no equivalent of Gecko's nsIContentPolicy for iframe sandboxes that would allow us to forbid network access. (Although there is interest among the security team in trying to standardize such a feature.)
Because of this, our sanitizer has to render inert any tags or styles that would otherwise trigger network fetches. This in turn means that in order for us to later "activate" them, we need to either manipulate the DOM or entirely re-write the document from scratch. The activation technique requires that the page live in our origin so we can manipulate it. (Unlike Firefox and Thunderbird code, which runs with chrome privileges that allow it to manipulate the contents of displayed pages, the e-mail app runs with standard content privileges, so we would be just as privileged as the displayed e-mail or any other website.) This in turn means that the e-mail is affected by our CSP. Re-writing the document would have layout impact and would also require us to have an efficient way to produce both an inert and an activated document. (Store diff hunks?)
This primarily means that we don't let through:
- script: No JS!
And we render inert, possibly by nuking:
- img tags and style directives that could fetch remote images
- external resources that aren't standard for HTML e-mails like remote styles, etc.
- href attributes that might theoretically cause pre-fetches
See htmlchew.js for more info.
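As a rough illustration of the white-list-plus-inert idea (and not the logic in htmlchew.js), the sketch below sanitizes a tiny tree of plain objects instead of real DOM nodes. The node shape, the tag and attribute lists, and the `data-inert-` attribute prefix are all assumptions made for this example; in particular, a real sanitizer needs to parse CSS rather than pattern-match on it.

```javascript
// Hypothetical white-lists; a real list would be longer and more careful.
const ALLOWED_TAGS = new Set(['a', 'b', 'blockquote', 'br', 'div', 'i',
  'img', 'p', 'span', 'table', 'tbody', 'td', 'tr']);
const ALLOWED_ATTRS = new Set(['alt', 'class', 'height', 'href', 'src',
  'style', 'width']);

// Nodes are { tag, attrs, children } objects; text nodes are plain strings.
function sanitizeNode(node) {
  if (typeof node === 'string') return node;
  // Anything not white-listed (e.g. script) is dropped with its children.
  if (!ALLOWED_TAGS.has(node.tag)) return null;
  const attrs = {};
  for (const [name, value] of Object.entries(node.attrs || {})) {
    if (!ALLOWED_ATTRS.has(name)) continue; // e.g. onclick
    if (name === 'src' || name === 'href') {
      // Render inert: park the URL where nothing will fetch it, so it can
      // be restored later if the user chooses to activate it.
      attrs[`data-inert-${name}`] = value;
    } else if (name === 'style' && /url\s*\(/i.test(value)) {
      continue; // Crude: drop any style that might fetch a remote resource.
    } else {
      attrs[name] = value;
    }
  }
  const children = (node.children || [])
    .map(sanitizeNode)
    .filter((child) => child !== null);
  return { tag: node.tag, attrs, children };
}
```

Activation would then be the reverse step of moving `data-inert-src` back to `src` on the nodes the user has approved, which is why the document has to live somewhere we are allowed to manipulate it.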