Thunderbird:Pluggable Mail Stores

From MozillaWiki

Current Mail Store

Currently, local mailboxes and offline news and IMAP stores use the Berkeley mailbox format, a flat file in the Mbox family (http://en.wikipedia.org/wiki/Mbox).

The format is fine for adding messages quickly (seek to the end of the file, write the message), and for reading messages (seek to the offset in the folder of the message, read the message). Deleting messages is fast, because we just update a field in the header of the message. But there are a few limitations:

  • Reclaiming space for deleted messages involves copying all the non-deleted messages to a temp folder and then back over the original (compaction).
  • We have a 4GB limit on mail folders because our message offset keys are 32-bit unsigned ints (though this could be fixed without abandoning the Berkeley mailbox format).
  • Indexers (e.g., Spotlight) don't handle the Berkeley mailbox format well.
  • Incremental backup is a lot harder, because any change to a folder modifies one large file.
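
The trade-offs above can be illustrated with a minimal sketch of a flat-file store. This is not Thunderbird's code: the separator line, the fixed-width status header, the `EXPUNGED` bit value, and all function names are assumptions made for illustration, and "From " escaping inside bodies is ignored.

```python
import os
import tempfile

SEPARATOR = b"From - \n"
STATUS = b"X-Mozilla-Status: "
FLAGS_AT = len(SEPARATOR) + len(STATUS)  # where the 4 hex digits start
EXPUNGED = 0x0008        # assumed bit for "deleted"; illustrative only
MAX_OFFSET = 2**32 - 1   # offsets double as 32-bit keys, hence the 4GB ceiling


def append_message(path, body):
    """Append one message; return (offset, size) for the index."""
    record = SEPARATOR + STATUS + b"0000\n\n" + body + b"\n"
    with open(path, "ab") as f:
        f.seek(0, os.SEEK_END)
        offset = f.tell()
        if offset > MAX_OFFSET:
            raise OverflowError("offset does not fit in a 32-bit key")
        f.write(record)
    return offset, len(record)


def read_message(path, offset, size):
    """Seek straight to the message and read it."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)


def mark_deleted(path, offset):
    """Fast delete: rewrite four hex digits in place, file size unchanged."""
    with open(path, "r+b") as f:
        f.seek(offset + FLAGS_AT)
        flags = int(f.read(4), 16) | EXPUNGED
        f.seek(offset + FLAGS_AT)
        f.write(b"%04x" % flags)


def compact(path, index):
    """Slow reclaim: copy live messages to a temp file, then swap it in."""
    new_index = {}
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "wb") as out:
        for key, (offset, size) in index.items():
            record = read_message(path, offset, size)
            if int(record[FLAGS_AT:FLAGS_AT + 4], 16) & EXPUNGED:
                continue
            new_index[key] = (out.tell(), size)
            out.write(record)
    os.replace(tmp, path)
    return new_index
```

Appending, reading, and deleting each touch only a few bytes, while `compact` rewrites every surviving message, which is why compaction is the expensive step.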

Pluggable Mail Store Proposal

Because of these issues, among others, people have wanted the ability to use other mail stores, e.g., Maildir (http://en.wikipedia.org/wiki/Maildir) or a SQLite database. Instead of hard-coding support for each mail store format in the code, what we'd like to do is define a mail store interface, and allow different mail stores to implement that interface.

The interface will need to support the following kinds of things:

  • stream a message to the store
  • read (stream) a message from the store, given a key
  • delete a message from the store
  • move/copy a message from one folder to another
  • expunge/compact the store
  • reparse the store (maybe only needed for Berkeley mailbox?)

We will probably also need folder-level operations:

  • create a new folder/sub-folder
  • enumerate sub-folders
  • delete a folder
  • move/copy/rename a folder

We will then need to change all the code that assumes a mailbox store is a Berkeley mailbox so that it uses the pluggable interface instead. Off the top of my head, here's a list of some of those places:

http://mxr.mozilla.org/mozilla/source/mailnews/local/src/nsMailboxProtocol.cpp - this is where we read messages for display and copy. In particular, see nsMailboxProtocol::OpenFileSocketForReuse and nsMsgProtocol::OpenFileSocket (in the base class, http://mxr.mozilla.org/mozilla/source/mailnews/base/util/nsMsgProtocol.cpp), which creates a transport and input stream from the mailbox file.

http://mxr.mozilla.org/mozilla/source/mailnews/db/msgdb/src/nsMailDatabase.cpp - this is where we tweak the x-mozilla-status flags, and also verify that the mailbox timestamp and size are consistent with the information stored in the .msf file.
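
The timestamp and size check can be sketched as follows. This is only an illustration of the idea, not what nsMailDatabase.cpp does: `FolderSummary`, `snapshot`, and `summary_is_valid` are hypothetical names, and the real .msf file stores far more than these two fields.

```python
import os
from dataclasses import dataclass


@dataclass
class FolderSummary:
    """Hypothetical stand-in for the size and date kept in the .msf file."""
    mailbox_size: int
    mailbox_mtime_ns: int


def snapshot(mailbox_path):
    """Record the mailbox size and modification time when the index is written."""
    st = os.stat(mailbox_path)
    return FolderSummary(st.st_size, st.st_mtime_ns)


def summary_is_valid(mailbox_path, summary):
    """The index is trusted only if the mailbox has not changed behind its back."""
    st = os.stat(mailbox_path)
    return (st.st_size == summary.mailbox_size
            and st.st_mtime_ns == summary.mailbox_mtime_ns)
```

A store that is not a single file (for example, one file per message) would need a different notion of "unchanged", which is one reason this check belongs behind the store interface.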

http://mxr.mozilla.org/mozilla/source/mailnews/local/src/nsLocalMailFolder.cpp - Not surprisingly, this is where a lot of the folder operations happen. We iterate over the server directory to discover the folder hierarchy. Rename/Move/Delete are all handled here.

http://mxr.mozilla.org/mozilla/source/mailnews/local/src/nsPop3Sink.cpp - this is where POP3 download happens. It knows that we store POP3 messages in a big file.

http://mxr.mozilla.org/mozilla/source/mailnews/local/src/nsParseMailbox.cpp - this is where we parse local mail folders. It obviously has a lot of knowledge about the mail folder format.

http://mxr.mozilla.org/mozilla/source/mailnews/base/util/nsMsgDBFolder.cpp - see nsMsgDBFolder::GetOfflineFileStream and friends for the code that knows that our offline stores are Berkeley mailboxes.

I don't know if pluggable stores would hide behind a single interface, or multiple interfaces, e.g., one for dealing with messages, one for manipulating folders. There is a natural separation there but is it a useful complication?

Would we allow different stores for different accounts, or would you have to decide globally what kind of store you wanted?

We store flags and keywords in headers in the actual mail message store, and rebuild indexes from those headers. Would we want to do the same thing with other mail stores? I can see doing it with Maildir, but not SQLite. This is strongly related to what rebuild index does for a particular mail store.
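
As a rough sketch of what "rebuild indexes from those headers" means, the following recovers flags and keywords from raw message bytes. It assumes the flags live in an `X-Mozilla-Status` header as hexadecimal digits and the keywords in a space-separated `X-Mozilla-Keys` header; the function names and the shape of the index are hypothetical.

```python
from email.parser import BytesHeaderParser


def index_entry(raw_message):
    """Recover (flags, keywords) from the headers of one stored message."""
    headers = BytesHeaderParser().parsebytes(raw_message)
    flags = int(headers.get("X-Mozilla-Status", "0000"), 16)
    keywords = headers.get("X-Mozilla-Keys", "").split()
    return flags, keywords


def rebuild_index(raw_messages):
    """Build a fresh key -> (flags, keywords) index from the store contents."""
    return {key: index_entry(raw)
            for key, raw in enumerate(raw_messages, start=1)}
```

With a Maildir-like store the same headers could stay in each message file, so this approach carries over. A database-backed store would more naturally keep flags and keywords in columns, in which case the store itself is the index source and nothing needs to be parsed.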

The pluggable mail store interface was implemented in Thunderbird 12 by bug 402392.

Follow-up work continues in bug 845952 (finish "maildir" message storage [meta]).

Straw man Interface

interface nsILocalMailStore : nsISupports {

 // Folder-level operations.
 void initWithServerPath(in nsILocalFile aServerPath);
 nsISimpleEnumerator getSubFolders(in nsIMsgFolder aParentFolder);
 void deleteFolder(in nsIMsgFolder aFolder);
 void renameFolder(in nsIMsgFolder aFolder, in AString aNewName);
 void copyFolder(in nsIMsgFolder srcFolder, in nsIMsgFolder dstFolder, in boolean isMoveFolder,
                 in nsIMsgWindow msgWindow, in nsIMsgCopyServiceListener listener);
 
 // Message-level operations.
 // We need to know what the hdr for the new msg will be.
 nsIOutputStream getNewMsgOutputStream(in nsIMsgFolder aFolder, out nsIMsgDBHdr aNewHdr);
 nsIInputStream getInputStream(in nsIMsgDBHdr aHdr);
 void deleteMessage(in nsIMsgDBHdr aHdr);
 // should this take an array? a copy listener?
 void copyMessage(in boolean isMove, in nsIMsgDBHdr aHdr, in nsIMsgFolder aDstFolder);

 // Store maintenance and per-message metadata.
 attribute boolean needsCompaction;
 void compactFolder(in nsIMsgFolder aFolder);
 void rebuildIndex(in nsIMsgFolder aFolder);
 void setFlags(in nsIMsgDBHdr aMsg, in unsigned long flags);
 void setKeywords(in nsIMsgDBHdr aMsg, in string keywords);

};

Proving the design

At a minimum, I'd want to convince myself that Maildir and a SQLite message store could live behind the pluggable interface, as well as our current Berkeley mailbox format store. After designing the interface, the next step would be to move the Berkeley mailbox implementation behind the pluggable interface. Then, find someone to do prototypes of Maildir and a SQLite (or other db) implementation.
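
To get a feel for whether two very different stores can sit behind one interface, here is a toy sketch. It is not a prototype of the real interface: `MessageStore` covers only three message-level operations, the file-per-message store is Maildir-like in spirit only (no tmp/new/cur directories), and every name and the table layout are assumptions.

```python
import os
import sqlite3
from typing import Protocol


class MessageStore(Protocol):
    """Hypothetical cut-down store interface: add, read, delete."""
    def add(self, folder: str, data: bytes) -> int: ...
    def read(self, folder: str, key: int) -> bytes: ...
    def delete(self, folder: str, key: int) -> None: ...


class FilePerMessageStore:
    """One file per message, so deleting reclaims space immediately."""

    def __init__(self, root):
        self.root = root
        self.next_key = 1

    def _path(self, folder, key):
        return os.path.join(self.root, folder, f"{key}.eml")

    def add(self, folder, data):
        os.makedirs(os.path.join(self.root, folder), exist_ok=True)
        key, self.next_key = self.next_key, self.next_key + 1
        with open(self._path(folder, key), "wb") as f:
            f.write(data)
        return key

    def read(self, folder, key):
        with open(self._path(folder, key), "rb") as f:
            return f.read()

    def delete(self, folder, key):
        os.remove(self._path(folder, key))


class SqliteStore:
    """All messages in one table; the row id serves as the message key."""

    def __init__(self, db_path):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS msgs "
            "(msg_key INTEGER PRIMARY KEY, folder TEXT, data BLOB)")

    def add(self, folder, data):
        cur = self.db.execute(
            "INSERT INTO msgs (folder, data) VALUES (?, ?)", (folder, data))
        self.db.commit()
        return cur.lastrowid

    def read(self, folder, key):
        row = self.db.execute(
            "SELECT data FROM msgs WHERE folder = ? AND msg_key = ?",
            (folder, key)).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def delete(self, folder, key):
        self.db.execute(
            "DELETE FROM msgs WHERE folder = ? AND msg_key = ?", (folder, key))
        self.db.commit()


def exercise(store: MessageStore) -> bytes:
    """Caller code that never needs to know which store it is talking to."""
    key = store.add("Inbox", b"hello")
    data = store.read("Inbox", key)
    store.delete("Inbox", key)
    return data
```

Neither toy store needs compaction, which hints that `needsCompaction` and `compactFolder` would be no-ops for stores like these.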

What would stay the same

In order to avoid changing the whole world, we will want to keep the same folder URI syntax, and the same 32-bit message keys.

The .msf file locations will probably stay the same (I hope that's compatible with the Maildir format). If it's possible that some mail store needs the .msf files to be somewhere else, then we might need a method to map a folder to its .msf file path.

Note that this says nothing about the header database itself. Any pluggable store should be able to work with the Mork database, or its replacement. If we decide to go to more of a global database approach, the pluggable stores should not care. In other words, the pluggable store abstraction is separate from the database abstraction.

Other considerations

If the pluggable store is Maildir-like, Spotlight and Windows Search wouldn't need to write separate copies of each message. But the Spotlight/Windows Search code would need some way of knowing that the pluggable store is storing each message in a separate file.

Implementing a Pluggable Store

I expect that as other pluggable stores are implemented, the interface will evolve somewhat to take into account different characteristics of stores. There is also more that can be done to reduce the amount of work the stores have to do (e.g., if a store implements CopyMessages, it has to add undo actions and notify the nsIMsgFolderNotificationService). So please don't assume the interface is frozen, or the way it is called from the core code is set in stone. As more stores are implemented, of course, it will be harder to change the interface.

The two current stores are both local, non-cloud, and fairly synchronous, so adding cloud-based or even async db based stores will probably expose issues in the interface and/or the way the core code is using the interface. I believe a cloud-based store will need to do some local caching in order to be performant. All pluggable store interface methods are called from the UI thread, so they shouldn't block for network I/O. For example, the discoverSubFolders method should enumerate cached folders, and if new folders are discovered in the cloud, that should happen in the background and those folders get added after the fact, similar to what we do for IMAP folder discovery.
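
A minimal sketch of that discovery pattern follows, assuming a hypothetical `fetch_remote` callable that does the slow network listing and an `on_folder_added` callback. Both names, and the class itself, are illustrative and not part of the interface.

```python
import threading


class CachedFolderStore:
    """Answer from the cache immediately; look for new folders off-thread."""

    def __init__(self, cached, fetch_remote, on_folder_added):
        self._cached = list(cached)
        self._fetch_remote = fetch_remote        # slow; never run on the UI thread
        self._on_folder_added = on_folder_added  # notified after the fact

    def discover_sub_folders(self):
        """Return (cached folders, worker thread) without blocking on the network."""
        known = list(self._cached)               # snapshot before the worker runs
        worker = threading.Thread(target=self._refresh, daemon=True)
        worker.start()
        return known, worker

    def _refresh(self):
        for name in self._fetch_remote():
            if name not in self._cached:
                self._cached.append(name)
                self._on_folder_added(name)
```

In real code the callback would have to be dispatched back to the UI thread rather than invoked from the worker; the sketch skips that step, along with locking around the cache.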

The getMsgInputStream method should return an input stream without trying to talk to the server first, and the connection to the server should be established when the core Thunderbird code first tries to read from the stream, because the code that reads from the stream doesn't block the UI thread.
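
One way to picture such a stream is a wrapper that opens the real source only on the first read. This is a sketch under the assumption that a hypothetical `open_remote` callable performs the connection and returns a readable binary stream; `LazyInputStream` is not an existing Thunderbird class.

```python
import io


class LazyInputStream(io.RawIOBase):
    """Creating the stream is free; the connection happens on first read."""

    def __init__(self, open_remote):
        super().__init__()
        self._open_remote = open_remote  # hypothetical connect-and-fetch callable
        self._inner = None

    def readable(self):
        return True

    def readinto(self, buffer):
        if self._inner is None:
            self._inner = self._open_remote()  # first read pays the network cost
        data = self._inner.read(len(buffer))
        buffer[:len(data)] = data
        return len(data)
```

Because the expensive call sits inside `readinto`, whichever thread consumes the stream absorbs the latency, not the thread that asked for the stream.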

However, the code that writes to a msgOutputStream is called from the UI thread, so those writes would need to be buffered locally and uploaded to the cloud in the background.
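
The buffering idea could look roughly like this. `UploadQueue`, `BufferedMsgOutputStream`, and the `upload` callable are hypothetical names, and the sketch omits retries, error handling, and persistence of pending uploads across restarts.

```python
import io
import queue
import threading


class UploadQueue:
    """A single background worker that pushes finished messages to the cloud."""

    def __init__(self, upload):
        self._upload = upload            # slow network call (assumed signature)
        self._pending = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            name, data = self._pending.get()
            try:
                self._upload(name, data)
            finally:
                self._pending.task_done()

    def submit(self, name, data):
        self._pending.put((name, data))

    def wait(self):
        """Block until everything submitted so far has been uploaded."""
        self._pending.join()


class BufferedMsgOutputStream:
    """Writes go to memory only; close() hands the bytes to the uploader."""

    def __init__(self, name, uploads):
        self._name = name
        self._uploads = uploads
        self._buffer = io.BytesIO()

    def write(self, data):
        return self._buffer.write(data)  # never touches the network

    def close(self):
        self._uploads.submit(self._name, self._buffer.getvalue())
```

The UI thread only ever calls `write` and `close`, both of which return without waiting for the network.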

It's certainly conceivable that if cloud stores become popular, the core code could be changed to do more stuff on non-UI threads, but we don't currently have resources to work on that.

Migration between Pluggable Stores

Pluggable Store Migration covers the issues that arise when migrating from one mailbox store to another.

Other Pluggable Stores