User:Rkent/Folder Data Persistence

From MozillaWiki

Intro

The point of this page is to record notes on understanding data persistence of information associated with folders. This is complicated because the information is frequently stored in multiple in-memory objects, as well as in two different databases: dbFolderInfo, which is a table of the main folder message summary database, and panacea.dat, which is a cache of the same information.

These notes were prepared to understand proposals in bug 1032360: "nsMsgLocalMailFolder::GetSizeOnDisk seems to return wrong value for maildir store" to persistently store the folder size, but should be applicable to other issues of folder persistence.

Notes

Folder Cache (panacea.dat)

The folder cache file, named "panacea.dat", is a profile-wide file that contains summary information for each folder. Its main purpose is to allow the folder pane to display the folder tree and associated folder information (such as unread count) without having to open the mork summary file for each folder to get that information, which would take both excessive time and memory. The canonical source for folder metadata, though, is the dbFolderInfo object that is stored within the mork summary file, so this design creates a sync issue between these files, as well as with associated memory objects owned by the folder.

(The leaf name panacea.dat is defined in nsMailDirProvider.cpp, but NS_APP_MESSENGER_FOLDER_CACHE_50_FILE is used to access the full path to that file.)

panacea.dat is accessed using nsIMsgFolderCacheElement, which uses a string key to access data for a folder. The string key is defined in nsMsgDBFolder::GetFolderCacheKey using the persistent path of the .msf mork file for the folder (for non-root folders) or the main folder (for the root folder). Examples:

C:\\Users\\Kent\\AppData\\Roaming\\Thunderbird\\Profiles\\3gtxygma.default\\ImapMail\\mail.caspia.com\\Sent.msf

C:\\Users\\Kent\\AppData\\Roaming\\Thunderbird\\Profiles\\3gtxygma.default\\ImapMail\\mail.caspia.com
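The two example keys above suggest a simple rule, which can be sketched as follows. This is only an illustration, not the real nsMsgDBFolder::GetFolderCacheKey: the function name FolderCacheKey and its parameters are hypothetical, and it assumes the summary file path is the folder path with ".msf" appended.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper illustrating the key rule described in the notes.
// Assumption: the summary file is named by appending ".msf" to the folder path.
std::string FolderCacheKey(const std::string& folderPath, bool isRootFolder) {
  assert(!folderPath.empty());  // a key needs a persistent path
  if (isRootFolder) {
    return folderPath;          // root folder: the main folder path is the key
  }
  return folderPath + ".msf";   // other folders: the summary file path is the key
}
```

With this sketch, a root path such as ImapMail/mail.caspia.com is used unchanged, while a subfolder path ending in Sent yields a key ending in Sent.msf.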

panacea.dat is owned by nsMsgAccountManager, which has methods to acquire the folder cache object and write to it.

Updating the Folder Cache

Here is an example of how the Folder Cache gets updated (along with dbFolderInfo) in a sample operation. The sample used is copying an unread message from an IMAP folder to a local folder, which will require updating the unread count and the total count for the local folder.

Stack to write to local folder cache element (using esr31):

>	xul.dll!nsMsgLocalMailFolder::WriteToFolderCacheElem(nsIMsgFolderCacheElement * element) Line 1088
 	xul.dll!nsMsgDBFolder::WriteToFolderCache(nsIMsgFolderCache * folderCache, bool deep) Line 1386
 	xul.dll!nsMsgDBFolder::FlushToFolderCache() Line 1362
 	xul.dll!nsMsgDBFolder::UpdateSummaryTotals(bool force) Line 4178
 	xul.dll!nsMsgDBFolder::EnableNotifications(int notificationType, bool enable, bool dbBatching) Line 5085
 	xul.dll!nsMsgLocalMailFolder::EndCopy(bool aCopySucceeded) Line 2447
 	xul.dll!nsCopyMessageStreamListener::EndCopy(nsISupports * url, tag_nsresult aStatus) Line 114
 	xul.dll!nsCopyMessageStreamListener::OnStopRequest(nsIRequest * request, nsISupports * ctxt, tag_nsresult aStatus) Line 144
 	xul.dll!nsImapCacheStreamListener::OnStopRequest(nsIRequest * request, nsISupports * aCtxt, tag_nsresult aStatus) Line 8629

The method UpdateSummaryTotals is key to keeping the folder cache current, but the way information is managed there is quite convoluted. It does the following things:

  • Calls ReadDBFolderInfo(force), which makes sure that the folder object member variables have been initialized from the cache. If force == true, or the initialization from the cache fails, then the variables are initialized from dbFolderInfo instead. So this method is badly misnamed; it should be something like "InitializeFolderMetadata". Because force==true will cause the folder db to be opened, it is important that force==true is only used in cases where the db is already open, or we expect it to be opened. If force==false, ReadDBFolderInfo is a no-op except at initialization.
  • Sends OnItemIntPropertyChanged notifications to nsIFolderListener objects for the unread count and total count for the folder
  • Writes updated information to the folder cache

If force==true, then things are really confusing and convoluted. In ReadDBFolderInfo, certain folder metadata (mNumTotalMessages, mNumUnreadMessages, mExpungedBytes, mName, mCharset, mCharsetOverride, nsMsgFolderFlags::GotNew) are read from dbFolderInfo, overwriting any local values in the folder object. The cache element metadata (mNumPendingUnreadMessages, mNumPendingTotalMessages, mFolderSize, mFlags) are not. The handling of pending counts is particularly confusing. They seem to be mostly managed through the method ChangeNumPending... which updates mNumPendingUnreadMessages, then updates dbFolderInfo but not the cache, and also notifies. (This seems like a performance issue. You should not have to open a folder database to record that there are messages pending that have not been downloaded.)

How to understand all of this? The philosophy seems to be the following:

1) Any time that folder metadata is changed, that change needs to be written immediately to dbFolderInfo.
2) dbFolderInfo is maintained by the db, so adding a message header will implicitly update dbFolderInfo.
3) db operations do message db listener notifications, but not folder-level notifications. folderCache is considered a folder-level notification, so is done by the folder.
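The division of labor in these three rules can be illustrated with a toy model. Everything here is hypothetical (SketchFolderInfo, SketchDatabase, SketchFolder and FinishOperation are names invented for this sketch, not Thunderbird classes); it only shows that the database updates its own folder info when a header is added, while the folder's member variable, folder-level notification, and cache entry lag until the folder runs its end-of-operation step.

```cpp
#include <string>
#include <vector>

// Toy stand-in for dbFolderInfo: counts stored inside the summary database.
struct SketchFolderInfo {
  int numMessages = 0;
  int numUnreadMessages = 0;
};

// Toy stand-in for the message database.
class SketchDatabase {
 public:
  // Rule 2: adding a header implicitly updates the folder info.
  // Rule 3: only a db-level event is recorded here, nothing folder-level.
  void AddNewHdr(bool isRead) {
    info_.numMessages += 1;
    if (!isRead) {
      info_.numUnreadMessages += 1;
    }
    dbEvents_.push_back("header added");
  }
  const SketchFolderInfo& info() const { return info_; }
  const std::vector<std::string>& dbEvents() const { return dbEvents_; }

 private:
  SketchFolderInfo info_;
  std::vector<std::string> dbEvents_;
};

// Toy stand-in for the folder, which owns folder-level notifications
// and the folder cache entry.
class SketchFolder {
 public:
  explicit SketchFolder(SketchDatabase& db) : db_(db) {}

  // Folder-level step run at the end of an operation: copy the count from
  // the db, record a folder-level event if it changed, update the cache.
  void FinishOperation() {
    const int newUnread = db_.info().numUnreadMessages;
    if (newUnread != unread_) {
      folderEvents_.push_back("unread count changed");
    }
    unread_ = newUnread;
    cachedUnread_ = unread_;
  }
  int unread() const { return unread_; }
  int cachedUnread() const { return cachedUnread_; }
  const std::vector<std::string>& folderEvents() const { return folderEvents_; }

 private:
  SketchDatabase& db_;
  int unread_ = 0;        // folder member variable
  int cachedUnread_ = 0;  // stand-in for the folder cache element
  std::vector<std::string> folderEvents_;
};
```

After AddNewHdr(false), the database's unread count is already 1 while the folder still reports 0; only FinishOperation brings the member variable and the cache entry up to date.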

Updating unread count in dbfolderinfo

Stack to update unread count:

	xul.dll!nsDBFolderInfo::ChangeNumUnreadMessages(int delta) Line 517
 	xul.dll!nsMsgDatabase::AddNewHdrToDB(nsIMsgDBHdr * newHdr, bool notify) Line 3508
 	xul.dll!nsMsgLocalMailFolder::EndCopy(bool aCopySucceeded) Line 2378
 	xul.dll!nsCopyMessageStreamListener::EndCopy(nsISupports * url, tag_nsresult aStatus) Line 114
 	xul.dll!nsCopyMessageStreamListener::OnStopRequest(nsIRequest * request, nsISupports * ctxt, tag_nsresult aStatus) Line 144
 	xul.dll!nsImapCacheStreamListener::OnStopRequest(nsIRequest * request, nsISupports * aCtxt, tag_nsresult aStatus) Line 8629

So nsCopyMessageStreamListener is managing the calls. EndCopy is called first, which adds the message to the database, which in turn increments numUnreadMessages in dbFolderInfo.

Updating folderSize in dbFolderInfo

Stack:

	xul.dll!nsDBFolderInfo::SetFolderSize(unsigned __int64 size) Line 423
 	xul.dll!nsMsgBrkMBoxStore::SetSummaryFileValid(nsIMsgFolder * aFolder, nsIMsgDatabase * aDB, bool aValid) Line 301
 	xul.dll!nsMailDatabase::SetSummaryValid(bool aValid) Line 126
 	xul.dll!nsMsgLocalMailFolder::OnCopyCompleted(nsISupports * srcSupport, bool moveCopySucceeded) Line 1352
 	xul.dll!nsMsgLocalMailFolder::EndCopy(bool aCopySucceeded) Line 2455
 	xul.dll!nsCopyMessageStreamListener::EndCopy(nsISupports * url, tag_nsresult aStatus) Line 114
 	xul.dll!nsCopyMessageStreamListener::OnStopRequest(nsIRequest * request, nsISupports * ctxt, tag_nsresult aStatus) Line 144
 	xul.dll!nsImapCacheStreamListener::OnStopRequest(nsIRequest * request, nsISupports * aCtxt, tag_nsresult aStatus) Line 8629

Analysis

As a general rule, changes to folder metadata are first written to dbFolderInfo without changing the equivalent member variables in the msgFolder. The variables in the msgFolder are changed at the end of operations, reading from dbFolderInfo in ReadDBFolderInfo.

ReadDBFolderInfo(force)

nsMsgDBFolder::ReadDBFolderInfo(bool force) when force==false is a no-op except for the first time a folder is created, where the folder member objects for folder metadata are initialized from the folder cache.

ReadDBFolderInfo(false) is used typically in a Get...() call, where you want to make sure that the variable has been initialized from the cache before returning it.

ReadDBFolderInfo(true) is used typically after folder metadata has changed in dbFolderInfo, and you want to update the folder member variables, typically also doing any required notifications. See nsMsgDBFolder::UpdateSummaryTotals.

UpdateSummaryTotals(force)

In all cases, UpdateSummaryTotals will:

  • Initialize member variables using ReadDBFolderInfo(force)
  • Notify changes in kTotalMessagesAtom and kTotalUnreadMessagesAtom
  • Call FlushToFolderCache to update the cache.

When force==true, then the objects are read from dbFolderInfo, so this is the method used to update relevant member variables by reading from dbFolderInfo, doing folder-level notifications, and flushing the changes to the folderCache.

When force==false the situation is trickier. Member variables are initialized from the folder cache, with notifications, without opening the database. This is done when the folder object is first initialized (in GetSubfolders), or in SummaryChanged().
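The effect of the force flag can be sketched with a small model. This is a simplification under stated assumptions, not the Thunderbird implementation: ModelFolder and Counts are hypothetical names, the fallback to dbFolderInfo when cache initialization fails is omitted, and the model assumes a notification is sent only when a count actually changes.

```cpp
// Counts tracked for a folder in this sketch.
struct Counts {
  int total;
  int unread;
};

class ModelFolder {
 public:
  ModelFolder(Counts cacheEntry, Counts dbFolderInfo)
      : cache_(cacheEntry), dbInfo_(dbFolderInfo) {}

  // Models ReadDBFolderInfo(force) as described in these notes.
  void ReadDBFolderInfo(bool force) {
    if (force) {
      ++dbOpens_;            // forcing requires the summary database
      members_ = dbInfo_;    // member variables overwritten from dbFolderInfo
      initialized_ = true;
    } else if (!initialized_) {
      members_ = cache_;     // first use: initialize from the folder cache only
      initialized_ = true;
    }
    // force == false on an initialized folder: nothing to do.
  }

  // Models UpdateSummaryTotals(force): read, notify, flush to cache.
  void UpdateSummaryTotals(bool force) {
    const Counts before = members_;
    ReadDBFolderInfo(force);
    if (before.total != members_.total || before.unread != members_.unread) {
      ++notifications_;      // assumption: notify only on an actual change
    }
    cache_ = members_;       // stand-in for FlushToFolderCache
  }

  Counts members() const { return members_; }
  Counts cache() const { return cache_; }
  int dbOpens() const { return dbOpens_; }
  int notifications() const { return notifications_; }

 private:
  Counts cache_;
  Counts dbInfo_;
  Counts members_{0, 0};
  bool initialized_ = false;
  int dbOpens_ = 0;
  int notifications_ = 0;
};
```

In this model, UpdateSummaryTotals(false) on a fresh folder picks up the cached counts without touching the database, a second call with false changes nothing, and UpdateSummaryTotals(true) opens the database once and pushes its counts into both the member variables and the cache.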

SummaryChanged()

This is just a synonym for UpdateSummaryTotals(false), and is only used in IMAP. Its main purpose seems to be to write folder metadata to the folder cache. This assumes that the metadata was written to dbFolderInfo at the time it was changed. One common use seems to be after ChangeNumPending... which does changes in dbFolderInfo and notifications but not in folderCache. The call to UpdateSummaryTotals(false) only flushes the changes to the folder cache.

folderSize and IMAP databases

Overloaded meaning and definition of folderSize

One of the things that makes folderSize so complex is that its meaning is overloaded, with at least three different uses:

  • On local folders with mbox, folderSize is used to report to the UI the size of the message folder on disk, and is also used as an indicator of whether the summary file is valid. Although these values are the same, the timing of updates for these two uses may differ.
  • For maildir, calculating folderSize directly from the disk is slow, but it is not used as an indicator of validity of the message summary file.
  • On IMAP, the summary file is still used with offline folders, but the meaning of folderSize is changed to represent the server-side storage used for messages. For this reason, folderSize is unavailable for use with offline folders to represent the validity of the summary file.
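The mbox use of folderSize as a validity indicator can be sketched as below. This is a deliberately reduced illustration: SummaryInfo, MarkSummaryValid and SummaryLooksValid are hypothetical names, and the sketch assumes validity is judged by folderSize alone, whereas the real store may compare additional fields.

```cpp
#include <cstdint>

// Toy stand-in for the part of dbFolderInfo relevant here.
struct SummaryInfo {
  std::uint64_t folderSize = 0;
};

// Marking the summary valid records the current mbox size, mirroring the
// SetSummaryFileValid -> SetFolderSize call seen in the stack above.
void MarkSummaryValid(SummaryInfo& info, std::uint64_t mboxSizeOnDisk) {
  info.folderSize = mboxSizeOnDisk;
}

// Assumption for this sketch: the summary is treated as valid only if the
// recorded size still matches the mbox file on disk.
bool SummaryLooksValid(const SummaryInfo& info, std::uint64_t mboxSizeOnDisk) {
  return info.folderSize == mboxSizeOnDisk;
}
```

This makes the overloading concrete: once IMAP stores a server-side size in the same field, a comparison against the local mbox size no longer says anything about the summary file.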

How does IMAP use the summary file and mbox storage, and avoid the paths that check for summary file valid?

For a local folder with mbox, toggling a message as read results in marking the summary file valid through this stack:

	xul.dll!nsMailDatabase::SetSummaryValid(bool aValid) Line 119
 	xul.dll!nsMailDatabase::EndBatch() Line 78
 	xul.dll!nsMsgDBFolder::EnableNotifications(int notificationType, bool enable, bool dbBatching) Line 5085
 	xul.dll!nsMsgDBView::ApplyCommandToIndices(int command, unsigned int * indices, int numIndices) Line 2932
 	xul.dll!nsMsgDBView::CycleCell(int row, nsITreeColumn * col) Line 2067

The same operation on IMAP has an overridden EndBatch which is a no-op:

	xul.dll!nsImapMailDatabase::EndBatch() Line 71
 	xul.dll!nsMsgDBFolder::EnableNotifications(int notificationType, bool enable, bool dbBatching) Line 5085
 	xul.dll!nsMsgDBView::ApplyCommandToIndices(int command, unsigned int * indices, int numIndices) Line 2932
 	xul.dll!nsMsgDBView::CycleCell(int row, nsITreeColumn * col) Line 2067

But also, SetSummaryValid for IMAP is different, without the folderSize test:

NS_IMETHODIMP	nsImapMailDatabase::SetSummaryValid(bool valid)
{
  if (m_dbFolderInfo)
  {
    m_dbFolderInfo->SetVersion(valid ? GetCurVersion() : 0);
    Commit(nsMsgDBCommitType::kLargeCommit);
  }
  return NS_OK;
}

Where do we use summaryValid?

So for non-local folders, this is only used when opening the folder.

Conclusions for maildir folderSize

  • maildir needs to send folderSize updates to the msgFolder, but IMAP needs to ignore those. That means that IMAP needs to override SetSizeOnDisk with a no-op. Perhaps it needs a separate SetSizeOnServer to use instead.
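A minimal sketch of that proposal follows. The classes SketchMsgFolder and SketchImapFolder are hypothetical stand-ins, and SetSizeOnServer is only the method name floated in these notes, not an existing API.

```cpp
#include <cstdint>

// Toy base folder: store-driven size updates land in folderSize_.
class SketchMsgFolder {
 public:
  virtual ~SketchMsgFolder() = default;
  virtual void SetSizeOnDisk(std::int64_t size) { folderSize_ = size; }
  std::int64_t folderSize() const { return folderSize_; }

 protected:
  std::int64_t folderSize_ = 0;
};

// Toy IMAP folder: ignores store-driven updates, because here folderSize
// means server-side storage rather than local disk usage.
class SketchImapFolder : public SketchMsgFolder {
 public:
  void SetSizeOnDisk(std::int64_t /*size*/) override {}  // intentional no-op

  // Hypothetical method suggested in the notes; used by IMAP code only.
  void SetSizeOnServer(std::int64_t size) { folderSize_ = size; }
};
```

Because SetSizeOnDisk is virtual, a message store that only holds a base-class reference can call it unconditionally; the IMAP folder silently drops the update and keeps whatever size was last set through SetSizeOnServer.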