User:Beckley/Indexed Search Proposal

From MozillaWiki
THIS PAGE IS A WORKING DRAFT
The page may be difficult to navigate, and some information on its subject might be incomplete and/or evolving rapidly.
If you have any questions or ideas, please add them as a new topic on the discussion page.

This is a proposal for adding indexed search functionality to the Thunderbird email client.

Rationale

The current Search Messages feature in Thunderbird is very slow for users who have accumulated even a normal amount of mail. Each time a user initiates a search, the individual mailbox files are opened and scanned for the matching text, which can take tens of seconds or even minutes to complete. Users have become accustomed to Internet search engines that return near-instantaneous results; if the entire Web can be searched that quickly, we should be able to do the same for a user's mail store.

Providing instant results will require an indexing engine. Most recent operating systems either ship with one or make one available as a free download. However, there are reasons why it would be beneficial to include an indexing engine as part of Thunderbird. We will look at both approaches here.
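To make the difference concrete, the following minimal Python sketch contrasts scanning every message on each query with looking a term up in a prebuilt inverted index. The sample messages, the `tokenize` helper, and all function names are hypothetical illustrations, not Thunderbird code; a real engine would also handle ranking, stemming, and incremental updates.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def linear_search(messages, term):
    """Scan every message body on every query; cost grows with the mail store."""
    term = term.lower()
    return sorted(msg_id for msg_id, body in messages.items()
                  if term in tokenize(body))


def build_index(messages):
    """Build an inverted index once: word -> set of ids of messages containing it."""
    index = defaultdict(set)
    for msg_id, body in messages.items():
        for word in tokenize(body):
            index[word].add(msg_id)
    return index


def indexed_search(index, term):
    """Answer a single-word query with one dictionary lookup."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical sample data standing in for a mail store.
messages = {
    1: "Meeting moved to Tuesday",
    2: "Lunch on Tuesday?",
    3: "Quarterly report attached",
}
index = build_index(messages)
print(indexed_search(index, "tuesday"))
```

Both functions return the same message ids; the difference is that the index is built once and each later query avoids rereading the message bodies, at the cost of storing the index alongside the mail.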

Using the OS Indexing Engine

Recent operating systems have indexing engines included or available for download. Windows Vista comes with Windows Search, which is also available for XP. Mac OS X has included Spotlight for the last couple of versions. For Linux there are Beagle, Tracker, Strigi, Recoll, and a number of others. There is also Google Desktop Search, which has versions for all three platforms.

OS-based desktop search provides its own user interface for searching, and is able to search email as well as user documents, web browsing history, and other files stored on the user's computer. Getting a good user experience between these services and Thunderbird requires some integration work. That integration has already been written for Spotlight, and is nearly complete for Windows Search. However, the user interface in OS-based desktop search is not optimized for searching email.

The OS-based desktop search components each have an API for programmatically searching their indexed store, so one approach is to take advantage of that existing index when performing a search inside Thunderbird. The win for this route is that the index already exists and doesn't have to be duplicated (an index is generally around 25% of the size of the data that it indexes). The disadvantages are numerous, though:

  • Results have to be filtered to exclude data in the index that isn't email
  • Each indexing engine has different capabilities, which leads to least-common-denominator solutions or differences between platforms
  • Not all OSes come with an indexing engine (Windows XP, Linux), so some users would have to download and configure one
  • Glue code needs to be written and maintained to keep a single interface in the front-end (there are existing APIs that do this, Xesam being one example, but they don't support all of the indexing engines we would want)
  • The indexing engine can get disabled, have its settings changed, or get upgraded to an incompatible version, making it unusable
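As a rough illustration of the glue code and filtering described in the list above, the following Python sketch defines a single front-end interface with one adapter behind it. `SearchBackend`, `FakeOSBackend`, and `search_mail` are hypothetical names invented for this example; they do not correspond to any real OS search API, and a real adapter would call the engine's own API instead of matching against an in-memory list.

```python
from abc import ABC, abstractmethod


class SearchBackend(ABC):
    """Hypothetical common interface the front-end would code against."""

    @abstractmethod
    def is_available(self):
        """Return True if the engine is installed, enabled, and compatible."""

    @abstractmethod
    def search(self, query):
        """Return a list of (kind, identifier) hits for the query."""


class FakeOSBackend(SearchBackend):
    """Stand-in for one OS engine adapter, backed by an in-memory list."""

    def __init__(self, hits, available=True):
        self._hits = hits            # list of (kind, identifier, text)
        self._available = available

    def is_available(self):
        return self._available

    def search(self, query):
        q = query.lower()
        return [(kind, ident) for kind, ident, text in self._hits
                if q in text.lower()]


def search_mail(backend, query):
    """Front-end entry point: check the engine, then drop non-email hits."""
    if not backend.is_available():
        raise RuntimeError("indexing engine is missing or disabled")
    return [ident for kind, ident in backend.search(query) if kind == "email"]


# Hypothetical index contents mixing email with other file types.
backend = FakeOSBackend([
    ("email", "msg-1", "Budget review on Friday"),
    ("document", "budget.ods", "Budget spreadsheet"),
    ("email", "msg-2", "Holiday photos"),
])
print(search_mail(backend, "budget"))
```

Every supported engine would need its own adapter of this kind, and the front-end could only rely on features that all adapters are able to provide.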

Even though OS-based indexing engines are not well suited for search inside Thunderbird, they are still useful for users who want to search outside of Thunderbird and have their email show up in the results.

Incorporating an Indexing Engine

Given the disadvantages listed in the section above, a better route is to include an indexing engine inside Thunderbird. That way we can control it, and ensure that it is present, enabled, and compatible. There are a number of FOSS indexing engines available for use, but two in particular stand out: Lucene and SQLite Full Text Search.

Lucene

SQLite Full Text Search
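
As a rough idea of what full text search in SQLite looks like, here is a minimal sketch using Python's standard `sqlite3` module. It assumes the underlying SQLite library was built with the FTS5 extension, which is a newer module than the one that would have been available when this proposal was drafted; the table name, column names, and sample messages are hypothetical and do not reflect any Thunderbird schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Full-text virtual table; assumes SQLite was compiled with FTS5 enabled.
conn.execute("CREATE VIRTUAL TABLE messages USING fts5(subject, body)")
conn.executemany(
    "INSERT INTO messages (subject, body) VALUES (?, ?)",
    [
        ("Team lunch", "Shall we meet on Tuesday at noon?"),
        ("Quarterly report", "The report is attached for review."),
        ("Re: Team lunch", "Tuesday works for me."),
    ],
)


def search(conn, query):
    """Return the subjects of messages matching the query, in insertion order."""
    rows = conn.execute(
        "SELECT subject FROM messages WHERE messages MATCH ? ORDER BY rowid",
        (query,),
    )
    return [subject for (subject,) in rows]


print(search(conn, "tuesday"))
```

The `MATCH` operator consults the full-text index rather than scanning each row's text, and the default tokenizer makes the match case-insensitive for ASCII text.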