Places:AutoComplete

From MozillaWiki
Jump to: navigation, search

(Please comment by clicking "discussion" above, and not by editing this page.)

The Places History AutoComplete uses a SQL database to find pages to display results to the user for a given search query. This page describes how results are processed and given to the front-end (e.g., Firefox 3) and various optimizations to speed things up.

Basic Work Flow

The general outline of how a search query incrementally finds results is as follows:

  1. Initialize data structures
  2. Process search query
  3. Search database and save matches
  4. Update search status
  5. Continue searching if necessary

Initialize data structures

Searches are started through the nsIAutoCompleteSearch::StartSearch interface, where we get the search query and a listener to give results. Here we standardize the query internally by trimming spaces, switching to lowercase, etc. to simplify matching it against other queries.

Here we initialize data to store the listener, new results, and information about bookmarks/livemarks. We additionally reset previous search information like the status of the search, search position/offset, and matching urls.

Process search query

The search query is a plain string that can be split into multiple words by spaces. Typically this is for multi-word searching, but special tokens are also processed to change the behavior of the search.

As the query is split into individual words/tokens, the special tokens are recorded as changing the behavior and removed from the query, so we don't try matching it against the title/URL.

Search database and save matches

The bulk of the history processing is in this stage, and it consists mostly of executing a SQL query and filtering results. There's also some special behavior for the first chunk for adaptive learning and keyword searches.

Query database

Queries are actually fairly simple in that they blindly fetch all results that fall within the current chunk (a fixed chunk size and an updating chunk offset). One major aspect of the results is that they come in frecency order, where we've precomputed the ordering ahead of time and the database has an index on that column.

Filter results

The filtering of results is not handled by SQL as we do fairly complex matching against titles/tags/URLs with varying behavior specified by the user's preferences.

For each page returned by the database query, we get a number of properties/attributes such as page title, URL, tags, visit count, etc. For a page to match a query, it needs to have all tokens match somewhere in the title, URL, or tags. The filtering supports matching query tokens on word boundaries, anywhere, and at the beginning of each of those properties depending on the preferences.

Additionally, special tokens and behavior preferences can additionally filter results by forcing url matches or allowing only visited pages. These filters are also processed while looking at each search token against each page.

Store matches

Once a match is determined, we package it into a result that has the URL, title, favicon, and style. The title is overloaded to additionally hold the tags of the page by concatenating it to the page/bookmark title and separated by an endash.

The style of the page is to let the front-end know if it's a plain page, bookmarked page, etc.

Extra first-chunk queries

The previous sections described the general query and filtering process focusing on a full-history search that happens for every chunk. There are a couple additional queries that use the same structure as above, but provide a different set of results.

The first is adaptive learning that shows pages that users have previously selected from the autocomplete when typing a similar query. The query is slightly different as it uses another table to look up previous query->page mappings, but it goes through the same filtering process.

The second special query is to show smart keyword searches, where the first token in the query is the keyword and other tokens are additional search parameters. We look in the bookmarks database to find a matching keyword and insert the search parameters into the "%s" portion of the bookmark url.

Update search status

After a chunk is finished, we report to the listener that we've either found results or not and if we're going to continue searching.

This is done by setting the result status on the nsIAutoCompleteResult package and delivering the results to the nsIAutoCompleteListener.

Continue searching if necessary

If we have enough search results, we stop searching; otherwise, we keep searching until we run out of pages to process. The continuing search is set up with a timer that waits a little bit before continuing again.

Additionally, if the search behavior is to match on word boundaries first, it's possible for the search to start back at the beginning (from the first chunk) and process all the pages again without forcing a match on word boundaries. This is to ensure we show word boundary matches first even if a match-anywhere page has a higher frecency.

Optimizations

There are a number of optimizations to help speed up finding matching pages. They tend to focus on the fact that the user is incrementally typing a query, i.e., an additional letter is added to the end of the previous query.

  1. Stop searching when there's enough results
  2. Reuse existing results when filtering more
  3. Don't search if it won't find anything

Stop searching when there's enough results

Instead of finding many pages that match a search query, we stop early when we have at least as many results that will be shown. This allows for a more responsive UI as we finish processing a chunk before notifying the listener that we have results.

Reuse existing results when filtering more

Every time we find matching pages, we keep track of those URLs so that the next query can start searching from those matched pages. This is useful when the user types an additional character at the end of the query.

Additionally, we track which chunk we stopped at for the previous query and continue from that chunk if we start from the previously matched URLs. This is possible because an additional character in the query will not match anything a previous query didn't match.

Don't search if it won't find anything

If there were no previously matching URLs and the search has completely finished searching through all places, we can immediately say that there will be no results when the user types more. This is useful for users typing in a new URL either editing an existing URL or typing a brand new one.