Places/Places SQL queries best practices

From MozillaWiki
Jump to: navigation, search

Places SQL queries best practices

This is a collection of suggestions and hints to help internal developers hack into the Places database correctly and with common code style. Add-on developers are strongly recommended to use the Places API to query the database, since direct queries could end up corrupting the database when schema changes Querying Places.

For a better comprehension please look at the Database schema ERD

Table aliases

When querying Places tables and in need of aliasing a table for joins or complext queries, is always better to use the same aliases in all queries.

This is needed not only for our internal code style, but also to avoid common perf-killer paths in complex DELETEs or UPDATEs, indeed referring to table_name.column_name in such a query could be different than referring to h.id, causing a full lookup of the table. For the above reasons, we always try to avoid the table_name.column_name path.

Commonly used table aliases
Table Alias On-conflict alias
moz_anno_attributes n
moz_annos a
moz_bookmarks b
moz_bookmarks_roots r
moz_favicons f
moz_historyvisits v
moz_historyvisits_temp v v_t
moz_inputhistory i
moz_items_annos a t
moz_keywords k
moz_places h p
moz_places_temp h h_t

When joining a table with itself could make sense define an alias based on the JOIN meaning, for example selecting all bookmarks with a parent called "folder":

SELECT b.id FROM moz_bookmarks b
JOIN moz_bookmarks parents ON parents.id = b.parent
WHERE parents.title = 'folder'

The code style for using aliases is:

  • if the alias is on a real(temp) table from places.sqlite use TABLE_NAME ALIAS
  • if the alias is on a subquery use AS construct, like SUBQUERY AS ALIAS
  • when referring to a column of a table having an alias, always specify the alias, even if there's no risk of column names conficts.

Querying the database in the temp tables era

Starting from Firefox 3.1 moz_places and moz_historyvisits tables are partitioned between disk and memory, to avoid common locking issues with OS's fsyncs calls (bug 442967). New memory tables are called moz_places_temp and moz_historyvisits_temp, with the same columns and indexes as the disk tables.

NOTE: while for disk tables you can query sqlite_master table to get existant tables, indexes and triggers, for temporary tables you should instead use sqlite_temp_master. Notice mozStorage functions indexExists and tableExists are not correct atm (bug 472963), so if you are looking for existance of moz_places_temp, they will return false even if the table exists.

WARNING: don't use any Places service on quit-application or later, since on quit-application we will close the database connection and data could be lost. Use quit-application-granted instead.

The tables are abstracted through a view, actual views are moz_places_view and moz_historyvists_view. Since SQLite as of now (3.6.10) does not support indices on views, we have to follow two different paths based on the type of query we are executing:

  • INSERTs/UPDATEs/DELETEs must always act on the views
    • Real tables have triggers attached to them to allow for syncing so in no case an implementer should try to execute one of these operations on the temp tables.
    • Views have triggers attached to them that ensure data are written/moved to memory tables when needed.
  • SELECT must always act on real tables, since due to the lack of indices querying the view would be slow. We can SELECT from the views in tests though, when we are sure perf is not an issue.

So this is correct:

DELETE FROM moz_places_view
WHERE id = 1337

While this is completely WRONG:

DELETE FROM moz_places_temp
WHERE id = 1337

When using SELECT we must always be sure to query both tables, new changed data will be in the temp table, while the disk table will be synced in background or when a new bookmarks is added (to prevent bookmarks corruption). Still some place/visit won't never be synced to disk, for example visits with a visit_type = TRANSITION_EMBED.

To query both tables we can use a special SQL construct:

  • UNION merges the resultsets of 2 queries discarding duplicates
  • UNION ALL merges the resultsets of 2 queries

We use one of these based on the query, usually UNION is slower than UNION ALL, so the actual way we query both tables for a place with a certain url may be:

SELECT id FROM moz_places_temp
WHERE url = 'http://www.mozilla.org'
UNION ALL
SELECT id FROM moz_places
WHERE url = 'http://www.mozilla.org'
  AND id NOT IN(SELECT id FROM moz_places_temp)

Due to the fact tables are partitioned, we cannot get the last inserted id from one of those, instead we should get the max between both tables, to do that we may, for example, execute:

SELECT MAX(
  (SELECT MAX(id) FROM moz_places_temp),
  (SELECT MAX(id) FROM moz_places)
)

A similar trick can be applied to IFNULL, this, in many complex queries, can help removing an expensive JOIN, to only get a single value:

SELECT IFNULL(
  (SELECT id FROM moz_places_temp WHERE url = 'http://www.mozilla.org'),
  (SELECT id FROM moz_places WHERE url = 'http://www.mozilla.org')
) AS id

Notice SQLite does support only one definition of GROUP BY, ORDER BY, LIMIT for UNIONs, but sometimes we don't need to UNION all results from both queries. In this case we can use a small trick:

SELECT * FROM (
  SELECT id, visit_date FROM moz_historyvisits_temp
  WHERE visit_type = 1
  ORDER BY visit_date DESC
  LIMIT 10
)
UNION ALL
SELECT * FROM (
  SELECT id, visit_date FROM moz_historyvisits
  WHERE visit_type = 1
  ORDER BY visit_date DESC
  LIMIT 10
)
ORDER BY 2 DESC
LIMIT 10

This query gets the 10 most recent visit ids, this involves some code duplication, but is really faster than simply UNION all results when there are a lot of records. Notice however in the external query we can't use column names, instead we have to use the column indices (2: eg visit_date).

Querying multiple tables

When querying multiple tables involving foreign keys, please always avoid cartesian products like:

SELECT b.id, h.url
FROM moz_places h, moz_bookmarks b
WHERE h.id = b.fk

Instead use a JOIN:

SELECT b.id, h.url
FROM moz_places h
JOIN moz_bookmarks b ON h.id = b.fk

That ensures the query is easier to expand, is less error-prone for future developers and more perf-friendly. We actually use 2 types of JOINs:

  • LEFT JOIN is used when the right table values could not exist, but we still want a record containing the left table values.
  • JOIN is used when the right table must obey the JOIN condition, so records without a match in the right table won't be returned.

Querying Places database examples

This section contains an example of how to create a statement to query the Places database. This is for quick reference, further informations are available in the mozStorage developers documentation

Cpp code:

nsCOMPtr<mozIStorageStatement> statement;
rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
    "INSERT INTO moz_bookmarks "
      "(type, parent, position, dateAdded) VALUES (?1, ?2, ?3, ?4)"),
  getter_AddRefs(statement));
NS_ENSURE_SUCCESS(rv, rv);

rv = statement->BindInt64Parameter(0, TYPE_SEPARATOR);
NS_ENSURE_SUCCESS(rv, rv);
rv = statement->BindInt64Parameter(1, aParent);
NS_ENSURE_SUCCESS(rv, rv);
rv = statement->BindInt32Parameter(2, index);
NS_ENSURE_SUCCESS(rv, rv);
rv = statement->BindInt64Parameter(3, PR_Now()); 
NS_ENSURE_SUCCESS(rv, rv);

rv = statement->Execute();
NS_ENSURE_SUCCESS(rv, rv);

JS code:

var stmt = createStatement(
  "INSERT INTO moz_bookmarks " +
  "(id, fk) " +
  "VALUES (NULL, :place_id)");
stmt.params.place_id = 1337.
try {
  stmt.execute();
} catch (e) {
  dump("Statement error: " + e.message);
} finally {
  stmt.finalize();
}

Places is using some precompiled statement to speed up the most used queries creation. Those statements are created at Places init, so it's important to choice which queries need a precompiled statement, since every new addition can hit Ts. Precompiled statements can cause leaks if not correctly destroyed, so when creating a new one, remember to add it to the array in ::FinalizeStatements() method.

When querying with a precompiled statement it is important to use a mozStorageStatementScoper, it will ensure the statement is correctly reset on scope exiting:

{ // init of scope
  mozStorageStatementScoper scoper(mDBGetRedirectDestinations);
  rv = mDBGetRedirectDestinations->BindInt64Parameter(0, aCurrentSource);
  NS_ENSURE_SUCCESS(rv, rv);
  rv = mDBGetRedirectDestinations->BindInt64Parameter(1, aMinTime);
  NS_ENSURE_SUCCESS(rv, rv);

  PRBool hasMore;
  while (NS_SUCCEEDED(mDBGetRedirectDestinations->ExecuteStep(&hasMore)) &&
         hasMore) {
    // do something
  }
} // end of scope
// precompiled statement gets resetted here


in Javascript instead, remember to use statement.reset() when you need to reuse a stetement changing params, and statement.finalize() when you won't need it anymore.

var stmt = createStatement(
  "SELECT b.id FROM moz_bookmarks b " +
  "JOIN moz_places h ON h.id = b.fk " +
  "WHERE h.url = :url");
stmt.params.url = "http://www.mozilla.org/";
while(stmt.step()) {
  dump(stmt.row.id);
}
stmt.reset();
stmt.params.url = "http://www.mozilla.com/";
while(stmt.step()) {
  dump(stmt.row.id);
}
stmt.finalize();

Asynchronous statements

Asynchronous statements are executed by mozStorage in a separate thread, and they are managed as a FIFO queue. Thanks to the separate threads using such a statement is a good idea, since it won't lock the UI. But at the same time we can't rely on immediate database values, since we have to wait for it to be executed (see mozIStorageStatementCallback)

To execute a single async statement in Cpp:

nsCOMPtr<YourStatementCallback> callback =
  new YourStatementCallback();
// this is needed to cancel a pending statement, we don't use it actually
nsCOMPtr<mozIStoragePendingStatement> canceler;
rv = stmt->ExecuteAsync(callback, getter_AddRefs(canceler));
NS_ENSURE_SUCCESS(rv, rv);

while in JS:

var stmt = dbConn.createStatement("UPDATE ...")
dbConn.executeAsync(stmt, mozIStorageStatementCallback_obj);

It is also possible to execute multiple async statements at once using database connection executeAsync:

let statements = [];
statememnts.push(dbConn.createStatement("UPDATE ..."));
statememnts.push(dbConn.createStatement("UPDATE ..."));
dbConn.executeAsync(statements, statements.length, mozIStorageStatementCallback_obj);

Perf hints

  • Use SQLite command EXPLAIN QUERY PLAN statement to check what tables and indices you're using, and try to optimize it.
  • When possible don't rely on SQLite internal optimizer, it can do a very good job, but we can do also better, so always try to optimize your queries.
  • Test queries against a big database, that can make a lot of difference.
  • Use LIMIT every time is possible, especially with partitioned tables, since when working without indices, having small intermediate tables is really important.
  • Use subqueries when you don't really need a complete JOIN but only check for existance of a record, or get at least one record of a certain type.
  • When a query is too slow, try rebuilding it to do the opposite and exclude results (IN, NOT IN constructs can be useful).
  • Virtual machines have slower disk access, so take that in count when testing write performances in a vm.
  • Use transactions every time you need to do multiple writes.
  • Since each query can only use an index, the optimizer has to make a choice when the same query can be executed using two different indices. You can avoid using a certain index, and force the optimizer to use the another one, prepending the column name with a +. For example the following query will use the url index.
    SELECT id, url FROM moz_places WHERE +id = 5 AND url = 'http://www.mozilla.org/'