Firefox/Projects/App-wide Database Vacuum

From MozillaWiki

Overview

Sprint lead: YOURNAMEHERE
Sprinters:

Description
Periodically vacuum SQLite databases created by Firefox.

Goals / Use Cases

  • Definitively determine if there's a performance benefit to vacuuming.
  • Vacuum in an unobtrusive manner that does not interrupt the user.
  • OPTIONAL: Expose a chrome-space function for triggering vacuum that can be called from the console.

Non Goals

  • Add a UI component for manually triggering vacuum.

Design

  • The ideal solution would be to ask the SQLite team to implement defragmentation in incremental vacuuming. As it stands, incremental vacuum can create more fragmentation and make performance even worse, which is why it is not used.
  • We need our own solution as long as this is not implemented upstream. Solving the issue globally would be better than creating a browser-only component, but it is also harder, because it has to take into account all possible setups of the various kinds of databases (exclusive, shared, ...).
  • Vacuum only when really needed. Vacuuming too often can hurt inserts, since they usually reuse freelist pages, which is faster than creating a new page. Vacuuming every 6 months is probably enough, but we can also estimate whether it is needed by comparing PRAGMA page_count with PRAGMA freelist_count: if a large enough share of the pages is on the freelist, a vacuum will most likely help. This is a rough guess, but it could avoid some useless work. We just need to find a good percentage, and perhaps still force a vacuum after 2 or 3 checks in which it was judged unnecessary. A better estimate can only be obtained by reading the database file directly and counting used bytes page by page, as sqlite3_analyzer does, but that is a slow process.
  • Different databases are accessed in different ways. If a database is opened exclusively, we cannot run anything on it without asking the owning service for its connection, so simply enumerating all databases in the profile and running the same code on each of them may not work.
  • Investigate whether temporary PRAGMA settings (such as keeping the journal in memory) could speed up vacuuming.
  • We need to find a good time to do the work. The main point is that it must in no way block the user's activities. Possible solutions are: on major update, on idle, in background.
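The freelist-based guess described above can be sketched in a few lines. This is only an illustration using Python's sqlite3 module, not the browser's storage API: the names freelist_ratio and should_vacuum, the 10% threshold and the limit of 3 skipped checks are all assumptions, since these notes leave the actual percentage open.

```python
import sqlite3

# Hypothetical tuning values; a good percentage still has to be found.
FREELIST_THRESHOLD = 0.10  # vacuum when at least 10% of the pages are free
MAX_SKIPPED_CHECKS = 3     # force a vacuum after this many skipped checks


def freelist_ratio(conn: sqlite3.Connection) -> float:
    """Return the fraction of database pages that sit on the freelist."""
    page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    freelist_count = conn.execute("PRAGMA freelist_count").fetchone()[0]
    # A brand new database has no pages at all; treat it as not fragmented.
    return freelist_count / page_count if page_count else 0.0


def should_vacuum(conn: sqlite3.Connection, skipped_checks: int = 0) -> bool:
    """Guess whether a vacuum is worthwhile for this database."""
    if skipped_checks >= MAX_SKIPPED_CHECKS:
        return True  # forced vacuum, even if the guess says it is not needed
    return freelist_ratio(conn) >= FREELIST_THRESHOLD
```

The caller would have to persist the number of skipped checks per database between runs. Note that the ratio only counts completely free pages, so it says nothing about partially filled or out-of-order pages.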

Vacuum on major update

Advantages:

  • runs at a moment when the user already expects the new version to take some time
  • is transparent to the user, no need to ask them anything
  • can possibly be done at a stage where the databases are still closed, which means all databases can be handled the same way and switched on the fly

Disadvantages:

  • will slow down the update
  • we can't guess the update strategy of other apps, so this would probably be browser-only
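If the databases really are still closed at update time, handling them all the same way could look roughly like the sketch below. It uses Python's sqlite3 module for illustration only. The function name, the "*.sqlite" pattern and the memory_journal switch (a way to try the in-memory journal idea from the Design notes) are assumptions, and the sketch assumes no other connection is open and that the databases use a rollback journal.

```python
import sqlite3
from pathlib import Path


def vacuum_closed_databases(profile_dir, memory_journal=False):
    """Vacuum every *.sqlite file in profile_dir.

    Returns a dict mapping file name to the number of bytes saved.
    """
    saved = {}
    for db_path in sorted(Path(profile_dir).glob("*.sqlite")):
        size_before = db_path.stat().st_size
        # isolation_level=None puts the connection in autocommit mode,
        # because VACUUM cannot run inside a transaction.
        conn = sqlite3.connect(str(db_path), isolation_level=None)
        try:
            if memory_journal:
                # Experimental: keeps the rollback journal in RAM for this
                # connection. A crash during VACUUM may corrupt the file.
                conn.execute("PRAGMA journal_mode = MEMORY")
            conn.execute("VACUUM")
        finally:
            conn.close()
        saved[db_path.name] = size_before - db_path.stat().st_size
    return saved
```

Timing this function with and without memory_journal on copies of real profiles would be one way to check whether the temporary setting helps at all.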

Vacuum on idle

Every month (or every 2 months), after a relatively short idle time such as 5 minutes (short, to avoid hitting standby or hibernation), we could start analyzing the databases using our fragmentation guess. If any databases need vacuuming, show the user a window saying something like "App is going to perform maintenance of its databases" with a 10-second countdown and a cancel button. When the countdown expires, a progress bar starts. The user can still cancel the process with the cancel button: we cannot interrupt a single vacuum, but we can stop before vacuuming the next databases in the list.
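The cancellation behaviour described here (a running vacuum is never interrupted, but the remaining databases are skipped) can be sketched as follows. This is a minimal illustration in Python: the function name, the threading.Event used as the cancel button and the on_progress callback are assumptions, and the countdown window itself is left out.

```python
import sqlite3
import threading


def vacuum_until_cancelled(db_paths, cancel_event, on_progress=None):
    """Vacuum databases one by one, stopping between them once cancelled.

    Returns the list of paths that were actually vacuumed.
    """
    done = []
    for index, path in enumerate(db_paths, start=1):
        # The cancel request is only honoured between two vacuums.
        if cancel_event.is_set():
            break
        conn = sqlite3.connect(path, isolation_level=None)  # autocommit
        try:
            conn.execute("VACUUM")
        finally:
            conn.close()
        done.append(path)
        if on_progress:
            on_progress(index, len(db_paths))  # e.g. advance a progress bar
    return done
```

A UI would call cancel_event.set() from the cancel button handler while this function runs on another thread, and use on_progress to advance the progress bar.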

Advantages:

  • won't hit a specific moment in which the user is doing something
  • won't slow down any startup path

Disadvantages:

  • could hurt other tasks (the user may be watching a video and we would cause skipping)
  • could stop a laptop from going into standby (we had similar issues with frecency updating)
  • need to give the user a way to cancel the process, so they can get back to work if in a hurry
  • need to handle databases specially based on connection type (if there's an exclusive connection, we need to get the connection from the service that created it)

Vacuum in background

Advantages:

  • happens transparently to the user
  • does not lock at specific times

Disadvantages:

  • could slow down navigation/UI while running (without locking, though)
  • need to define a sync strategy: while the database copy is being vacuumed, the original still receives updates, which must be copied to the new database
  • need to define a switch strategy: once the vacuum and sync are complete, we need to switch databases. Closing the old connection could be hard, though; replacing the connection on the fly may be easier, but each service would have to allow that.
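The copy, vacuum and switch steps could be prototyped along the lines below. This is only a rough sketch in Python, and it deliberately leaves out the unresolved part: there is no sync strategy, so it assumes nothing writes to the original while the copy is vacuumed. It also assumes no other connection is open at switch time and that the database uses a rollback journal. The function name and the ".vacuum-tmp" suffix are hypothetical.

```python
import os
import sqlite3


def vacuum_copy_and_switch(db_path):
    """Vacuum a copy of the database, then swap it in for the original."""
    tmp_path = db_path + ".vacuum-tmp"  # hypothetical temporary file name
    if os.path.exists(tmp_path):
        os.remove(tmp_path)  # leftover from an earlier, interrupted run

    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(tmp_path, isolation_level=None)  # autocommit
    try:
        src.backup(dst)        # consistent snapshot of the original
        dst.execute("VACUUM")  # compact the copy; the original is untouched
    finally:
        dst.close()
        src.close()

    # Switch step: replace the original file with the vacuumed copy.
    # Any update made to the original after the snapshot would be lost here.
    os.replace(tmp_path, db_path)
```

In the browser, the switch step is where each service would have to close or replace its own connection, which this sketch sidesteps by assuming nothing else has the file open.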

Bugs