Firefox/Projects/Places Vacuum

From MozillaWiki
Jump to: navigation, search

Overview

Sprint lead: mak

Sprinters: sdwilsh

Description

Fragmented Sqlite databases are known to make awesomebar performances bad, and cause other perf problems.

Ideally Firefox is moving to use sqlite in more components, so this problem needs to be investigated.

We should first investigate this in Places, it has one of the largest databases in the profile, and issues are immediately visible in the awesomebar that is actually primary UI. Any solution or finding will then be useful to investigate doing this app-wide.

Goals / Use Cases

  • Defragment places.sqlite with vacuum
  • Solve db size issues due to external apps (old Google Toolbar)
  • Fix awesomebar perf issues due to badly fragmented dbs

Non-Goals in this iteration

  • fix sqlite fragmentation issues app-wide
  • over-engineer it
  • make it dependent on user's actions (add UI and manual vacuuming)

Design

Some important point about vacuum

  • vacuum is a slow operation, it takes seconds to vacuum database, depending on its size.
  • vacuum completely regenerates the database, copying page by page from the old database to a new one.
  • incremental vacuum can'tbe used for 2 reasons:
    • it does not actually defragment the database, it just remove unused space
    • it can make fragmentation even worse: "Auto-vacuum does not defragment the database nor repack individual database pages the way that the VACUUM command does. In fact, because it moves pages around within the file, auto-vacuum can actually make fragmentation worse."
  • the database can't be used while a vacuum is running.
  • the time needed to vacuum a database is independent from database status, a just vacuumed database won't take less time to vacuum it again.

Vacuum timings

Vacuuming an 86MBs DB takes about 36s on a laptop with a relatively slow disk. There is some option we can tweak to enhance this:

  • JOURNAL_MODE=TRUNCATE and SYNC=FULL is the default, 36s
  • JOURNAL_MODE=MEMORY and SYNC=FULL is a trick we used migrating 3.0 users to 3.5, pulls down time to 22s
  • JOURNAL_MODE=OFF and SYNC=FULL times 21s, does not really make a difference
  • JOURNAL_MODE=MEMORY and SYNC=OFF times 17s, this is the fastest one, but doesnot have any protection against DB corruption, so we are not going to sacrifice that for 5s.

Final choice is to temporary move the journal file to memory while vacuuming, then resetting it back. Could this be a problem for Mobile? Most likely a mobile DB will contain fair less history (they use smaller prefs) and bookmarks, we expect it to be quite smaller. Eventually JOURNAL_MODE=OFF could become a solution for mobile.

Currently investigated options

  • Vacuum with a button: discarded, we don't want users to have to rememeber they should vacuum.
  • Vacuum in background: Possible for small databases, but for databases like places.sqlite that can take one hundred megabytes, is just not feasible
  • Vacuum on major upgrade: This would be a good compromise, but has some drawback. First of all there is a feeling that making upgrades slower will cause users to delay them. Secondly it would be a Firefox only fix. Third it would put someting storage related in update code.
  • Vacuum on idle: This is the currently taken in count solution.
  • Better vacuum support in SQLite: this is the long term solution

Vacuum on idle

This approach is actually the best one, but has some glitches:

  • We need a guess if the database needs to be vacuumed, we can't detect much about it (see better vacuum section below for information), we will just try to use the freelist to detect if big changes have happened in the database and it needs to be shrinked. Current ratio could be freelist_count/page_count, we need to define which can be a good treshold to activate vacuuming. Actually we could check if 1/5 of the db space is unused space.
  • We should not vacuum too often, indeed vacuum completely removes the freelist, that is used to enhance INSERTs speed. Vacuuming too often will cause INSERTs to be slower. we will try to avoid vacuuming more than once a month, but at least once in 2 months.
  • if the user is idle but watching a movie we could end up creating issues. we will use an async statement to vacuum, that will at least ensure that we run it in a separate thread, if user comes back to the browser the UI won't be locked, but anything Places related won't work and will block it.

Better vacuum support in SQLite

Short term we need at least a way to tell if a database needs to be defragmented, this is actually not possible without manually parsing the database file page by page. That would be over-engineering, slow, and hard to maintain (if the file format changes). Using Places Stats to get a "magic number" able to tell us if the database size is exagerated is not possible. We can get a guess, but in many cases it will generate false positives due to a couple things (i tried generating some spreadsheet to test the thing):

  • external applications can add binary annotations and we can't guess what's the size of those.
  • actually guessing an avg size of titles and urls for different locales can be hard.
  • using an hard size limit is bad because mobile and other apps are going to save a smalleramount of informations.

We will ask SQLite team if it's possible to get a pragma option to analyze the database and get a guess if it needs to be vacuumed.

Long term, the final solution for all the issues would be to get full incremental-vacuuming with defragmentation.

Sdwilsh investigating these needs with SQLite team.

Bugs

Resources