Changes

Jump to: navigation, search

Performance/Avoid SQLite In Your Next Firefox Feature

9,436 bytes added, 08:48, 20 March 2014
First draft
== Introduction ==
Many Firefox developers see SQLite as a default choice for storing any non-trivial amount of data, this wiki explains why that view is incorrect.

SQLite may seem like a very appealing option when first designing a feature. It offers the familiar and powerful SQL language, our codebase has nice C++ and JavaScript APIs for it, and it's already used in many places in the codebase. However, when choosing storage formats, we have to keep in mind that SQLite is a powerful and reliable database and that it comes with a lot of '''hidden complexity'''. It's very tempting to take the abstractions it offers at face value and to ignore the complexity happening under the hood. As a result, we have repeatedly seen SQLite become a source of performance problems. This isn't an indictment of SQLite itself -- any other relational embedded DB would pose the same challenges.

SQLite DBs are simply too complex to be used for relatively simple data storage needs. The list below outlines some of the ways SQLite can backfire on developers.

== SQLite Pitfalls ==

=== Storage Footprint ===

* Features that regularly add data to the DB (e.g. visited URIs), but have no expiration policy for the data, can easily cause the DB to balloon into tens or hundreds of MBs. This is particularly undesirable on mobile devices
* WAL journals have been known to grow very large, e.g. {{bug|609122}}, {{bug|608422}}
* Every index contains a complete copy of the indexed data. Creating indexes on lengthy text fields will cause very large indexes to be stored on disk
* Using the default microsecond precision for timestamps causes unnecessary bloat

=== I/O Patterns ===

* Your database file can suffer from logical/internal fragmentation due to I/O patterns
** e.g. if there is a pattern of interleaved writes to indexes and table data, parts of each will end up scattered all over the database file, and queries will not be able to do sequential I/O
* Physical/external fragmentation can also happen due to SQLite using inappropriate defaults
** e.g. the factory-default SQLite [http://www.sqlite.org/pragma.html#pragma_page_size page size] is 1024 bytes. When the default page size was changed to 32KB in {{bug|416330}}, there was a [http://taras.glek.net/blog/2013/06/28/new-performance-people/ 4x reduction in SQLite IO waits] according to Telemetry
* Performance can [http://sqlite.1065341.n5.nabble.com/Coping-with-database-growth-fragmentation-td44781.html#a44782 significantly degenerate over time]; scheduling periodic [https://sqlite.org/lang_vacuum.html rebuilds of the DB] is necessary.
** Note that this "vacuuming" only deals with internal fragmentation, the filesystem has its own fragmentation
* SQLite uses fsync's to guarantee transaction durability and enable recovery from crashes. fsyncs can be very expensive, and happen relatively frequently in the default rollback-journal mode. This performance/reliability trade-off might not be necessary for simple storage needs.
* JSON files or log files will show better I/O patterns almost every time, especially if they're compressed and read/written in entirety each time

=== Memory Usage ===

* By default, our SQLite wrapper uses a [http://hg.mozilla.org/mozilla-central/file/444714c3820a/storage/src/mozStorageConnection.cpp#l48 max of 2MB memory cache per connection].
** This cache size may be too large for Fennec and B2G, especially if there are multiple connections. The cache size should be adjusted with [[#Important Pragmas|PRAGMAs]]

=== CPU Usage ===

* Users have found certain Firefox features using 15% of their CPU time because SQL statements were constantly being executed in the background
* Lock contention can occur when using the DB from multiple threads
** e.g. VACCUM-ing the DB on one thread while executing SQL queries on another

=== Battery Life ===

* Same as CPU, important on mobile

=== Unintended Main-Thread I/O ===

Main-thread SQL is a known evil, and luckily there are only a few major sources of it left in the codebase, but the [http://telemetry.mozilla.org/slowsql/ Slow SQL dashboard] shows that developers are still getting bitten by main thread I/O from unexpected sources:

* PRAGMA statements to set the cache size are done on the main thread, but if this is the first transaction of the session, it can trigger SQLite crash recovery operations. This is particularly bad if using the default WAL journal size. Also note that "crashes" are particularly common on mobile where we never have clean exits.
* Some addons access Firefox DBs directly using main thread SQL

=== Contending for Access to Storage with the Main Thread ===

Bad SQLite performance off the main thread also degrades Firefox responsiveness because it can contend with the main thread for access to storage. Sadly our code still does main-thread I/O, and additionally, there will always be main-thread IO from add-ons and swapping.

* Even seemingly simple SQL queries can take multiple seconds to execute
** e.g. in {{bug|966469}} a user reports the following statement taking 22 seconds to execute: <code>DELETE FROM moz_pages WHERE id NOT IN (SELECT DISTINCT(pid) FROM moz_subresources);</code>

=== UX Impact of Slow Queries ===

Slow DB operations hurt UX. For example, you might have noticed that the AwesomeBar sometimes takes a very long time to fetch any auto-complete suggestions.

* The schemas have to be designed carefully and the DB has to be maintained periodically
* Fragmentation causes extra seeks, and the problem is made worse by magnetic hard disks and cheap or old SSDs. Vacuuming DBs helps, but vacuums are resource-intensive and scheduling them can be tricky. See the [[#More Tips|Tips]] section

== How to Store Your Data ==

* If you need to store a small amount of data (less than 1MB), you should use JSON files, and do all your I/O off the main thread. Simple solutions with linear worst-case performance are ideal.
** If you're working with larger amounts of data (roughly on the order of 1MB), you should compress the JSON data with Snappy before writing it to disk
** You can use OS.File's [https://developer.mozilla.org/en-US/docs/JavaScript_OS.File/OS.File_for_the_main_thread#OS.File.writeAtomic%28%29 writeAtomic()] method. More information on writeAtomic [http://dutherenverseauborddelatable.wordpress.com/2014/02/05/is-my-data-on-the-disk-safety-properties-of-os-file-writeatomic/ in this blog post]
** If your workload involves a lot of strings, don't use SQLite. Store your data in external files.
* For larger datasets or when SQL is absolutely necessary, use SQLite.
** Make sure you understand [https://sqlite.org/docs.html how SQLite works], carefully design your schemas and then profile your implementation.
** IndexedDB is implemented on top of SQLite and has additional issues

NOTE: ''We're currently working on an intermediate solution based on log-storage which will reduce the amount of data written for small modifications to data already on disk.''

== How to Best Use SQLite If You Really Need To ==

=== Important Pragmas ===
See [http://www.sqlite.org/pragma.html the list of SQLite pragmas], the defaults are probably not OK. We already over-ride some defaults in the [http://mxr.mozilla.org/mozilla-central/source/db/sqlite3/src/moz.build moz.build file] and [http://mxr.mozilla.org/mozilla-central/source/storage/src/mozStorageConnection.cpp#622 mozStorageConnection.cpp].

* Use a [https://www.sqlite.org/wal.html write-ahead log] for fewer fsyncs
** <code>PRAGMA journal_mode = WAL;</code>
* Set the DB cache size based on the device specs, and consider adjusting it dynamically based on memory pressure on mobile devices
** <code>PRAGMA cache_size = X;</code>
* For large DBs prone to fragmentation, pre-allocate a large empty DB initially by increasing the DB's growth increment
** This can be done by calling Connection::SetGrowthIncrement() which changes the value of SQLite's <code>SQLITE_FCNTL_CHUNK_SIZE</code>. See {{bug|581606}} for reference.

=== More Tips ===

* It should go without saying that you should '''never execute SQL on the main thread'''
** Use the [https://developer.mozilla.org/en-US/docs/Mozilla/JavaScript_code_modules/Sqlite.jsm SQLite.jsm wrapper] for DB operations from JavaScript, it's fully asynchronous and off-main-thread
** For native code, use the [https://mxr.mozilla.org/mozilla-central/source/storage/public/mozIStorageAsyncConnection.idl async methods]
* Model how large your DB will grow on desktop/mobile. Report the DB size & memory use to [https://developer.mozilla.org/en-US/docs/Performance/Adding_a_new_Telemetry_probe Telemetry] to confirm your estimates were accurate. '''Implement an expiration policy!'''
* Profile your implementation and measure the number of SQL operations performed over the course of a typical session if it’s an “always-on” feature and you suspect it might cause performance issues
* Don’t index on text fields, index on hashes of text fields. Indexes can become very large.
* Don't use the default precision for timestamps (microseconds). Desktop apps are unlikely to ever require this level of precision
* Maintain your DB with [https://sqlite.org/lang_vacuum.html VACUUMs] during Firefox version upgrades or during idle times. Run [http://www.sqlite.org/lang_analyze.html ANALYZE] periodically if necessary
** You should have a vacuum plan from the get-go. '''This is a requirement for all new code'''.
Confirm
356
edits

Navigation menu