Breakpad/Design/Database: Difference between revisions

From MozillaWiki
Jump to navigation Jump to search
Line 4: Line 4:


PostgreSQL 8.2 or higher should be used because it offers things that MySQL does not have:
PostgreSQL 8.2 or higher should be used because it offers things that MySQL does not have:
** Partitioning if we need to scale the database in the future, which will likely happen if we plan on keeping more than 3 months worth of data.  For example, we could partition by the week and have the db create a new child table for each new month as time goes on, then pop off outdated partitions once they become too old but keep them in another db for since-beginning-of-time queries.
* Partitioning if we need to scale the database in the future, which will likely happen if we plan on keeping more than 3 months worth of data.  For example, we could partition by the week and have the db create a new child table for each new month as time goes on, then pop off outdated partitions once they become too old but keep them in another db for since-beginning-of-time queries.
** Arguably better performance for larger data sets (read Google if you want to, but most of this lies in partitioning and bitmask joins)
* Arguably better performance for larger data sets (read Google if you want to, but most of this lies in partitioning and bitmask joins)
** More precise datatypes (adhering to SQL standards is a plus)
* More precise datatypes (adhering to SQL standards is a plus)
** Long-standing support for stored procedures
* Long-standing support for stored procedures
** Views for databases that are heavily normalized (helps in simple reporting!)
* Views for databases that are heavily normalized (helps in simple reporting!)


Designing the schema is an important task and shouldn't be taken lightly.  In general, there are typically two routes to take:
Designing the schema is an important task and shouldn't be taken lightly.  In general, there are typically two routes to take:

Revision as of 11:00, 30 March 2007

« back to Breakpad design page

Breakpad Database

PostgreSQL 8.2 or higher should be used because it offers things that MySQL does not have:

  • Partitioning if we need to scale the database in the future, which will likely happen if we plan on keeping more than 3 months worth of data. For example, we could partition by the week and have the db create a new child table for each new month as time goes on, then pop off outdated partitions once they become too old but keep them in another db for since-beginning-of-time queries.
  • Arguably better performance for larger data sets (read Google if you want to, but most of this lies in partitioning and bitmask joins)
  • More precise datatypes (adhering to SQL standards is a plus)
  • Long-standing support for stored procedures
  • Views for databases that are heavily normalized (helps in simple reporting!)

Designing the schema is an important task and shouldn't be taken lightly. In general, there are typically two routes to take:

  • Normalize, come up with a snowflake-like schema, abstract up-front but better long-term
  • Leave denormalized, just swallow inefficiencies in the short term and pay the price later on

How we can have both:

  • Normalize properly, generating a properly abstracted set of tables with partitions
  • Use views to pull together common queries used for reporting

The current schema has:

  • The schema now accounts for the following kinds of information
    • individual minidump records
    • product/OS/buildid information
    • control information used to "register" an incoming minidump and monitor/track/control its progress though all the stages from being transfered from the client until it is fully collected and processed on the server and available for reporting and analysis.

A major concern about over-normalization is data insertion and the problems it presents for the processor, which should probably be transactional and may have to do lookups and some db magic in order to successfully insert new records.

  • using the same database and schema for the minidump data and also for control over the processing of the minidumps has created problems in the past. when coruption is introduced, database performance problems crop up, or maintenance is required to manage the collection of minidumps, the processing of incoming minidumps is hindered and we start into a downward spiral of compounding problems. We should do some thinking to see if there are ways to improve on the existing design.