Auto-tools/Projects/BugHunter

From MozillaWiki

Team

Bob Clary (bc) - Responsibilities include schema design and implementation, system architecture, data generation, database/webserver administration... all things data'ish.

Jonathan Eads (jeads) - Responsibilities include webservice/UI design and implementation... all things web'ish.

Mark Cote (mcote) - Responsibilities include admin webservice/UI... all things admin web'ish.

Overview

The purpose of Bughunter is to help detect bugs in Mozilla software products and get them fixed. Bughunter includes a data collection and storage system for managing metadata associated with Firefox site and unit test data, and a comprehensive UI for analyzing that data.

Bughunter data is separated into two top-level categories: site test data, which comes from testing Firefox on specific URLs that generate crash reports, and unit test data. There are three types of metadata in these two categories: crashes, assertions, and valgrinds. These data types are generated across a set of virtual machines that emulate a variety of operating systems (Mac OS X, Linux, Windows) and build/machine architectures (32/64 bit) in an effort to characterize a given bug's platform-specific behavior.

URLs associated with site crash data found in the Socorro database are pulled into bughunter in an effort to reproduce and further characterize crash reports by collecting additional metadata on different platforms.

All of this data is exposed in a user interface that lets users connect different types of related metadata in "data views" that can signal each other.

All bughunter source can be found at http://hg.mozilla.org/automation/sisyphus

Design and Approach

One goal of the Bughunter webservice and UI design is to get as close to a configuration-file-driven system as possible. The system will likely be extended with different data types for different products. The UI was designed with this in mind, so the impact of new data on the architecture should be minimal and the core concepts in the UI should stay the same. The UI can represent data in tabular or graphical form, and a set of controls is provided for filtering data and for connecting one data display to another. Any data type can be connected to any other data type if they have a field in common. This is referred to as signaling.
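As a rough illustration of signaling, the sketch below treats each data view as a set of field names and reports the fields two views have in common. The function names shared_signals and can_signal are hypothetical and do not come from the Bughunter source; the field names are copied from the views.json excerpt on this page.

```python
# Hypothetical sketch: two views can be connected when they share a field.
CRASHES_ST = {"signature", "fatal_message", "address",
              "pluginfilename", "pluginversion", "exploitability"}
CRASH_URLS_ST = CRASHES_ST | {"url"}


def shared_signals(sender_fields, receiver_fields):
    """Return the sorted field names that both views have in common."""
    return sorted(set(sender_fields) & set(receiver_fields))


def can_signal(sender_fields, receiver_fields):
    """True when at least one field is common to both views."""
    return bool(shared_signals(sender_fields, receiver_fields))
```

Under this reading, the crashes view can signal the crash URL view on any of its six fields, while two views with no field in common cannot be connected at all.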

Data View

The fundamental unit of data display in the bughunter UI is referred to as a "data view". The default representation of a data view is tabular but can also be graphical. An example data view is depicted below:

DataViewPanel.png


Data View Navigation

A hierarchical menu is available on each data view. This menu can be used to jump to any available data type. A menu item can be either a single data view or a collection of data views. When a collection is selected, the page is cleared of all views and a set of data views that operate together is loaded.

DataViewNav.png

Architecture

Webservice

The Bughunter webservice serves data as JSON. It accepts a set of named parameters, provided in an HTTP POST, that correspond to different data views. An HTTP POST is used instead of a GET because crash signatures and crash URLs can be very large.
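A client request of this kind might be built as in the sketch below. This is only an illustration: the base URL, the path layout (view name appended to the base URL), and the form-encoded body are all assumptions, and the helper build_view_request is hypothetical. The request is constructed but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_view_request(base_url, view_name, signals):
    """Build (but do not send) a POST request for one data view.

    Assumption: the view is addressed as <base_url>/<view_name> and the
    named parameters travel as a form-encoded POST body.
    """
    body = urlencode(signals).encode("utf-8")
    return Request(f"{base_url}/{view_name}", data=body, method="POST")


# Hypothetical host and signature value, for illustration only.
req = build_view_request("http://localhost:8000/views", "crashes_st",
                         {"signature": "nsFoo::Bar"})
```

Putting the parameters in the body rather than the query string avoids URL length limits, which is the concern the text raises about long signatures and URLs.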

The complete source for the webservice can be found in python/sisyphus/webapp/bughunter/views.py. This file contains two webservices: the admin service and the data view webservice. The admin service manages reporting status for the VM cluster and the data view webservice provides a dataservice and UI for the sisyphus database.

In the data view webservice each data view has a corresponding JSON data structure that describes its properties. This structure can be found in python/sisyphus/webapp/templates/data/views.json. Each view has an attribute called "signals" that lists the POST parameters that are accepted by that data view. See the JSON config file section for details.

All associated source can be found at sisyphus/webapp.

Data Sources

A Python module called datasource.py is used for all SQL/database interactions (https://github.com/jeads/datasource). Datasource provides an interface to MySQL that allows SQL to be stored in a JSON file with an associated name and host_type (master, read_only, etc.). To send signals between data views, portions of the SQL have to be generated dynamically. The datasource module manages this, which keeps SQL munging out of the webservice and provides a single location where all static SQL can be found (python/sisyphus/webapp/procs/bughunter.json). This allows SQL statements to be treated as "stored procedures": every statement is assigned a name and is suitable for reuse by other scripts.
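The idea of named SQL statements can be sketched as a lookup into a nested dictionary. This is not the datasource module's API, which may look quite different; the helper get_proc, the dotted-name convention, and the placeholder SQL are assumptions made for illustration.

```python
import json

# Tiny stand-in for a procs file; the real SQL is much longer.
PROCS_JSON = """
{"views": {"crashes_st": {"sql": "SELECT 1", "host_type": "master_host"}}}
"""


def get_proc(procs, dotted_name):
    """Walk a nested dict using a dotted name such as 'views.crashes_st'.

    Returns the stored SQL text and the host_type it should run against.
    """
    node = procs
    for key in dotted_name.split("."):
        node = node[key]
    return node["sql"], node["host_type"]


procs = json.loads(PROCS_JSON)
sql, host_type = get_proc(procs, "views.crashes_st")
```

Keeping the statement and its host_type together means a caller only needs the name to know both what to run and where to run it.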

JSON Config Files

Two JSON files are used for website/webservice configuration. The file views.json (python/sisyphus/webapp/templates/data/views.json) has a hierarchical structure and contains an associative array for each data view presented in the UI. The file bughunter.json (python/sisyphus/webapp/procs/bughunter.json) contains a JSON structure, described in detail at https://github.com/jeads/datasource, that holds all of the static SQL used in the webservice.

views.json Description
/********************
 *
 * VIEW ATTRIBUTE DEFINITIONS
 *
 * name - The name of the data view.  This is used in the webservice in a 
 *        datastructure called VIEW_ADAPTERS that maps the view name to a 
 *        data adapter function responsible for munging data from SQL queries
 *        into a structure suitable for the UI.  The name is also found in 
 *        bughunter.json (python/sisyphus/webapp/procs/bughunter.json) which
 *        contains the SQL that corresponds to this view.
 *
 * read_name - The readable name displayed for the view
 *
 * signals - An associative array containing all of the signals that this view
 *           can send/receive.  The signal names correspond to database column
 *           names that are dynamically built into the SQL depending on what 
 *           the user action is.
 *
 * control_panel - The html file name of the control panel associated with this view.
 *
 * default - When set, it tells the UI to use this view as the default view to display
 *
 * data_adapter - The name of the javascript data adapter used in the UI to manage any
 *                idiosyncratic behavior unique to this view.
 *
 * charts - An associative array of the visualization types this view can be rendered in
 ***********************/
[
  "Site Tests", 
  [  "Crashes", 
     [ { "name":"crashes_st",
         "read_name":"Site Crashes",
         "signals":{ "signature":"1", 
                     "fatal_message":"1",
                     "address":"1", 
                     "pluginfilename":"1", 
                     "pluginversion":"1", 
                     "exploitability":"1" },
         "control_panel":"crashes.html",
         "default":1,
         "data_adapter":"crashes",
         "charts":[ { "name":"table", "read_name":"Table" },
                    { "name":"platform_tree", "read_name":"Platform Tree" } ]
        },
        { "name":"crash_urls_st",
          "read_name":"Site Crash URL Summary",
          "signals":{ "url":"1", 
                      "signature":"1", 
                      "fatal_message":"1",
                      "address":"1", 
                      "pluginfilename":"1", 
                      "pluginversion":"1", 
                      "exploitability":"1" },
          "control_panel":"named_fields.html",
          "data_adapter":"urls",
          "charts":[ { "name":"table", "read_name":"Table" } ]
         } ...etc
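A consumer of this structure needs some way to find a view by name. The sketch below assumes the shape suggested by the excerpt above (menu labels as strings, followed by a list holding either more labels or view dictionaries) and uses a hypothetical helper, index_views. The sample data is a trimmed copy of the excerpt.

```python
def index_views(node, found=None):
    """Recursively collect every view dict in a nested list, keyed by name.

    Strings (menu labels) are skipped, and dicts are not descended into,
    so the "name" keys inside "charts" entries are not picked up.
    """
    if found is None:
        found = {}
    if isinstance(node, dict):
        if "name" in node:
            found[node["name"]] = node
    elif isinstance(node, list):
        for child in node:
            index_views(child, found)
    return found


MENU = ["Site Tests",
        ["Crashes",
         [{"name": "crashes_st", "read_name": "Site Crashes", "default": 1,
           "charts": [{"name": "table", "read_name": "Table"}]},
          {"name": "crash_urls_st", "read_name": "Site Crash URL Summary"}]]]

views = index_views(MENU)
# The view flagged with "default" is the one shown first.
default_view = next(v["name"] for v in views.values() if v.get("default"))
```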

In addition to the data views, this JSON structure can also define a collection of views that will be automatically opened and connected when selected from the navigation menu. A view collection structure looks like this:

/*********************
 * VIEW COLLECTION ATTRIBUTE DEFINITIONS
 *
 * name - The name of the view.
 *
 * default - If set, the collection will be used as the default dataset
 *           to initialize the UI to.
 *
 * read_name - The readable name displayed in the UI.
 *
 * collection - An array of associative arrays.  Each nested associative array 
 *              defines a view to include in this collection and the parent/child
 *              relationships of all views in the collection.
 *
 *    bhview - The view name, must be contained somewhere in this file.
 *             
 *    parent - The view name of the parent.  If the view is the parent of 
 *             all other views in the collection, its parent should be set
 *             to an empty string.
 **********************/
"Collections",
     [ { "name":"crash_explorer_st",
         "default":1,
         "read_name":"Crash Explorer",
         "collection":[ { "bhview":"crashes_st", "parent":"" },
                        { "bhview":"crash_detail_st", "parent":"crashes_st" },
                        { "bhview":"crash_urls_st", "parent":"crashes_st" } ]
       },
       { "name":"assertion_explorer_st",
         "read_name":"Assertion Explorer",
         "collection":[ { "bhview":"assertions_st", "parent":"" },
                        { "bhview":"assertion_detail_st", "parent":"assertions_st" },
                        { "bhview":"assertion_urls_st", "parent":"assertions_st" } ]
       }
     ] ..etc
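The parent/child relationships in a collection can be turned into a simple tree, as in the sketch below. The helper build_tree is hypothetical; the data is the crash_explorer_st collection from the excerpt above.

```python
from collections import defaultdict

CRASH_EXPLORER = [
    {"bhview": "crashes_st", "parent": ""},
    {"bhview": "crash_detail_st", "parent": "crashes_st"},
    {"bhview": "crash_urls_st", "parent": "crashes_st"},
]


def build_tree(collection):
    """Return (root view name, {parent name: [child names]}).

    The root is the entry whose parent is an empty string.
    """
    children = defaultdict(list)
    root = None
    for entry in collection:
        if entry["parent"] == "":
            root = entry["bhview"]
        else:
            children[entry["parent"]].append(entry["bhview"])
    return root, dict(children)


root, children = build_tree(CRASH_EXPLORER)
```

With a map like this, a UI could open the root view first and then open each child connected to its parent.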
bughunter.json Description

For more details see https://github.com/jeads/datasource

{
"views":{
   "crashes_st":{                         
           "sql":"SELECT c.signature,
                         str.fatal_message,
                         str.branch,
                         str.os_name,
                         str.os_version,
                         str.cpu_name,
                         str.build_cpu_name,
                         COUNT( c.id ) AS 'total_count'
                   FROM Crash AS c
                   JOIN SiteTestCrash AS stc ON c.id = stc.crash_id
                   JOIN SiteTestRun AS str ON stc.testrun_id = str.id
                   WHERE (stc.datetime >= 'REP0' AND stc.datetime <= 'REP1') REP2
                   GROUP BY c.signature,
                            str.fatal_message,
                            str.branch,
                            str.os_name,
                            str.os_version,
                            str.cpu_name
                   ORDER BY str.fatal_message DESC, c.signature ASC
                   LIMIT 1000",
           "host_type":"master_host"
     },
     "new_crash_signatures_st":{
     
           "sql":"SELECT c.signature,
                         str.fatal_message,
                         str.branch,
                         str.os_name,
                         str.os_version,
                         str.cpu_name,
                         str.build_cpu_name,
                         COUNT( c.id ) AS 'total_count'
                   FROM Crash AS c
                   JOIN SiteTestCrash AS stc ON c.id = stc.crash_id
                   JOIN SiteTestRun AS str ON stc.testrun_id = str.id
                   WHERE (stc.datetime >= 'REP0' AND stc.datetime <= 'REP1') AND 
                          c.id NOT IN (
                             SELECT c.id
                             FROM Crash AS c
                             JOIN SiteTestCrash AS stc ON c.id = stc.crash_id
                             WHERE (stc.datetime < 'REP0')) REP2
                   GROUP BY c.signature,
                            str.fatal_message,
                            str.branch,
                            str.os_name,
                            str.os_version,
                            str.cpu_name
                   ORDER BY str.fatal_message DESC, c.signature ASC
                   LIMIT 1000",
          
           "host_type":"master_host"
     }...etc
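The REP0, REP1, and REP2 tokens in the SQL above mark the parts that are filled in at request time. A minimal sketch of that substitution follows. It is an assumption about how the tokens behave, not the datasource module's actual implementation; the helper fill_placeholders and the sample values are hypothetical. Plain string substitution like this is unsafe for untrusted input, so user-supplied values should be bound through the database driver (here left as a %s parameter).

```python
import re


def fill_placeholders(sql, replacements):
    """Replace each REP<n> token with replacements[n]."""
    return re.sub(r"REP(\d+)",
                  lambda m: replacements[int(m.group(1))],
                  sql)


# Shortened version of the crashes_st statement.
SQL = ("SELECT c.signature FROM Crash AS c "
       "JOIN SiteTestCrash AS stc ON c.id = stc.crash_id "
       "WHERE (stc.datetime >= 'REP0' AND stc.datetime <= 'REP1') REP2")

# REP0/REP1 are the date range; REP2 is an optional extra condition that
# a signal (for example a selected signature) would add.
filled = fill_placeholders(
    SQL, ["2011-06-01", "2011-06-08", "AND c.signature = %s"])
```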

Navigation Menu Generation

The view navigation menu in the UI is generated with the following custom Django manage.py command:

python manage.py build_nav

This command outputs two files.

  1. python/sisyphus/webapp/html/nav/nav_menu.html - This file contains an HTML unordered list tag that's generated from the structure in views.json. It is used for the navigation menu for each data view.
  2. python/sisyphus/webapp/templates/bughunter.navlookup.html - This file contains a single hidden input field holding a JSON associative array with each data view structure from views.json. It is used in the JavaScript as a lookup table when view configuration is required.
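To give a sense of how a nested structure like views.json can become an unordered list, here is a minimal sketch. It is not the build_nav implementation: the helper render_menu, the assumed list shape (a label string followed by its list of children), and the generated markup are all illustrative assumptions.

```python
from html import escape


def render_menu(node):
    """Render a [label, children, label, children, ...] list as nested <ul>.

    A dict is a single view and becomes a link; a string is a menu label
    and is followed by the list of its children.
    """
    items = []
    i = 0
    while i < len(node):
        entry = node[i]
        if isinstance(entry, dict):
            items.append('<li><a href="#{}">{}</a></li>'.format(
                escape(entry["name"]), escape(entry["read_name"])))
            i += 1
        else:
            items.append("<li>{}{}</li>".format(
                escape(entry), render_menu(node[i + 1])))
            i += 2
    return "<ul>" + "".join(items) + "</ul>"


MENU = ["Site Tests",
        ["Crashes",
         [{"name": "crashes_st", "read_name": "Site Crashes"}]]]

menu_html = render_menu(MENU)
```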

User Interface

Class Structure

Database

The sisyphus schema can be found here.

Implementation

Technical notes, plans, and designs detailing how the project will be realized. The specifics of "how".