Auto-tools/Projects/BugHunter

Team

Bob Clary (bc) - Responsibilities include all things data'ish.

Jonathan Eads (jeads) - Responsibilities include all things web'ish.

Mark Cote (mcote) - Responsibilities include all things admin web'ish.

Overview

The purpose of Bughunter is to help detect bugs in Mozilla software products and get them fixed. Bughunter includes a data collection and storage system for managing metadata associated with Firefox site and unit test data, and a comprehensive UI for analyzing that data.

Bughunter data is separated into two top-level categories: site data, which comes from testing Firefox on specific URLs that generate crash reports, and unit test data. There are three types of metadata in these two categories: crashes, assertions, and valgrinds. These data types are generated across a set of virtual machines running a variety of operating systems (Mac OS X, Linux, Windows) and build/machine architectures (32/64 bit) in an effort to characterize a given bug's platform-specific behavior.

URLs associated with site crash data found in the Socorro database are pulled into bughunter in an effort to reproduce and further characterize crash reports by collecting additional metadata on different platforms.

All of this data is exposed in a user interface that allows users to connect different types of related metadata in "data views" that can signal each other.

All bughunter source can be found at http://hg.mozilla.org/automation/sisyphus

Design and Approach

One of the goals of the bughunter webservice and UI design is to represent data generically enough that new data types can be added by modifying JSON configuration files. The system will likely be extended with different data types for different products. The UI was designed with this in mind, so the impact of new data on the architecture and source code should be minimal and the core concepts in the UI and architecture should stay the same. The UI can represent data in tabular or graphical form, and a set of controls is provided for filtering data and for connecting one data display to another. Any data type can be connected to any other data type if they have a field in common. This is referred to as signaling.

Data View

The fundamental unit of data display in the bughunter UI is referred to as a "data view". The default representation of a data view is tabular but can also be graphical. An example data view is depicted below:

DataViewPanel.png

The tabular data shown above is represented in a graphical form called a "Platform Tree" displayed below. A data view can have any number of tabular or graphical representations.

DataViewVis.png

Data View Controls

A set of controls/filters is available for each data view. Controls on the control panel, depicted below, apply filters to the query associated with that view. The date range is an exception to this rule: the date range set in the control panel is applied to the data view it is attached to and to any views that receive signals from that view. Every data view is constrained to a particular date range, and the date range of a parent view is always sent along with any signal. This makes it easy to examine a date range with a collection of connected data views.

DataViewControls.png

Data View Navigation

A hierarchical menu is available on each data view. This menu can be used to jump to any data type available. Menu items can be either a single data view or a collection of data views. When a collection is selected, the page is cleared of all views and a set of data views that operate together is loaded.

DataViewNav.png

Data View Signaling

Signals can be sent from one data view to another when they have a field in common. They are sent by clicking on a link in a tabular data representation or on a selectable region of a graphical representation. A data view can have any number of child data views that it sends signals to. When the data comes from a relational database and a SQL query corresponds to the data view, the signal is typically rendered as a constraint in the WHERE clause of that query. The concept is not limited to relational databases: the same constraint can be applied in the application when querying another webservice or a NoSQL database. Signaling allows a user to drill down on a complex set of data. Data views can be displayed as multiple panes in a single browser window or spread out across multiple browser windows to make use of whatever screen capacity is available.
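
The following is a minimal sketch of how a signal might become a WHERE constraint, not the actual bughunter code. It uses an in-memory SQLite table instead of MySQL, and the table name crash_urls, its columns, and the function child_view_rows are all hypothetical. The parent view's date range always travels with the signal, as described in the Data View Controls section.

```python
import sqlite3

# Hypothetical miniature table; the real sisyphus schema is larger.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crash_urls (url TEXT, signature TEXT, datetime TEXT);
    INSERT INTO crash_urls VALUES
        ('http://a.example/', 'sig_one', '2011-06-01 10:00:00'),
        ('http://b.example/', 'sig_two', '2011-06-02 11:00:00'),
        ('http://c.example/', 'sig_one', '2011-07-15 09:00:00');
""")


def child_view_rows(conn, date_range, signal=None):
    """Query a child view, constrained by the parent's date range and signal."""
    start, end = date_range
    sql = ("SELECT url, signature FROM crash_urls "
           "WHERE datetime >= ? AND datetime <= ?")
    params = [start, end]
    if signal is not None:
        column, value = signal
        # Only whitelisted column names are interpolated into the SQL text;
        # the signal value itself is always passed as a bound parameter.
        if column not in ("url", "signature"):
            raise ValueError("unsupported signal: %s" % column)
        sql += " AND %s = ?" % column
        params.append(value)
    sql += " ORDER BY url"
    return conn.execute(sql, params).fetchall()


june = ("2011-06-01 00:00:00", "2011-06-30 23:59:59")
# Clicking 'sig_one' in the parent view sends signature=sig_one to the child.
rows = child_view_rows(conn, june, ("signature", "sig_one"))
print(rows)
```

Without a signal the child view simply shows everything in the date range; with one, the same query is narrowed to the rows that share the clicked value.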

A collection of connected data views is displayed below. The first view in the collection, "Site Crashes", can send signals to the two child views "Site Related Crashes" and "Site Crash URL Summary". The signal sent/received is displayed in the middle control panel above the table display. This allows a user to analyze relationships between these three different data types within a single browser window.

DataViewSignals.png

Architecture

Webservice

The bughunter webservice serves data in JSON. It accepts a set of named parameters, provided in an HTTP POST, that correspond to different data views. An HTTP POST was used instead of a GET because the crash signatures and crash URLs that need to be passed as parameters can be large. Requests are made both asynchronously and on page load, depending on the user action.
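
As an illustration of why a POST suits large parameters, here is a sketch of a client building such a request with the Python standard library. The endpoint URL and the parameter names start_date and end_date are assumptions made for the example; only signature appears in the view configuration. The request is built but not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint: the real URL layout is defined by the Django app.
ENDPOINT = "http://localhost:8000/bughunter/views/crashes_st"


def build_view_request(endpoint, params):
    """Build (but do not send) a form-encoded POST for one data view."""
    body = urlencode(params).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(endpoint, data=body, headers=headers, method="POST")


def fetch_view(request):
    """Send the request and decode the JSON response."""
    with urlopen(request) as response:
        return json.load(response)


# A long crash signature travels in the request body, not in the URL.
long_signature = "nsExample::Frame | " * 300
request = build_view_request(ENDPOINT, {
    "start_date": "2011-06-01 00:00:00",  # hypothetical parameter name
    "end_date": "2011-06-30 23:59:59",    # hypothetical parameter name
    "signature": long_signature,
})
print(request.get_method(), len(request.data), "bytes in body")
```

Calling fetch_view(request) against a running service would return the decoded JSON for the view.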

The complete source for the webservice can be found in python/sisyphus/webapp/bughunter/views.py. This file contains two webservices: the admin service and the data view webservice. The admin service manages reporting status for the VM cluster and the data view webservice provides a dataservice and UI for the sisyphus database.

In the data view webservice each data view has a corresponding JSON data structure that describes its properties. This structure can be found in python/sisyphus/webapp/templates/data/views.json. Each view has an attribute called "signals" that lists the POST parameters that are accepted by that data view. See the JSON config file section for details.
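
One way the "signals" attribute could be used on the server side is as a whitelist for incoming POST parameters. The sketch below assumes that behavior; the function accepted_signals is hypothetical and is not taken from views.py.

```python
# Trimmed copy of one view entry from views.json.
CRASH_URLS_VIEW = {
    "name": "crash_urls_st",
    "read_name": "Site Crash URL Summary",
    "signals": {"url": "1", "signature": "1", "fatal_message": "1"},
}


def accepted_signals(view, post_params):
    """Keep only non-empty POST parameters that the view lists under "signals"."""
    signals = view.get("signals", {})
    return {name: value for name, value in post_params.items()
            if name in signals and value}


posted = {"url": "http://a.example/", "signature": "", "bogus": "1"}
print(accepted_signals(CRASH_URLS_VIEW, posted))
```

Parameters that the view does not declare, or that arrive empty, are dropped before any SQL is built.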

All associated source can be found at sisyphus/webapp.

Data Sources

A Python module called datasource is used for all SQL/database interactions (https://github.com/jeads/datasource). Datasource provides an interface to MySQL that allows SQL to be stored in a JSON file with an associated name and host_type (master, read_only, etc.). To send signals between data views, portions of the SQL have to be generated dynamically. This is managed by the datasource module, which keeps SQL munging out of the webservice and provides a single location where all static SQL can be found (python/sisyphus/webapp/procs/bughunter.json). SQL statements can therefore be treated like "stored procedures": every statement is assigned a name and is suitable for reuse by other scripts.
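
The idea of named SQL with a host_type can be sketched without the datasource module itself. The code below is an illustration only and does not use the datasource API: the dotted name "views.crashes_st", the function lookup_proc, and the HOSTS mapping are assumptions. Note that multi-line SQL strings are not strict JSON, so the sketch parses with strict=False.

```python
import json

# Trimmed stand-in for bughunter.json. The SQL string spans several lines,
# which strict JSON forbids, so it is parsed with strict=False.
PROCS_TEXT = """
{
  "views": {
    "crashes_st": {
      "sql": "SELECT c.signature, COUNT( c.id ) AS 'total_count'
              FROM Crash AS c
              GROUP BY c.signature
              LIMIT 1000",
      "host_type": "master_host"
    }
  }
}
"""

# Hypothetical mapping from host_type to a database host.
HOSTS = {"master_host": "db-master.example", "read_host": "db-replica.example"}


def lookup_proc(procs, proc_name):
    """Resolve a dotted name such as "views.crashes_st" to (sql, host)."""
    node = procs
    for part in proc_name.split("."):
        node = node[part]
    return node["sql"], HOSTS[node["host_type"]]


procs = json.loads(PROCS_TEXT, strict=False)
sql, host = lookup_proc(procs, "views.crashes_st")
print(host)
print(" ".join(sql.split()))
```

Because callers refer to statements only by name, other scripts can reuse the same SQL without copying it.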

JSON Config Files

There are two JSON files used for website/webservice configuration. The file views.json (python/sisyphus/webapp/templates/data/views.json) has a hierarchical structure and contains an associative array for each data view presented in the UI. The file bughunter.json (python/sisyphus/webapp/procs/bughunter.json) contains a JSON structure, described in detail at https://github.com/jeads/datasource, that holds all of the static SQL used in the webservice.

views.json Description
/********************
 *
 * VIEW ATTRIBUTE DEFINITIONS
 *
 * name - The name of the data view.  This is used in the webservice in a 
 *        datastructure called VIEW_ADAPTERS that maps the view name to a 
 *        data adapter function responsible for munging data from SQL queries
 *        into a structure suitable for the UI.  The name is also found in 
 *        bughunter.json (python/sisyphus/webapp/procs/bughunter.json) which
 *        contains the SQL that corresponds to this view.
 *
 * read_name - The readable name displayed for the view
 *
 * signals - An associative array containing all of the signals that this view
 *           can send/receive.  The signal names correspond to database column
 *           names that are dynamically built into the SQL depending on what 
 *           the user action is.
 *
 * control_panel - The html file name of the control panel associated with this view.
 *
 * default - When set, it tells the UI to use this view as the default view to display
 *
 * data_adapter - The name of the javascript data adapter used in the UI to manage any
 *                idiosyncratic behavior unique to this view.
 *
 * charts - An associative array of the visualization types this view can be rendered in
 ***********************/
[
  "Site Tests", 
  [  "Crashes", 
     [ { "name":"crashes_st",
         "read_name":"Site Crashes",
         "signals":{ "signature":"1", 
                     "fatal_message":"1",
                     "address":"1", 
                     "pluginfilename":"1", 
                     "pluginversion":"1", 
                     "exploitability":"1" },
         "control_panel":"crashes.html",
         "default":1,
         "data_adapter":"crashes",
         "charts":[ { "name":"table", "read_name":"Table" },
                    { "name":"platform_tree", "read_name":"Platform Tree" } ]
        },
        { "name":"crash_urls_st",
          "read_name":"Site Crash URL Summary",
          "signals":{ "url":"1", 
                      "signature":"1", 
                      "fatal_message":"1",
                      "address":"1", 
                      "pluginfilename":"1", 
                      "pluginversion":"1", 
                      "exploitability":"1" },
          "control_panel":"named_fields.html",
          "data_adapter":"urls",
          "charts":[ { "name":"table", "read_name":"Table" } ]
         } ...etc

In addition to the data views, this JSON structure can also define a collection of views that will be automatically opened and connected when selected from the navigation menu. A view collection structure looks like this:

/*********************
 * VIEW COLLECTION ATTRIBUTE DEFINITIONS
 *
 * name - The name of the view.
 *
 * default - If set, the collection will be used as the default dataset
 *           to initialize the UI to.
 *
 * read_name - The readable name displayed in the UI.
 *
 * collection - An array of associative arrays.  Each nested associative array 
 *              defines a view to include in this collection and the parent/child
 *              relationships of all views in the collection.
 *
 *    bhview - The view name, must be contained somewhere in this file.
 *             
 *    parent - The view name of the parent.  If the view is the parent of 
 *             all other views in the collection, its parent should be set
 *             to an empty string.
 **********************/
"Collections",
     [ { "name":"crash_explorer_st",
         "default":1,
         "read_name":"Crash Explorer",
         "collection":[ { "bhview":"crashes_st", "parent":"" },
                        { "bhview":"crash_detail_st", "parent":"crashes_st" },
                        { "bhview":"crash_urls_st", "parent":"crashes_st" } ]
       },
       { "name":"assertion_explorer_st",
         "read_name":"Assertion Explorer",
         "collection":[ { "bhview":"assertions_st", "parent":"" },
                        { "bhview":"assertion_detail_st", "parent":"assertions_st" },
                        { "bhview":"assertion_urls_st", "parent":"assertions_st" } ]
       }
     ] ..etc
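
To show how the bhview/parent pairs describe a signal hierarchy, the sketch below turns one collection into a list of root views and a parent-to-children map. The function name signal_targets is hypothetical; the data is copied from the "Crash Explorer" collection above.

```python
from collections import defaultdict

CRASH_EXPLORER = {
    "name": "crash_explorer_st",
    "default": 1,
    "read_name": "Crash Explorer",
    "collection": [
        {"bhview": "crashes_st", "parent": ""},
        {"bhview": "crash_detail_st", "parent": "crashes_st"},
        {"bhview": "crash_urls_st", "parent": "crashes_st"},
    ],
}


def signal_targets(collection_def):
    """Return (root_views, children), where children maps parent -> child views."""
    roots = []
    children = defaultdict(list)
    for entry in collection_def["collection"]:
        if entry["parent"] == "":
            # An empty parent marks the view that sits at the top of the collection.
            roots.append(entry["bhview"])
        else:
            children[entry["parent"]].append(entry["bhview"])
    return roots, dict(children)


roots, children = signal_targets(CRASH_EXPLORER)
print(roots)
print(children)
```

A click in crashes_st would then be forwarded to every view listed under children["crashes_st"].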

bughunter.json Description

For more details see https://github.com/jeads/datasource

{
"views":{
   "crashes_st":{                         
           "sql":"SELECT c.signature,
                         str.fatal_message,
                         str.branch,
                         str.os_name,
                         str.os_version,
                         str.cpu_name,
                         str.build_cpu_name,
                         COUNT( c.id ) AS 'total_count'
                   FROM Crash AS c
                   JOIN SiteTestCrash AS stc ON c.id = stc.crash_id
                   JOIN SiteTestRun AS str ON stc.testrun_id = str.id
                   WHERE (stc.datetime >= 'REP0' AND stc.datetime <= 'REP1') REP2
                   GROUP BY c.signature,
                            str.fatal_message,
                            str.branch,
                            str.os_name,
                            str.os_version,
                            str.cpu_name
                   ORDER BY str.fatal_message DESC, c.signature ASC
                   LIMIT 1000",
           "host_type":"master_host"
     },
     "new_crash_signatures_st":{
     
           "sql":"SELECT c.signature,
                         str.fatal_message,
                         str.branch,
                         str.os_name,
                         str.os_version,
                         str.cpu_name,
                         str.build_cpu_name,
                         COUNT( c.id ) AS 'total_count'
                   FROM Crash AS c
                   JOIN SiteTestCrash AS stc ON c.id = stc.crash_id
                   JOIN SiteTestRun AS str ON stc.testrun_id = str.id
                   WHERE (stc.datetime >= 'REP0' AND stc.datetime <= 'REP1') AND 
                          c.id NOT IN (
                             SELECT c.id
                             FROM Crash AS c
                             JOIN SiteTestCrash AS stc ON c.id = stc.crash_id
                             WHERE (stc.datetime < 'REP0')) REP2
                   GROUP BY c.signature,
                            str.fatal_message,
                            str.branch,
                            str.os_name,
                            str.os_version,
                            str.cpu_name
                   ORDER BY str.fatal_message DESC, c.signature ASC
                   LIMIT 1000",
          
           "host_type":"master_host"
     }...etc
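
The REP0, REP1 and REP2 markers in the SQL above are placeholders that are filled in at request time. The sketch below shows one plausible way to do that, assuming REP0/REP1 carry the date range and REP2 carries the signal constraint; it is not the datasource implementation. The names fill_placeholders, build_query and SIGNAL_COLUMNS are hypothetical, and signal values are left as %s driver parameters rather than being pasted into the SQL text.

```python
from datetime import datetime

TEMPLATE = ("SELECT c.signature FROM Crash AS c "
            "JOIN SiteTestCrash AS stc ON c.id = stc.crash_id "
            "WHERE (stc.datetime >= 'REP0' AND stc.datetime <= 'REP1') REP2 "
            "LIMIT 1000")

# Hypothetical mapping from signal name to a qualified column name.
SIGNAL_COLUMNS = {"signature": "c.signature"}


def fill_placeholders(sql, replacements):
    """Replace REP0, REP1, ... with the list entry at the same index."""
    # Work from the highest index down so REP1 never clobbers part of REP10.
    for index in range(len(replacements) - 1, -1, -1):
        sql = sql.replace("REP%d" % index, replacements[index])
    return sql


def build_query(template, start, end, signals):
    """Return (sql, params) for a date range plus any accepted signals."""
    for value in (start, end):
        # Dates are substituted as text, so reject anything malformed.
        datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    clauses, params = [], []
    for name in sorted(signals):
        clauses.append("AND %s = %%s" % SIGNAL_COLUMNS[name])
        params.append(signals[name])
    return fill_placeholders(template, [start, end, " ".join(clauses)]), params


sql, params = build_query(TEMPLATE, "2011-06-01 00:00:00",
                          "2011-06-30 23:59:59", {"signature": "sig_one"})
print(sql)
print(params)
```

With no signals, REP2 collapses to an empty string and the query is constrained by the date range alone.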

Navigation Menu Generation

The view navigation menu in the UI is generated with the following specialized django manage.py command:

python manage.py build_nav

This command outputs two files.

  1. python/sisyphus/webapp/html/nav/nav_menu.html - This file contains an HTML unordered list tag that's generated from the structure in views.json. It is used for the navigation menu for each data view.
  2. python/sisyphus/webapp/templates/bughunter.navlookup.html - This file contains a single hidden input field containing a JSON associative array with each data view structure from views.json within it. It's used in the javascript as a lookup table when view configuration is required.
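
The two outputs can be sketched as a single walk over the views.json hierarchy. This is not the build_nav command itself: it assumes that a string in the structure is a menu label, that the list following a label is its submenu, and that associative arrays are the selectable views. The anchor format and the nav_lookup element id are also hypothetical.

```python
import json
from html import escape

MENU = [
    "Site Tests",
    ["Crashes",
     [{"name": "crashes_st", "read_name": "Site Crashes"},
      {"name": "crash_urls_st", "read_name": "Site Crash URL Summary"}]],
]


def build_nav(items, lookup):
    """Render nested lists as <ul> markup and record each view in lookup."""
    parts = ["<ul>"]
    i = 0
    while i < len(items):
        item = items[i]
        if isinstance(item, str):
            # A label is followed by the list that holds its submenu.
            parts.append("<li>%s" % escape(item))
            if i + 1 < len(items) and isinstance(items[i + 1], list):
                parts.append(build_nav(items[i + 1], lookup))
                i += 1
            parts.append("</li>")
        elif isinstance(item, dict):
            lookup[item["name"]] = item
            parts.append('<li><a href="#%s">%s</a></li>'
                         % (escape(item["name"]), escape(item["read_name"])))
        elif isinstance(item, list):
            # An unlabeled list is wrapped in its own list item.
            parts.append("<li>%s</li>" % build_nav(item, lookup))
        i += 1
    parts.append("</ul>")
    return "".join(parts)


lookup = {}
nav_html = build_nav(MENU, lookup)
# The lookup table is embedded as JSON in a hidden input for the javascript.
hidden_input = ('<input type="hidden" id="nav_lookup" value="%s" />'
                % escape(json.dumps(lookup)))
print(nav_html)
```

The menu markup and the lookup table come from the same traversal, so they cannot drift apart when views.json changes.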

User Interface

The javascript that implements the user interface is built using a page/component/collection pattern. This proved very useful in separating out the required functionality. Below is a brief definition of what each class means in bughunter.

Class Definitions

Page: Manages the DOM ready event and implements any top-level initialization that's required for the page. An instance of the page class is the only global variable that well-behaved components access. The page class instance is responsible for instantiating components and storing them in attributes. The page class also holds any data structures that need to be globally accessible to component classes.

Component: Contains the public interface of the component. A component can encapsulate any functional subset/unit provided in a page. The component will typically have an instance of a View and Model class. The component class is also responsible for any required event binding.

View: A component's view class manages interfacing with the DOM. Any CSS class names or HTML id's are defined as attributes of the view. Any HTML element modification is controlled with this class.

Model: A component's model manages any asynchronous data retrieval and large data structure manipulation.

Collection: A class for managing a collection of Components or classes of any type. A collection can also have a model/view if appropriate.
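
The division of responsibilities can be sketched in a few lines. The real classes are javascript built with moo4q; the version below is transliterated into Python purely for illustration, and every class body and method name in it is hypothetical.

```python
class Model:
    """Owns data retrieval and data structure manipulation."""
    def __init__(self, rows):
        self.rows = rows

    def filter(self, **constraints):
        return [r for r in self.rows
                if all(r.get(k) == v for k, v in constraints.items())]


class View:
    """Owns presentation; stands in for the class that touches the DOM."""
    def render(self, rows):
        return "\n".join("%(signature)s: %(total_count)d" % r for r in rows)


class Component:
    """Public interface that ties one model to one view."""
    def __init__(self, model, view):
        self.model, self.view = model, view

    def show(self, **constraints):
        return self.view.render(self.model.filter(**constraints))


class Page:
    """Top-level object that instantiates and holds the components."""
    def __init__(self, rows):
        self.crashes = Component(Model(rows), View())


page = Page([
    {"signature": "sig_one", "total_count": 3},
    {"signature": "sig_two", "total_count": 1},
])
print(page.crashes.show(signature="sig_one"))
```

Callers only talk to the component; the model and view can change independently behind it.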

Class Structure

This is not a complete file or class listing. It is intended to give a top-level description of the design pattern used in the bughunter javascript and of the basic functional responsibilities of the pages/components/collections. See the README for more details.

BughunterPage.js 
   BughunterPage Class - Manages the DOM ready event, component initialization, and
                         retrieval of the views.json structure that is used by different
                         components.

Bases.js
   Design Pattern Base Classes - Contains the base classes for Page, Component, Model, View etc...


BHViewComponent.js 
   BHViewComponent Class - Encapsulates the behavior of a single data view using a model/view and  
                           provides a public interface for data view functionality.  Manages
                           event binding and registration.
   BHViewView Class - Encapsulates all DOM interaction required by a data view.
   BHViewModel Class - Encapsulates asynchronous server communication and data structure
                       manipulation/retrieval.


BHViewCollection.js 
   BHViewCollection Class - Manages operations on a collection of data views using a model/view
                            including instantiating view collections.  
                         
   BHViewCollectionView Class - Encapsulates all DOM interaction required by the collection.
   BHViewCollectionModel Class - Provides an interface to the datastructures holding all data
                                 views and their associated parent/child relationships.

DataAdapterCollection.js
   DataAdapterCollection Class - Collection of BHViewAdapter class instances. 
   BHViewAdapter Class - Base class for all BHViewAdapters.  Manages shared view
                         idiosyncratic behavior like what fields go in the 
                         control panel and how to populate/retrieve them for 
                         signaling behavior.
   CrashesAdapter Class - Derived class of BHViewAdapter.  Encapsulates unique 
                          behavior for crash data views.
   UrlAdapter Class - Derived class of BHViewAdapter. Encapsulates unique behavior 
                      for views containing URL summaries.

ConnectionsComponent.js 
   ConnectionsComponent Class - Provides a public interface for opening new views via events.
   ConnectionsView Class - Encapsulates all DOM interactions required by the 
                           Open New View modal window.

VisualizationCollection.js
   VisualizationCollection Class - Holds a collection of classes that can 
                                   represent data views graphically.
   Visualization Class - Base class for managing shared functionality between 
                         data view graphics rendering classes. 
   PlatformTree Class - Derived class of Visualization.  Renders tabular 
                        data for Crashes, Assertions, and Valgrinds as a 
                        circular tree.
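
Rendering tabular rows as a tree first requires folding them into a hierarchy. The sketch below groups rows by platform columns into nested nodes with id, name, data and children keys, the general shape that jit's tree visualizations consume. The level order (os_name, os_version, cpu_name) and the function rows_to_tree are assumptions; the actual PlatformTree class may group differently.

```python
def rows_to_tree(rows, levels=("os_name", "os_version", "cpu_name")):
    """Fold flat rows into a nested tree, summing total_count at every level."""
    root = {"id": "root", "name": "All platforms",
            "data": {"count": 0}, "children": []}
    for row in rows:
        node = root
        node["data"]["count"] += row["total_count"]
        for level in levels:
            node_id = node["id"] + "/" + str(row[level])
            child = next((c for c in node["children"] if c["id"] == node_id), None)
            if child is None:
                child = {"id": node_id, "name": str(row[level]),
                         "data": {"count": 0}, "children": []}
                node["children"].append(child)
            child["data"]["count"] += row["total_count"]
            node = child
    return root


ROWS = [
    {"os_name": "Linux", "os_version": "2.6", "cpu_name": "x86", "total_count": 3},
    {"os_name": "Linux", "os_version": "2.6", "cpu_name": "x86_64", "total_count": 2},
    {"os_name": "Windows NT", "os_version": "6.1", "cpu_name": "x86", "total_count": 5},
]
tree = rows_to_tree(ROWS)
print([(c["name"], c["data"]["count"]) for c in tree["children"]])
```

Each node carries the summed count of everything beneath it, which a renderer could use to size or label branches.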

Database

The sisyphus schema can be found at Media:Sisyphus_schema.pdf. This PDF is not up to date but is useful for getting an idea of what data is available.

Implementation

Client

The following javascript packages were used as core infrastructure in the bughunter client architecture.

  • jQuery - For DOM interactions
  • moo4q - For OOP in jQuery. All bughunter classes are built using this strategy.
  • datatables.js - This jQuery plugin was used for all tabular display of data. It's pretty awesome.
  • underscore.js - This javascript module was used for some algorithms/datastructures and maintaining function context in event binding... among other things.
  • jit - This data visualization javascript module was used for the Platform Tree representation. It absolutely rocks for representing hierarchical/graph type data.
  • UI specification - This was the original functional spec that was developed at the beginning of this project. It's mildly entertaining to see how it deviates from the final product.

Webservice

  • nginx - Used as the web server.
  • fastcgi - Used for running django.
  • django - Used as the web framework.
  • datasource - Used for encapsulation and dynamic generation of SQL with MySQL.

Database

  • MySQL