User:Ashughes/metrics-graphics-gfx


Summary

This document explains all of the charts I have prototyped since joining the GFX team at Mozilla. Its purpose is to record what has been prototyped, with the goal of eventually putting into production the metrics that are valuable to people making decisions about the quality of Firefox.

The use case is what's important. The prototypes are just a means to an end, and the code itself should not be considered ready for production.

References:

Supplemental

The development of this work gave birth to the Bugzilla Socorro Lens add-on which I developed at the London All-hands 2016 under Benoit Girard's guidance.

Explanation of Charts

Topline Charts

Chart Data Source Explanation
Graph by Signature Socorro Supersearch API Displays total daily crash reports faceted by GPU vendor, GPU driver version, and GPU chipset ID for the signature entered into the textbox or all signatures combined if the textbox is left blank.
Graph by Driver Version Socorro Supersearch API Displays all/graphics/driver/webgl crash count/rate/percent per day for the specified driver version or all drivers combined if the textbox is left blank.
Graph by Device ID Socorro Supersearch API Displays all/graphics/driver/webgl crash count/rate/percent per day for the specified device ID or all device IDs combined if the textbox is left blank.
Topcrash Config Lookup Socorro Supersearch API Displays the daily #1 topcrash as a percentage of all Graphics crashes for the specified vendor/device/driver combination (defaults to all) and presents a table showing the signature for each day.

V2 Charts

Chart Data Source Explanation
Firefox vs Android Crash Rate (Release) Socorro Supersearch API Displays the daily graphics crash count and rate (crashes per install) for Firefox on Desktop and Android.
Firefox vs Android Crash Factor Socorro Supersearch API Displays how many more daily crashes get reported on Release compared to Beta for Firefox on Desktop and Android.
Crashes with WGL+ Socorro Supersearch API Displays daily crashes and percentage of daily crashes annotated with WGL+ (i.e. WebGL instantiated) by platform, plus the ability to show/hide a data table
High-volume Crashes Socorro Supersearch API Displays the top-10 and top-5 graphics/driver/webgl signatures as a daily count and percentage of overall crash volume
Graphics Startup Test Socorro Supersearch API Displays daily crashes in the Graphics Startup Test as a count and percentage of overall crash volume
Bug 1295075 Socorro Supersearch API Displays daily crashes related to bug 1295075 (Intel driver crash spike fixed in Fx51)
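
For illustration, the rate and factor in the two Firefox vs Android charts reduce to simple ratios. The sketch below is not the prototype's code: the function names and sample numbers are made up, and it assumes "factor" means Release crashes divided by Beta crashes for the same day.

```python
def crash_rate(crashes, installs):
    """Crashes per active installation; None when installs is zero or unknown."""
    if not installs:
        return None
    return crashes / installs


def crash_factor(release_crashes, beta_crashes):
    """How many times more crashes Release reports than Beta on the same day."""
    if not beta_crashes:
        return None
    return release_crashes / beta_crashes


# Invented numbers, purely for illustration.
day = {"release_crashes": 1200, "release_installs": 400000, "beta_crashes": 60}
rate = crash_rate(day["release_crashes"], day["release_installs"])
factor = crash_factor(day["release_crashes"], day["beta_crashes"])
```

Returning None instead of dividing by zero lets a chart skip days where the installation count or the Beta crash count is missing.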

Firefox Charts

Chart Data Source Explanation
Active Daily Installations Socorro Supersearch API Displays the number of active daily installations across each of the channels
Crashes by Channel [NEW] Local JSON Displays the number of Graphics crashes broken down by product channel
Crashes by Platform [NEW] Local JSON Displays the number of Graphics crashes broken down by operating system
Crashes by Vendor [NEW] Hybrid Displays the number of Graphics crashes broken down by GPU vendor
Crashes with graphics-critical-error Local JSON Displays daily crashes annotated with a graphics-critical-error broken down by product channel
Crashes in vendor's graphics driver Local JSON Displays daily crashes with a driver-related signature broken down by GPU vendor
Crashes during the graphics startup test Local JSON Displays daily count of crashes annotated as occurring during the graphics startup test broken down by GPU vendor
MOZ_CRASH crashes by vendor Local JSON Displays daily count of crashes annotated with MOZ_CRASH broken down by GPU vendor

Fennec Charts

Chart Data Source Explanation
Active Daily Installations Socorro Supersearch API Daily number of active installations of Firefox for Android broken down by product channel
Overall crash volume in GFX Local JSON Daily number of crash reports on Android with a graphics-related signature, broken down by product channel
Top-5 devices in overall crash volume Local JSON Daily number of crashes on Android for the top-5 devices by crash volume
Top-5 devices in GFX crash volume Local JSON Daily number of graphics-related crashes on Android for the top-5 devices by crash volume
Top-5 Android versions in overall crash volume Local JSON Daily number of crashes on Android for the top-5 Android versions by crash volume
Top-5 Android versions in GFX crash volume Local JSON Daily number of graphics-related crashes on Android for the top-5 Android versions by crash volume

Bugzilla

Chart Data Source Explanation
Dashboard of all Bugzilla graphs Local JSON Displays all of the Bugzilla charts that have been created on one page:
  • Reported Bugs: Resolved, Reported, Duped, and Reopened graphics bugs
  • Crash Bugs: Reported vs Resolved graphics bugs with the crash keyword
  • Uplifts: Number of bugs where an uplift was approved, broken down by product channel
  • Unresolved Bugs: Number of unresolved graphics bugs aggregated over time
  • Days to Fix (Total): Number of days on average that it takes to fix a graphics bug (resolved date - creation date)
  • Post Release Reports: Number of graphics bugs tracked for a specific version which were reported after that version was released
  • Post Release Fixes: Number of graphics bugs tracked for a specific version which were fixed after that version was released
  • Reported Crash Bugs: Number of graphics bugs with the crash keyword reported for a specific version
  • Fixed Crash Bugs: Number of graphics bugs with the crash keyword fixed for a specific version

Clicking the chart displays a more detailed view.

Bug breakdown by component Local JSON Displays the percentage of graphics bugs in each component
Regressions Local JSON Displays regression bugs comparing open and closed bugs in graphics versus all components
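
As a rough illustration of the Days to Fix (Total) metric listed above, the sketch below averages "resolved date - creation date" over a list of bugs. The field names creation_time and cf_last_resolved, and the timestamp format, are assumptions about what the Bugzilla REST API returns; the sample bugs are invented.

```python
from datetime import datetime

# Assumed timestamp format for Bugzilla REST responses.
BUGZILLA_TIME = "%Y-%m-%dT%H:%M:%SZ"


def days_to_fix(bug):
    """Days between creation and resolution for one resolved bug."""
    created = datetime.strptime(bug["creation_time"], BUGZILLA_TIME)
    resolved = datetime.strptime(bug["cf_last_resolved"], BUGZILLA_TIME)
    return (resolved - created).total_seconds() / 86400


def average_days_to_fix(bugs):
    """Mean days to fix over resolved bugs; unresolved bugs are skipped."""
    fixed = [bug for bug in bugs if bug.get("cf_last_resolved")]
    if not fixed:
        return None
    return sum(days_to_fix(bug) for bug in fixed) / len(fixed)


# Invented sample data: two resolved bugs and one still open.
bugs = [
    {"id": 1, "creation_time": "2016-01-01T00:00:00Z",
     "cf_last_resolved": "2016-01-11T00:00:00Z"},
    {"id": 2, "creation_time": "2016-02-01T00:00:00Z",
     "cf_last_resolved": "2016-02-05T00:00:00Z"},
    {"id": 3, "creation_time": "2016-03-01T00:00:00Z",
     "cf_last_resolved": None},
]
average = average_days_to_fix(bugs)
```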

Prototypes

Chart Data Source Explanation
Graphics Factor (Release vs Beta) Local JSON Displays order of magnitude difference between the number of installs and crashes on Release versus Beta, as well as a comparison of crashes broken down by GPU vendor.
GPU Distribution (Release vs Beta) Local JSON Displays which graphics devices are more commonly associated with crashes in Release that do not show up at all in Beta. The purpose of this is to hypothesize which devices are not represented in our test population and need more focus in QA. The larger the bubble the more common the device.
Signature breakdown by component Local JSON Dashboard of several topcrash metrics for Graphics:
  • Top 10 Signatures: top-10 graphics/driver/webgl signatures as a percentage of overall crash volume
  • Top Signature: #1 top graphics/driver/webgl signature as a percentage of overall crash volume
  • Top 1% Signatures: top 1% of graphics/driver/webgl signatures as a percentage of overall crash volume
  • WebGL: percentage of WebGL crashes at 0.1%, 1%, 5%, and more than 5% of overall crash volume
  • Drivers: percentage of driver crashes at 0.1%, 1%, 5%, and more than 5% of overall crash volume
  • Graphics (excl. WebGL & Driver): percentage of non-WebGL and non-driver graphics crashes at 0.1%, 1%, 5%, and more than 5% of overall crash volume
  • Distribution of GFX signatures: percentage of crashes with gfx in the signature at 0.1%, 1%, 5%, and more than 5% of overall crash volume
  • Distribution of Layers signatures: percentage of crashes with layers in the signature at 0.1%, 1%, 5%, and more than 5% of overall crash volume
  • Top-10 GFX Signatures: top-10 crashes with gfx in the signature as a percentage of overall volume
  • Top-10 Layers Signatures: top-10 crashes with layers in the signature as a percentage of overall volume
  • Top 10% GFX Signatures: top 10% of crashes with gfx in the signature as a percentage of overall volume
  • Top 10% of Layers Signatures: top 10% of crashes with layers in the signature as a percentage of overall volume
  • Top 1% GFX Signatures: top 1% of crashes with gfx in the signature as a percentage of overall volume
  • Top 1% Layers Signatures: top 1% of crashes with layers in the signature as a percentage of overall volume
WGL+ Crashes Local JSON Displays the daily count and percentage of overall volume of crashes with WGL+ in the AppNotes field broken down by platform. WGL+ indicates a crash during a session with WebGL active.
D2D1.1 Success Rate Local JSON Attempts to determine what percentage of sessions have succeeded in activating Direct2D before crashing
D3D11 Layers Success Rate Local JSON Attempts to determine what percentage of sessions have succeeded in activating Direct3D before crashing
WebGL Success Rate Local JSON Attempts to determine what percentage of sessions have succeeded in activating WebGL before crashing
Android Hardware Local JSON Tracks daily crash data for the top-5 Android chipsets by graphics crash volume, percentage of crash volume, and tracks the top devices related to a Qualcomm crash spike.
GCE Memory Footprint Local JSON Attempts to correlate crashes with graphics-critical-error to out-of-memory conditions
GCE Reasons Local JSON Tracks the volume of crashes for each of the graphics-critical-error reasons as annotated in crash reports
GFX Local JSON Tracks the volume of crashes with mozilla:gfx in the signature broken down by GPU vendor (this chart was generated from historical CSV data which I think is no longer available)
GFX (%) Local JSON Tracks the percentage of crashes with mozilla:gfx in the signature
Layers Local JSON Tracks the volume of crashes with mozilla:layers in the signature broken down by GPU vendor
Android Local JSON BROKEN, don't recall what this tracked
WebGL Local JSON Attempts to determine the success rate of activating WebGL broken down by GPU vendor based on crash reports (this chart was generated from historical CSV data which I think is no longer available)
Devices (%) Local JSON Displays the top GPU devices for each vendor and what percentage of crashes they represent, broken down by channel
Drivers (%) Local JSON Displays the top GPU drivers for each vendor and what percentage of crashes they represent, broken down by channel
Startup Local JSON Displays graphics crashes which occur within the first eight (8) seconds of startup
Shutdown Local JSON Displays graphics crashes which occur during the XPCOM Shutdown sequence
Startup vs Shutdown Local JSON Displays the percentage of overall volume represented by graphics related startup and shutdown crashes
GCE Local JSON Displays crashes with a graphics-critical-error annotation as a percentage, daily count, and 7-day average
OOM Local JSON Displays crashes where the system had less than 10% memory remaining as a percentage, daily count, and 7-day average
MOZ_CRASH(GFX_CRASH) by Channel (Filtered) Local JSON Displays graphics related crashes with a MOZ_CRASH(GFX_CRASH) annotation broken down by product channel
MOZ_CRASH(GFX_CRASH) by Channel (All) Local JSON Displays all crashes with a MOZ_CRASH(GFX_CRASH) annotation broken down by product channel
MOZ_CRASH Graphics vs Other Local JSON Compares the volume of crashes annotated with MOZ_CRASH in graphics versus all other crashes
Intel 0x0046 Local JSON Displays daily crash volume for all crashes related to the Intel 0x0046 chipset
Latest Intel Local JSON Displays the volume and percent of crashes related to the latest Intel driver branch at the time
Latest AMD Local JSON Displays the volume and percent of crashes related to the latest AMD driver branch at the time
Latest NVIDIA Local JSON Displays the volume and percent of crashes related to the latest NVIDIA driver branch at the time
Latest All (%) Local JSON Compares the percentage of crashes related to the latest GPU driver branch at the time for each vendor
Latest All (Count) Local JSON Compares the volume of crashes related to the latest GPU driver branch at the time for each vendor
NVIDIA Driver Crashes Local JSON Compares the volume of crashes related to the NVIDIA driver across each of the product channels
AMD Driver Crashes Local JSON Compares the volume of crashes related to the AMD driver across each of the product channels
Intel Driver Crashes Local JSON Compares the volume of crashes related to the Intel driver across each of the product channels
Safemode Local JSON Displays the 1-month, 1-week, and 1-day average for crashes in gfx and layers occurring in safemode
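
The GCE and OOM charts above plot a 7-day average next to the daily count. A minimal sketch of one way to compute it, assuming a trailing window that includes the current day (this page does not say whether the prototypes use a trailing or a centered window); the function name and numbers are illustrative only.

```python
def trailing_average(counts, window=7):
    """Average of each day and up to window-1 preceding days."""
    averages = []
    for index in range(len(counts)):
        start = max(0, index - window + 1)
        chunk = counts[start:index + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Invented daily crash counts, oldest first.
daily = [10, 20, 30, 40, 50, 60, 70, 80]
smoothed = trailing_average(daily)
```

For the first six days the window is shorter than seven, so those values average only the days available so far.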

Charts behind VPN

(TO BE COMPLETED - Document the Mac Mini system)

Data Sources

Socorro Supersearch API
  • Summary: Data is generated on page load
  • Pros: Easiest to maintain; data is the most current
  • Cons: Prone to API breakage; data only goes back six months

Local JSON (Socorro)
  • Summary: Manually executed Python script queries Socorro Supersearch API and updates a local JSON file
  • Pros: Faster chart load times
  • Cons: Data becomes outdated very quickly

Local JSON (Bugzilla)
  • Summary: Manually executed Python script queries Bugzilla REST API and updates a local JSON file
  • Pros: Faster chart load times; can load entire Bugzilla history
  • Cons: Data becomes outdated very quickly; too much data really slows down the browser

Local Database
  • Summary: Automatically executed Python script queries Socorro Supersearch API to update a local database file once a day. A second script then updates local JSON files with the most recent copy of the database. Charts are generated from the local JSON files.
  • Pros: Faster chart load times; longer term data availability
  • Cons: Greater storage requirements (~2.7GB/month); not publicly accessible; higher maintenance cost (cron, python, charts)

Hybrid
  • Summary: Uses a combination of stored JSON files (old data) and Socorro Supersearch API (new data)
  • Pros: Longer term data possible without burden of maintaining a database
  • Cons: Very easy to have gaps in your data; needs constant updating of local files
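
To make the Hybrid trade-off concrete, here is a hedged sketch of merging stored data with fresh API data and checking for the gaps mentioned above. It assumes both sources have already been reduced to a mapping of ISO date to count; the function names and sample values are hypothetical.

```python
from datetime import date, timedelta


def merge_series(stored, fresh):
    """Combine two {iso_date: count} mappings; fresh API data wins on overlap."""
    merged = dict(stored)
    merged.update(fresh)
    return merged


def missing_days(series):
    """List the ISO dates between the first and last day that have no data."""
    if not series:
        return []
    days = sorted(date.fromisoformat(day) for day in series)
    present = set(days)
    gaps = []
    current = days[0]
    while current <= days[-1]:
        if current not in present:
            gaps.append(current.isoformat())
        current += timedelta(days=1)
    return gaps


# Invented data: the stored file ends on June 2, the API window starts June 4.
stored = {"2016-06-01": 120, "2016-06-02": 115}
fresh = {"2016-06-04": 130, "2016-06-05": 128}
merged = merge_series(stored, fresh)
gaps = missing_days(merged)
```

A gap appears whenever the stored files are not refreshed before the oldest stored day falls out of the API's six-month window, which is why this approach needs constant updating.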

Pseudocode

The bulk of the code comes from the MetricsGraphics.js library. Relevant to this project are the files contained in charts (HTML/JS to load a specific chart) and data (JSON containing the specific chart data).

Charts

Each of the charts follows essentially the same steps:

  1. Load data as JSON
  2. Parse JSON data using D3.js into a JS Object
  3. In some cases do calculations based on the data
  4. Output the resulting Object as:
     {
       "date": date,        (usually serves as X-axis)
       "category": value,   (usually serves as Y-axis)
       ...
     }
  5. Pass the output object to MetricsGraphics.js for graphing
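
Steps 1 to 4 above can be sketched as follows. The real charts do this in HTML/JS with D3.js before handing the result to MetricsGraphics.js; this Python version only illustrates the reshaping into one object per date with one key per category. The input layout, the to_chart_rows helper, and the sample values are assumptions made for the example.

```python
import json
from collections import defaultdict

# Step 1: load data as JSON (an inline string stands in for a file in /data).
raw_text = """[
  {"date": "2016-06-01", "vendor": "Intel", "count": 40},
  {"date": "2016-06-01", "vendor": "AMD", "count": 10},
  {"date": "2016-06-02", "vendor": "Intel", "count": 35},
  {"date": "2016-06-02", "vendor": "NVIDIA", "count": 5}
]"""

# Step 2: parse the JSON into native objects.
records = json.loads(raw_text)


def to_chart_rows(records, category_key="vendor", value_key="count"):
    """Steps 3 and 4: group by date and emit {"date": ..., category: value}."""
    by_date = defaultdict(dict)
    for record in records:
        by_date[record["date"]][record[category_key]] = record[value_key]
    return [{"date": day, **by_date[day]} for day in sorted(by_date)]


rows = to_chart_rows(records)
```

Each element of rows has the shape described in step 4, with date for the X-axis and one key per category for the Y-axis.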

Data

First Method

Originally I was using Python to generate chart data as JSON. I would call the Socorro API, get six months of data back as JSON, convert it to something I wanted to chart, then write out the resulting JSON to a file in the /data folder. My charts which use this method would simply load the appropriate JSON file from this /data folder. I would have to run the .py scripts periodically to update these files and push to GitHub whenever I wanted the chart to update. Any newly conceived query essentially starts from scratch with a 6-month snapshot and requires building up data over time.
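
A minimal sketch of such a script, under stated assumptions: the endpoint URL, the query parameters (_histogram.date, _results_number), and the facets.histogram_date response layout reflect one reading of the Supersearch API and should be checked against its documentation. The function names, the output file name, and the sample payload are hypothetical, and the network call is defined but not executed here.

```python
import json
import os
import tempfile
import urllib.parse
import urllib.request

# Assumed endpoint; verify against the Supersearch documentation before use.
SUPERSEARCH_URL = "https://crash-stats.mozilla.org/api/SuperSearch/"


def fetch_histogram(signature_filter, start_date):
    """Download a daily histogram of crash counts (network call, not run here)."""
    query = urllib.parse.urlencode({
        "signature": signature_filter,
        "date": ">=" + start_date,
        "_histogram.date": "product",
        "_results_number": 0,
    })
    with urllib.request.urlopen(SUPERSEARCH_URL + "?" + query) as response:
        return json.load(response)


def histogram_to_rows(payload):
    """Flatten the assumed histogram buckets into date-sorted chart rows."""
    rows = [
        {"date": bucket["term"][:10], "crashes": bucket["count"]}
        for bucket in payload["facets"]["histogram_date"]
    ]
    return sorted(rows, key=lambda row: row["date"])


def write_chart_data(path, rows):
    """Write the rows to the JSON file that a chart would load."""
    with open(path, "w") as handle:
        json.dump(rows, handle, indent=2)


# Invented payload in the assumed response shape, standing in for a live call.
sample = {"facets": {"histogram_date": [
    {"term": "2016-06-02T00:00:00+00:00", "count": 95},
    {"term": "2016-06-01T00:00:00+00:00", "count": 120},
]}}
rows = histogram_to_rows(sample)
output_path = os.path.join(tempfile.mkdtemp(), "gfx-crashes.json")
write_chart_data(output_path, rows)
```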

Second Method

The second method just loads the data in real time from the Socorro API. This is slower than the first method, as it means retrieving the remote data and processing it before charting, all on the client side. However, no server-side data update is needed, so the data is much more recent. Another benefit is that a chart representing a new query or a new set of data can be spun up fairly quickly (simply a matter of changing the API URL in the chart).
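
To show how little changes between charts with this method, the sketch below builds two query URLs that differ only in their parameters. The charts themselves build their URLs in JavaScript; this is a Python illustration only. The endpoint, the parameter names, and the field names (adapter_vendor_id, platform, app_notes) are assumptions about the Supersearch API, and chart_url is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint; verify against the Supersearch documentation before use.
BASE_URL = "https://crash-stats.mozilla.org/api/SuperSearch/"


def chart_url(histogram_field, **filters):
    """Build a daily-histogram query faceted by one field, plus any filters."""
    params = {"_results_number": 0, "_histogram.date": histogram_field}
    params.update(filters)
    return BASE_URL + "?" + urlencode(params)


# Two hypothetical charts that differ only in their query parameters.
by_vendor = chart_url("adapter_vendor_id", product="Firefox")
webgl = chart_url("platform", product="Firefox", app_notes="~WGL+")

# Decode one URL again to confirm that the filter values survive encoding.
decoded = parse_qs(urlsplit(webgl).query)
```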

Third Method

The third method essentially combines the two. I run a Python script in a daily cron job which syncs relevant data to a local database. I then run a series of Python scripts to update local JSON files for the charts from this local database. The charts point to the local JSON files, which are at most a day out of date and contain more than six months of data. The downside is that this is only accessible behind the VPN, and new queries are generally harder to architect because they require updating the data generation scripts. However, we can generally get the older trend data even if the query is new, so long as the data it needs has already been synced to the database. For example, if there's a field I wasn't already syncing, it would need to be added and would start from the most recent six months of available data.
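
A minimal sketch of the sync-then-export flow, assuming SQLite as the local database (this page does not name the database actually used). The table layout, function names, and sample rows are hypothetical, and the daily sync is simulated with two calls in place of a cron job.

```python
import json
import sqlite3


def open_database(path=":memory:"):
    """Open the local database and create the assumed table if needed."""
    connection = sqlite3.connect(path)
    connection.execute(
        "CREATE TABLE IF NOT EXISTS daily_crashes ("
        " day TEXT NOT NULL, channel TEXT NOT NULL, crashes INTEGER NOT NULL,"
        " PRIMARY KEY (day, channel))"
    )
    return connection


def sync_rows(connection, rows):
    """Insert or refresh one row per (day, channel); rerunning a sync is safe."""
    connection.executemany(
        "INSERT OR REPLACE INTO daily_crashes (day, channel, crashes)"
        " VALUES (?, ?, ?)",
        rows,
    )
    connection.commit()


def export_chart_json(connection):
    """Dump the whole history as a JSON list with one object per day."""
    cursor = connection.execute(
        "SELECT day, channel, crashes FROM daily_crashes ORDER BY day, channel"
    )
    by_day = {}
    for day, channel, crashes in cursor:
        by_day.setdefault(day, {"date": day})[channel] = crashes
    return json.dumps(list(by_day.values()))


# Two simulated daily syncs with invented counts; the second revises June 1.
db = open_database()
sync_rows(db, [("2016-06-01", "release", 120), ("2016-06-01", "beta", 6)])
sync_rows(db, [("2016-06-01", "release", 125), ("2016-06-02", "release", 118)])
chart_json = export_chart_json(db)
```

Keying the table on (day, channel) means a later sync overwrites a revised count instead of duplicating it, and the export can always be regenerated from the full stored history rather than from a six-month API window.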