QA/Platform/Graphics
Consult this page for more information.
Metrics
- Crash Rates: STAGE - PROD
- Metrics: Code Quality, Quality Indices
- Telemetry: GFX, All
- Bugzilla: Bug Age
Documentation
- Team Page
- Device Inventory (moved to wiki)
- PCI Device ID List
- nVidia Codenames
- Graphics driver blocklisting
- Draft guides
Understanding the Problem Space
My first order of business in transitioning to the Graphics team is to understand the problem space, so that I can identify the team's immediate needs and make the greatest impact in the shortest amount of time.
- What are the key problems/challenges facing the Graphics team in terms of quality?
- discrepancy in environments between testers and release users
- discoverability of bugs pre-release
- ?...
- Where can QA add value/support to the Graphics team?
- improving pre-release discoverability of bugs
- closing the gap between tester and release systems
- helping with bug triage, particularly with bugs hiding in general components
- representation in crashkill
- improving code coverage and/or identifying gaps in code coverage
- identifying ways to improve participation in the graphics team (events, projects, One & Done, etc)
- documentation of tools, testing processes, etc
- building out the lab in Toronto
- continuing to drive Betabreakers testing every 6 weeks
- verifying bug fixes (what does this look like?)
- profiling areas of risk (eg. troublesome configs)
- conducting root cause analysis for regressions
- understanding problems outside of our control (eg. driver resets)
- feature testing and upcoming priorities (e10s, Windows 10, El Capitan, Android, B2G, etc)
- What does QA need to know to be effective?
- key components of an actionable Graphics bug
- fundamentals/technologies that should be learned
- how to distinguish a graphics crash from a non-graphics crash with a graphics signature
- meetings, mailing lists, bugzilla components to watch, blogs, IRC channels to join, etc
- who is each member of the team (incl. contributors) and what do they do
- where does graphics code reside in the tree?
- what role does Unified Telemetry play in graphics quality?
- what are the prefs to enable/disable different functionalities?
- we need a database of known-troublesome hardware/driver configurations to inform testing, hardware acquisitions, and blocklisting
Participation
- Sanity checking via One & Done
- Meetups to connect testers/users with devs
- Testdays to teach people about graphics testing
- Documentation and translation of documentation
- Engaging on community spaces (Discourse, Reddit, Facebook, Twitter, etc)
Telemetry
- COMPOSITE_TIME: time spent in CompositorParent::CompositeToTarget dispatching draw calls and calling SwapBuffers, excluding texture upload (i.e. complete composition)
Projects
Sanity Checking
| Project | Summary | Frequency | Status |
|---|---|---|---|
| Betabreakers | Per-version lab testing against Developer Edition | Every 6 weeks | 2015-11-20: 44.0a2 testrun in planning |
| Boot2Gecko | Sanity checks on phones in the Toronto office | Weekly | 2015-11-20: 45.0a1 testrun in progress |
| One & Done | Crowd-sourced testing via One & Done (results) | Daily | 2015-10-15: Reviewed latest results |
| Toronto Lab Testing | Sanity checks on systems in the Toronto office | N/A | 2015-08-20: On hold due to lack of resources. |
Stability
The purpose of this project is to identify the most concerning graphics crashes and escalate them to developers.
Understanding the Problem
How do we identify a graphics crash?
- by signature: gfx, layers, D2D, D3D, ?...
- by topmost filename: gfx, ?...
- by driver (DLL, version, ?...)
- by device/vendor ID?...
- ?...
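As a starting point for the signature-based check above, here is a minimal sketch in Python. It assumes a plain substring match against the hints listed so far (gfx, layers, D2D, D3D); the names `GFX_SIGNATURE_HINTS` and `looks_like_graphics_signature` are hypothetical and not part of any existing tool. A match only flags a candidate: a crash with a graphics signature is not necessarily a graphics crash.

```python
# Hypothetical hint list, copied from the "by signature" bullet above.
# Extend it as the open questions in this section get answered.
GFX_SIGNATURE_HINTS = ("gfx", "layers", "d2d", "d3d")


def looks_like_graphics_signature(signature: str) -> bool:
    """Return True if the crash signature contains any known graphics hint.

    This is a coarse, case-insensitive substring heuristic. It can produce
    false positives, so treat a match as "worth a closer look" only.
    """
    lowered = signature.lower()
    return any(hint in lowered for hint in GFX_SIGNATURE_HINTS)


if __name__ == "__main__":
    for sig in (
        "mozilla::layers::CompositorParent::CompositeToTarget",
        "CContext::TID3D11DeviceContext_Draw_<T>",
        "js::GCMarker::drainMarkStack",
    ):
        print(sig, "->", looks_like_graphics_signature(sig))
```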
How do we prioritize graphics crashes?
- Overall topcrashes in release > beta > aurora > nightly
- Gfx crashes in release > beta > aurora > nightly
- Explosive crashes in release > beta > aurora > nightly
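The ordering above can be sketched as a sort key. This is only an illustration under two assumptions: that the three bullets are listed from highest to lowest priority, and that within each bullet release outranks beta, aurora, and nightly in that order. The record layout (`category`, `channel`, `signature`) and the function names are hypothetical.

```python
# Assumed ranking: a lower number means a higher priority.
CATEGORY_RANK = {"topcrash": 0, "gfx": 1, "explosive": 2}
CHANNEL_RANK = {"release": 0, "beta": 1, "aurora": 2, "nightly": 3}


def triage_key(crash: dict) -> tuple:
    """Sort key: category first, then channel within that category."""
    return (CATEGORY_RANK[crash["category"]], CHANNEL_RANK[crash["channel"]])


def prioritize(crashes: list) -> list:
    """Return a new list of crash records ordered from most to least urgent."""
    return sorted(crashes, key=triage_key)


if __name__ == "__main__":
    sample = [
        {"signature": "a", "category": "explosive", "channel": "release"},
        {"signature": "b", "category": "gfx", "channel": "nightly"},
        {"signature": "c", "category": "topcrash", "channel": "beta"},
        {"signature": "d", "category": "topcrash", "channel": "release"},
    ]
    for crash in prioritize(sample):
        print(crash["category"], crash["channel"], crash["signature"])
```

If the team decides that channel should dominate category instead, swapping the two elements of the tuple in `triage_key` is enough.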
What tools do we have at our disposal to investigate crashes?
- Bughunter for investigating crashes correlated to a URL
- KaiRo's reports for identifying crashes that are new or escalating quickly
- Socorro for getting detailed information about crash reports
What information is needed to make a crash actionable by developers?
- Correlations to particular hardware, driver, add-on, 3rd-party software, or library
- ?...
Device Driver Crashes
Queries
- Top crashes by device
- Top crashes by driver
- AMD: amd*.dll, ati*.dll
- Intel: igd*.dll
- NVIDIA: nv*.dll
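The DLL patterns above can be applied to module names from a crash report with Python's standard `fnmatch` module. The sketch below assumes the module names have already been extracted from the report; the helper name `vendor_for_module` is hypothetical. Note that a pattern as short as `nv*.dll` may also match unrelated modules, so a hit is a hint rather than proof.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Patterns copied from the list above, lowercased for matching.
VENDOR_DLL_PATTERNS = {
    "AMD": ("amd*.dll", "ati*.dll"),
    "Intel": ("igd*.dll",),
    "NVIDIA": ("nv*.dll",),
}


def vendor_for_module(module_name: str) -> Optional[str]:
    """Return the vendor whose pattern matches the module name, else None."""
    # Lowercase the name and use fnmatchcase so the result is the same
    # on every platform (plain fnmatch is case-sensitive only on some).
    name = module_name.lower()
    for vendor, patterns in VENDOR_DLL_PATTERNS.items():
        if any(fnmatchcase(name, pattern) for pattern in patterns):
            return vendor
    return None


if __name__ == "__main__":
    for module in ("igd10um64xe.dll", "atidxx32.dll", "nvwgf2um.dll", "xul.dll"):
        print(module, "->", vendor_for_module(module))
```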
Top Crashes
- Process
- Load the Firefox crash-data dashboard
The first thing you'll see is a graph showing the crash rate for each of the branches.
The second thing you'll see is a list of reports for each of the branches.
- Click the Top Crashers link for the version you want to look at
From here you'll see a large table of crash signatures ranked by the number of reports in the last week.
Typically you'll want to start by looking at Release (right-most), as that's the highest priority.
- Starting at the top of the table, scroll down until you find a graphics-related crash
** NEED A PRIMER ON IDENTIFYING A GRAPHICS CRASH **
- Bugs
- 1908798 - Crash in [@ IPCError-browser | GPUProcessKill]
- 1972146 - Crash in [@ <unknown in igd10um64xe.dll> | CContext::TID3D11DeviceContext_Draw_<T>]
- 1996653 - Mac GPU process: Crash in [@ IPCError-browser | GPUProcessKill]
- 2007715 - Hit MOZ_CRASH(Element state change during style refresh (6291456)) at /builds/worker/checkouts/gecko/layout/style/RestyleManager.cpp:3374
- 2014923 - Crash in [@ mozilla::widget::LayerViewSupport::RecvScreenPixels]
- 2014925 - Crash in [@ <gleam::gl::GlesFns as gleam::gl::Gl>::get_shader_info_log]
Triage
The goal of triage is to ensure issues are reviewed, prioritized, and escalated in a reasonable amount of time.
Bug Triage
| Category | Description | Frequency | Status |
|---|---|---|---|
| Cold Crashes | Crash bugs which have not been updated in more than 30 days | Monthly | Last reviewed 2015-11-09 |
| Cold Trackers | Tracked bugs which have not been updated since the previous release | Every 6 weeks | Last reviewed 2015-11-03 |
| Help Wanted | Bugs which lack information necessary to resolve | Monthly | Last reviewed 2015-11-09 |
| Incoming | Bugs reported but not triaged for more than one month | Monthly | Last reviewed 2015-11-09 |
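The "more than 30 days" rule for Cold Crashes can be expressed as a simple date comparison. The sketch below assumes each bug's last-updated date is already available (for example, from a Bugzilla query); the function name `is_cold` and its parameters are hypothetical.

```python
from datetime import date, timedelta


def is_cold(last_updated: date, today: date, max_age_days: int = 30) -> bool:
    """Return True if the bug has gone more than max_age_days without an update."""
    return (today - last_updated) > timedelta(days=max_age_days)


if __name__ == "__main__":
    review_day = date(2015, 11, 9)
    # 39 days without an update: cold.
    print(is_cold(date(2015, 10, 1), review_day))
    # Exactly 30 days without an update: not yet cold.
    print(is_cold(date(2015, 10, 10), review_day))
```

The same helper could cover the Incoming category by passing a different `max_age_days`, assuming "one month" is treated as a fixed number of days.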
Crash Triage
| Category | Description | Frequency | Status |
|---|---|---|---|
| Explosive Crashes | Check KaiRo's branch explosiveness reports for new or spiking crashes | Weekly | Last reviewed 2015-11-24 |
| Top Crashes | Check the Socorro topcrash reports for new or spiking crashes | Weekly | Last reviewed 2015-11-24 |
Get Involved
Want to help the Graphics team? Here are some ways you can get involved.