Summary
Understanding the Problem Space
My first order of business in transitioning to the Graphics team is to understand the problem space, so that I can identify the team's immediate needs and make the greatest impact in the shortest amount of time.
- What are the key problems/challenges facing the Graphics team in terms of quality?
- discrepancy in environments between testers and release users
- discoverability of bugs pre-release
- ?...
- Where can QA add value/support to the Graphics team?
- improving pre-release discoverability of bugs
- closing the gap between tester and release systems
- helping with bug triage, particularly with bugs hiding in general components
- representation in crashkill
- improving code coverage and/or identifying gaps in code coverage
- identifying ways to improve participation in the graphics team (events, projects, One & Done, etc)
- documentation of tools, testing processes, etc
- building out the lab in Toronto
- continuing to drive Betabreakers testing every 6 weeks
- verifying bug fixes (what does this look like?)
- profiling areas of risk (eg. troublesome configs)
- conducting root cause analysis for regressions
- understanding problems outside of our control (eg. driver resets)
- feature testing and upcoming priorities (e10s, Windows 10, El Capitan, Android, B2G, etc)
- What does QA need to know to be effective?
- key components of an actionable Graphics bug
- fundamentals/technologies that should be learned
- how to distinguish a graphics crash from a non-graphics crash with a graphics signature
- meetings, mailing lists, bugzilla components to watch, blogs, IRC channels to join, etc
- who is each member of the team (incl. contributors) and what do they do
- where does graphics code reside in the tree?
- what role does Unified Telemetry play in graphics quality?
- what are the prefs to enable/disable different functionalities?
- we need a database of known-troublesome hardware/driver configurations to inform testing, hardware acquisitions, and blocklisting
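As a starting point for the database of known-troublesome configurations mentioned above, here is a minimal sketch of what one record and a simple lookup might look like. Every field name and the `find_configs` helper are assumptions made for illustration, not an agreed schema.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class TroublesomeConfig:
    """One known-troublesome hardware/driver combination (hypothetical schema)."""
    vendor: str                      # e.g. "amd", "intel", "nvidia"
    adapter: str                     # graphics adapter chipset/family
    driver_version: str              # driver version string as reported in crash data
    operating_system: str            # operating system the problem was seen on
    bug_ids: Tuple[int, ...] = ()    # related Bugzilla bug numbers
    blocklisted: bool = False        # whether the config is already blocklisted


def find_configs(
    configs: Iterable[TroublesomeConfig],
    vendor: Optional[str] = None,
    operating_system: Optional[str] = None,
    blocklisted: Optional[bool] = None,
) -> List[TroublesomeConfig]:
    """Return the records matching every filter that was supplied."""
    matches = []
    for config in configs:
        if vendor is not None and config.vendor != vendor:
            continue
        if operating_system is not None and config.operating_system != operating_system:
            continue
        if blocklisted is not None and config.blocklisted != blocklisted:
            continue
        matches.append(config)
    return matches
```

Filtering on `blocklisted=False` would give a list of configurations that are known to be troublesome but not yet blocklisted, which could feed both hardware acquisition and test planning.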
Sanity Checking
- Desktop
- Boot 2 Gecko (No-Jun Park)
- Android
- Telemetry
Stability
How do we identify a graphics crash?
- by signature: gfx, layers, D2D, D3D, ?...
- by topmost filename: gfx, ?...
- ?...
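The two heuristics above can be sketched as a simple keyword match. This is only an illustration: the keyword lists contain just the entries named above, the function name is hypothetical, and plain substring matching will produce false positives (for example, a non-graphics crash that merely carries a graphics signature).

```python
# Keywords taken from the lists above; both lists are still incomplete.
SIGNATURE_KEYWORDS = ("gfx", "layers", "d2d", "d3d")
FILENAME_KEYWORDS = ("gfx",)


def looks_like_graphics_crash(signature: str, topmost_filename: str = "") -> bool:
    """Return True if the signature or topmost filename contains a graphics keyword.

    Matching is case-insensitive and substring-based, so treat a True result
    as a hint for triage rather than a confirmed graphics crash.
    """
    signature = signature.lower()
    topmost_filename = topmost_filename.lower()
    if any(keyword in signature for keyword in SIGNATURE_KEYWORDS):
        return True
    return any(keyword in topmost_filename for keyword in FILENAME_KEYWORDS)
```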
How do we prioritize graphics crashes?
- Overall topcrashes in release > beta > aurora > nightly
- Gfx crashes in release > beta > aurora > nightly
- Explosive crashes in release > beta > aurora > nightly
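One way to read the ordering above is as a two-level sort: first by category (in the order listed), then by channel. The sketch below assumes that reading; the category labels and the dictionary shape of a crash record are hypothetical.

```python
# Lower rank means higher priority. Channel order follows release > beta > aurora > nightly.
CHANNEL_RANK = {"release": 0, "beta": 1, "aurora": 2, "nightly": 3}

# Assumption: the three bullets above are listed in priority order.
CATEGORY_RANK = {"overall-topcrash": 0, "gfx": 1, "explosive": 2}


def triage_order(crashes):
    """Sort crash records by category first, then by channel.

    Each record is a dict with at least "category" and "channel" keys.
    """
    return sorted(
        crashes,
        key=lambda crash: (CATEGORY_RANK[crash["category"]], CHANNEL_RANK[crash["channel"]]),
    )
```

If the team decides instead that channel should dominate (for example, any release crash ahead of any beta crash), swapping the two elements of the sort key is enough.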
What tools do we have at our disposal to investigate crashes?
- Bughunter for investigating crashes correlated to a URL
- KaiRo's reports for identifying crashes that are new or escalating quickly
- Socorro for getting detailed information about crash reports
What information is needed to make a crash actionable by developers?
- Correlations to particular hardware, driver, add-on, 3rd-party software, or library
- ?...
Weekly Triage
Top Driver Crashes
| Driver | Description | Weekly crashes 2015-07-07 | Top Crash |
|---|---|---|---|
| AMD | | | |
| amdocl.dll | OpenCL 1.1 AMD-APP | 17 | |
| aticfx32.dll | AMD Radeon DirectX 11 Driver | 326 | |
| atidxx32.dll | AMD Radeon DirectX 11 Driver | 3,031 | |
| atioglxx.dll | ATI OpenGL Driver | 35 | |
| atiumdag.dll | AMD Radeon DirectX Universal | 2,320 | |
| Intel | | | |
| igd10umd32.dll | Intel Graphics LDDM User Mode Driver for Windows 8 | 11,436 | bug 905902 [9.751% overall] |
| igdumd32.dll | Intel Graphics LDDM User Mode Driver for Windows Vista | 1,843 | |
| igddxva32.dll | Intel Graphics WDDM User Mode Driver for Windows 7 | 96 | |
| nVidia | | | |
| igd10iumd32.dll | nVidia D3D Shim Driver | 5,097 | |
| nvapi.dll | nVidia Windows Driver | 435 | |
| nvd3dum.dll | nVidia Windows WDDM D3D Driver | 2,016 | |
| nvlsp.dll | nVidia Application Filter | 357 | |
| nvumdshim.dll | nVidia D3D Shim Driver | 2,048 | |
| nvwgf2um.dll | nVidia D3D10 Driver | 15,657 | bug 1181349 [3.874% overall] |
| Total | | 44,714 | |
- Process
- Once per week, update the crash volumes in the table above and use them to determine triage priority (triage the highest-volume driver first)
- Click the link for the driver you want to investigate
- Select the Signature Facet tab and click the top signature which does not have an associated bug report
- Click the See equivalent Report list page link at the top of the report
- Make note of any high correlations such as operating systems, product versions, and graphics adapter chipsets/families from the Signature Summary tab
- Click one of the reports from the Reports tab
- Click the Report this bug in Core link to report a bug
- Include the following information in the report:
- A copy of the stack from the report
- A link to more reports with the same signature
- Correlations to a particular operating system, driver, chipset, and graphics card
- Any comments with relevant information which might be worth following up
- Any URLs you find in the reports which might be worth testing
- Be sure to add topcrash-nvidia, topcrash-amd, or topcrash-intel to the QA Whiteboard field
- See bug 1181349 as an example
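The first step of the process (ordering drivers by weekly volume) and the whiteboard tagging can be sketched as follows. The volumes are copied from the 2015-07-07 table, grouped by vendor as they appear there; the function names and the data layout are assumptions for illustration only.

```python
# Weekly crash volumes from the 2015-07-07 table, keyed by vendor as grouped there.
WEEKLY_CRASHES = {
    "amd": {
        "amdocl.dll": 17,
        "aticfx32.dll": 326,
        "atidxx32.dll": 3031,
        "atioglxx.dll": 35,
        "atiumdag.dll": 2320,
    },
    "intel": {
        "igd10umd32.dll": 11436,
        "igdumd32.dll": 1843,
        "igddxva32.dll": 96,
    },
    "nvidia": {
        "igd10iumd32.dll": 5097,
        "nvapi.dll": 435,
        "nvd3dum.dll": 2016,
        "nvlsp.dll": 357,
        "nvumdshim.dll": 2048,
        "nvwgf2um.dll": 15657,
    },
}


def triage_queue(volumes):
    """Return (driver, vendor, weekly_crashes) tuples, highest volume first."""
    rows = [
        (driver, vendor, count)
        for vendor, drivers in volumes.items()
        for driver, count in drivers.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


def whiteboard_tag(vendor):
    """Build the QA Whiteboard tag for a vendor, e.g. "topcrash-nvidia"."""
    return f"topcrash-{vendor}"
```

With the numbers above, the queue starts with nvwgf2um.dll (15,657), then igd10umd32.dll (11,436), then igd10iumd32.dll (5,097).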
Features
- Gecko 39: OOM driver issues
- Gecko 40: OMTC on all platforms
- Gecko 41: WebGL 2, E10S M3
- Gecko 42: Desktop Tiling, Desktop APZ, Desktop Silk
Participation
- Sanity checking via One & Done
- Meetups to connect testers/users with devs
- Testdays to teach people about graphics testing
- Documentation and translation of documentation
- Engaging on community spaces (Discourse, Reddit, Facebook, Twitter, etc)
Betabreakers
Testing:
- [DONE] Firefox 38: MSE stress test
- [DONE] Firefox 39: beta sanity check
- [DONE] Firefox 40: WebGL with e10s
- Firefox 41: exploratory testing of Windows 10 in Aurora, gfx-noted bugs (eg. [1], [2], [3], [4]), blocklisted hardware, disabling WARP
- Firefox 42: to be determined
- Firefox 43: to be determined
- Firefox 44: to be determined
Risks:
- Betabreakers will not have Windows 10 deployed to machines until after it is officially released. However, they can deploy Preview to select machines upon request. We need to develop a set of requirements for Windows 10 testing, particularly machine specifications for any upcoming testrun that targets Windows 10.
- We need to select hardware for testing based on data from past testruns and known-troublesome hardware
- We need to identify gaps in test coverage and investigate whether Betabreakers can fill these gaps for us