Platform/GFX/Triage: Difference between revisions

From MozillaWiki

Revision as of 19:08, 24 August 2018

Triage

Triage is the process of getting actionable data from a bug report. The Graphics bugzilla component gets many bug reports and it's important to get at least one person to see each report so we don't miss anything significant.

The graphics team has an official triage schedule that you can view here. The purpose of this is to spread out the work so that no one is spending all their time on bugzilla.

Procedure

The goal of triaging is to determine the root cause of the issue a user is facing. Sometimes it's obvious what the problem is and who is responsible. In that case there's not much to do beyond setting needinfo? on the person responsible.

When that's not the case, there's a general procedure you can follow to triage a bug.

  1. Ask for clarification if anything is unclear
  2. Determine whether you can reproduce the issue, and whether the reporter can reproduce it
  3. If someone can consistently reproduce the issue, we have the best chance of fixing it.
    1. Try to use mozregression to find the changeset that introduced the regression
    2. Attempt to use a debugger (or rr) to catch the error
  4. If someone can only inconsistently reproduce the issue, it's helpful to get information to narrow down the cause.
    1. Determine versions affected
    2. Determine platforms affected
    3. See if safe mode causes the issue to go away
    4. See if the issue is specific to a particular computer
    5. Look at a dump of about:support to see if there is anything abnormal
  5. If no one can ever reproduce the issue, there's not much we can do. Close the bug as WORKSFORME.
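The branching in the procedure above can be sketched as a small lookup. This is only an illustration of the decision flow: the `Repro` enum and the `next_steps` function are hypothetical names invented for this example, not part of any Mozilla tooling.

```python
from enum import Enum
from typing import List


class Repro(Enum):
    """How reliably anyone (triager or reporter) can reproduce the bug."""
    CONSISTENT = "consistent"
    INTERMITTENT = "intermittent"
    NEVER = "never"


def next_steps(repro: Repro) -> List[str]:
    """Return the follow-up actions from the procedure for a given reproducibility."""
    if repro is Repro.CONSISTENT:
        return [
            "run mozregression to find the regressing changeset",
            "catch the error in a debugger or rr",
        ]
    if repro is Repro.INTERMITTENT:
        return [
            "determine versions affected",
            "determine platforms affected",
            "check whether safe mode makes the issue go away",
            "check whether the issue is specific to one computer",
            "review about:support for anything abnormal",
        ]
    # Nobody can reproduce the issue at all.
    return ["close the bug as WORKSFORME"]
```

For example, `next_steps(Repro.INTERMITTENT)` lists the five narrowing questions from step 4.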

Once you have determined the issue, you need to give the bug a priority so it is marked as triaged. The priority policy can be found here.

Tips for common issues

The graphics team runs into some types of problems frequently. It's helpful to know them so you can identify them quickly.

Performance issues

This is a broad category of issues that is beyond the scope of this document. A good first step is to ask for a performance profile using perf-html. Additional information for profiling with perf-html can be found here.

Note: If the issue is slow painting, be sure to ask the reporter to include 'Paint' in the list of threads in the profiler settings.

Driver rendering issues

We use OpenGL and Direct3D for hardware accelerated compositing and WebGL, and Direct2D for hardware accelerated painting. Sometimes, the drivers for specific hardware vendors are not compliant with the specification in certain ways. This can manifest itself in various forms of graphics corruption.

If the user only experiences this graphics corruption on certain hardware, this is a strong sign it's a driver issue. Enabling safe mode should also make the issue go away.

An important task when a user reports graphical corruption is to make sure their drivers are up to date. Updating can often fix the issue.

If they're on the latest driver, then there are usually three things we can do, in order of increasing difficulty:

  1. Blacklist the driver
  2. Find a workaround
  3. Get the vendor to fix the issue

Which option to choose depends on the specific circumstances and the prevalence of the bug.

Driver crashes

This is very similar to the above, except that it manifests as crashes. There's not always much we can do here besides blacklisting the affected drivers.

Platform specific rendering issues

We have different backends for painting content on different platforms. Currently these are just Skia and Direct2D.

This can mean that we paint some content correctly on OSX, where we use Skia, but incorrectly on Windows, where we use Direct2D.

A good way to confirm this is to manually change the content or canvas backend using preferences.

  1. `gfx.canvas.azure.backends`
  2. `gfx.content.azure.backends`
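As a rough sketch, the two preferences can be written out as `user.js` lines for a reporter to paste into a test profile. The helper names below are hypothetical, and the value `"skia"` is an assumption about a backend name these preferences accept; check the current default in about:config before suggesting a value.

```python
import json


def user_pref_line(name: str, value) -> str:
    """Format one preference as a user.js line (user.js uses JavaScript syntax)."""
    return f"user_pref({json.dumps(name)}, {json.dumps(value)});"


def backend_overrides(backend: str) -> str:
    """Build user.js lines that force both canvas and content to one backend."""
    prefs = ("gfx.canvas.azure.backends", "gfx.content.azure.backends")
    return "\n".join(user_pref_line(pref, backend) for pref in prefs)


# "skia" is an assumed backend name used purely for illustration.
print(backend_overrides("skia"))
```

If the rendering difference disappears once both platforms use the same backend, that points at the backend rather than at shared code.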

Heavy checkerboarding

This is often misreported. Once APZ was enabled on all platforms, it became possible to scroll faster than we can repaint the screen. When we scroll into unpainted areas, the background color of the page fills in that area. This is called checkerboarding.

Some users are not used to that, and if the page has a background color and an image overlay that differ significantly, the user may think Firefox is glitching. A good word to look for here is "flashing" of content.

The usual solution to these bugs is to inform the user about checkerboarding, and ask for a performance profile to see if we can improve our performance so we don't checkerboard anymore.

Async image decoding

This is another issue that users often report as "flashing". Sometimes web authors insert image elements and rely on the bitmap being decoded synchronously. This can result in a frame or two being painted without the image, which can disrupt animations.

The best option here is to explain what's going on and recommend using 'decoding="sync"'. At the time of this writing, this is implemented in Gecko but not enabled.

Incorrect web painting

These issues are generally reported as differences in rendering between multiple browsers.

In these cases, it's good to check whether the proper behavior is defined in a specification. If it is, we should probably adopt that behavior.

If it's not, then there are a few options:

  1. Implement other web browsers' behavior. This is a good option when their behavior makes more sense and the feature isn't very important.
  2. File a spec issue to discuss the proper behavior.

Invalidation issues

These issues manifest themselves as graphics corruption from dynamic changes to the DOM/CSS. For example, the user presses a button which changes the DOM/CSS and then a part of the screen remains the same when it should update to new content.

For these issues, there are multiple places where something can go wrong.

  1. DisplayListBasedInvalidation - Check DisplayListBasedInvalidation for more information
  2. RotatedBuffer/Tiling - Confirm this is the case by toggling 'layers.enable-tiles' to switch between tiling and rotated buffer. This will not work on OSX where we only support tiling.
  3. Compositor/ShadowLayers - Confirm this is the case by disabling hardware acceleration with 'layers.acceleration.disabled'.
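The preference checks above amount to a small process of elimination, sketched below. The function and parameter names are hypothetical, and falling back to DisplayListBasedInvalidation when neither toggle helps is an assumption of this sketch rather than a rule stated on this page.

```python
from typing import Optional


def suspect_component(
    fixed_by_toggling_tiles: Optional[bool],
    fixed_by_disabling_acceleration: bool,
) -> str:
    """Guess where an invalidation bug lives from two preference experiments.

    fixed_by_toggling_tiles: result of flipping 'layers.enable-tiles'.
        Pass None on OSX, where only tiling is supported and the test
        does not apply.
    fixed_by_disabling_acceleration: result of setting
        'layers.acceleration.disabled' to true.
    """
    if fixed_by_disabling_acceleration:
        return "Compositor/ShadowLayers"
    if fixed_by_toggling_tiles:
        return "RotatedBuffer/Tiling"
    # Assumption: with both experiments negative, look at display list
    # invalidation next.
    return "DisplayListBasedInvalidation"
```

Checking acceleration first is an arbitrary choice for this sketch; if both toggles hide the bug, the result is ambiguous and needs further investigation.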

Intermittent failures

The graphics components often get reports of intermittent test failures that are graphics related. Treeherder automatically files these bugs to mark test failures correctly.

If a failure is frequent enough, someone will needinfo you to get it prioritized. These bugs should usually be labeled P3 for backlog unless you wish to work on a fix.