QA:Metrics&Coverage

Coverage and Bug Metrics

INTRODUCTION: What & Why

We are mining the bug database and code coverage data to ask better questions and have better answers.

This is not about labeling areas as “good” or “bad”; it is about helping QA and DEV spend their time more efficiently.

We want to understand which components or files (areas) could use more community support or resources to reduce the number of feature bugs or regressions through targeted test development, code reviews, and/or improved planning.

We == module owners, release team leads, individual developers and test developers, etc.

  • Areas of interest:
    • Which areas have the highest rates of incoming bugs.
    • Which areas have the highest bug rates, regression rates, and security bug rates.
    • Which files, at the product or component level, have the highest number of regression and security bug fixes.
    • Which areas have lurking critical- or blocker-severity bugs.
    • How long it really takes to fix a bug, i.e. the bug fix rate (bugs fixed / total people at the time); a minimal sketch follows this list.
    • Which areas have the highest re-fix/re-work rates.
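
The bug fix rate bullet above can be made concrete with a small calculation. The following Python sketch uses made-up fix records and head counts; the data and field layout are assumptions for illustration, not an existing Mozilla data source.

  from datetime import date

  # Hypothetical fix records: (component, date the fix landed)
  fixes = [
      ("Layout", date(2009, 6, 1)),
      ("Layout", date(2009, 6, 3)),
      ("DOM",    date(2009, 6, 5)),
      ("Layout", date(2009, 6, 20)),
  ]

  # Hypothetical head count per component during the period
  people = {"Layout": 3, "DOM": 2}

  def fix_rate(component, start, end):
      """Bugs fixed per person for one component over a date range."""
      fixed = sum(1 for comp, day in fixes
                  if comp == component and start <= day <= end)
      return fixed / people[component]

  print(fix_rate("Layout", date(2009, 6, 1), date(2009, 6, 30)))  # 1.0 fixes per person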

SECTION I: Identifying the testing gaps in the automated testing infrastructure.

  • Trend graphs of coverage by Whole|Product|Component
  • Trend graphs of coverage by Whole|Product|Component vis-a-vis all bugs filed (incoming bugs)
  • Trend graphs of coverage by Whole|Product|Component vis-a-vis all bugs fixed (outgoing bugs)
  • Trend graphs of coverage by Whole|Product|Component vis-a-vis Regression bugs filed
  • Trend graphs of coverage by Whole|Product|Component vis-a-vis Regression bugs fixed
  • Trend graphs of coverage by Whole|Product|Component vis-a-vis Security bugs filed
  • Trend graphs of coverage by Whole|Product|Component vis-a-vis Security bugs fixed
  • Identifying the top 5 components from a code coverage point of view, and 6 files within each of those components.

The above set of metrics would provide visibility into where we need to focus when writing new test cases. Ideally, we would like to engage community members and coders at universities to come up with scenarios that provide better use cases in areas that have weak coverage, more bug traffic, and/or more change-set traffic.

  • Function coverage|Line coverage|Branch Coverage graphs over time
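
To make the intent of these trend graphs concrete, here is a rough plotting sketch for one component, comparing line coverage against incoming bug counts per month. It assumes matplotlib is available and uses made-up numbers; it is not tied to the actual coverage or Bugzilla data feeds.

  import matplotlib.pyplot as plt

  months        = ["2009-01", "2009-02", "2009-03", "2009-04"]
  line_coverage = [61.2, 62.0, 61.5, 63.1]   # percent, hypothetical
  bugs_filed    = [120, 98, 143, 110]        # incoming bugs, hypothetical

  x = range(len(months))
  fig, ax1 = plt.subplots()
  ax1.plot(x, line_coverage, marker="o", color="tab:blue")
  ax1.set_ylabel("Line coverage (%)")
  ax1.set_xticks(list(x))
  ax1.set_xticklabels(months, rotation=45)

  ax2 = ax1.twinx()                          # second y-axis for bug counts
  ax2.bar(x, bugs_filed, alpha=0.3, color="tab:red")
  ax2.set_ylabel("Bugs filed")

  ax1.set_title("Coverage vs. incoming bugs for one component (hypothetical data)")
  fig.tight_layout()
  fig.savefig("coverage_vs_bugs.png")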


SECTION II: Bug Metrics

We want to find out:

  • Which components receive the most incoming critical and blocker bugs, and how quickly are we able to triage them (confirm them or not)?
  • After a bug is confirmed, how long do we take to fix it?
  • After a bug is fixed, how long do we take to verify it?
  • What is the mean time from confirmed to verified, by component? (A minimal sketch follows this list.)
  • What is the re-fix rate, and which components have significant re-fix rates?
  • What is the duplicate bug rate, and which components have significant duplicate rates?
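
As a sketch of how the confirmed-to-verified mean time could be computed per component, assuming a CSV export with one row per bug and hypothetical component, confirmed_at, and verified_at columns:

  import csv
  from collections import defaultdict
  from datetime import datetime

  def mean_confirm_to_verify(path):
      """Mean days from confirmed to verified, per component."""
      totals = defaultdict(lambda: [0.0, 0])   # component -> [sum of days, count]
      with open(path, newline="") as f:
          for row in csv.DictReader(f):
              confirmed = datetime.fromisoformat(row["confirmed_at"])
              verified = datetime.fromisoformat(row["verified_at"])
              days = (verified - confirmed).total_seconds() / 86400
              totals[row["component"]][0] += days
              totals[row["component"]][1] += 1
      return {comp: s / n for comp, (s, n) in totals.items()}

  if __name__ == "__main__":
      for comp, days in sorted(mean_confirm_to_verify("bug_history.csv").items()):
          print(f"{comp}: {days:.1f} days from confirmed to verified")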

To answer these questions, we will use the following set of metrics:

  • All bugs: Filed vs. Confirmed by Whole|Product|Component by SEV status
  • All bugs: Filed vs. Confirmed by Whole|Product|Component by PRI status
  • Regression bugs: Filed vs. Confirmed by Whole|Product|Component by SEV status
  • Regression bugs: Filed vs. Confirmed by Whole|Product|Component by PRI status
  • Security bugs: Filed vs. Confirmed by Whole|Product|Component by SEV status
  • Security bugs: Filed vs. Confirmed by Whole|Product|Component by PRI status
  • All bugs: NEW vs. FIXED by Whole|Product|Component by SEV status
  • All bugs: NEW vs. FIXED by Whole|Product|Component by PRI status
  • Regression bugs: NEW vs. FIXED by Whole|Product|Component by SEV status
  • Regression bugs: NEW vs. FIXED by Whole|Product|Component by PRI status
  • Security bugs: NEW vs. FIXED by Whole|Product|Component by SEV status
  • Security bugs: NEW vs. FIXED by Whole|Product|Component by PRI status
  • Top 5 components that have a high re-fix rate w.r.t. All bugs|Regression bugs|Security bugs. (A minimal sketch of the re-fix rate calculation follows this list.)
  • Top 30 files that have the highest number of “All bugs|Regression bugs|Security bugs” fixes.
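
A minimal sketch of the re-fix rate calculation, assuming we already know how many times each fixed bug was reopened after a fix (the records below are made up):

  from collections import Counter

  # Hypothetical fixed-bug records: (component, number of times reopened after a fix)
  bugs = [
      ("Layout", 0), ("Layout", 2), ("Layout", 0),
      ("DOM", 1), ("DOM", 0),
  ]

  fixed = Counter()
  refixed = Counter()
  for component, reopen_count in bugs:
      fixed[component] += 1
      if reopen_count > 0:
          refixed[component] += 1

  # Re-fix rate = share of fixed bugs that needed at least one further fix
  for component in fixed:
      print(component, refixed[component] / fixed[component])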


Filed vs. Confirmed captures the potential backlog in discovering a bug in the source trunk. When a volunteer finds and files a bug, it effectively stays invisible to developers until it has gone through triage and been confirmed. There is a good chance that we would rediscover the bug independently through our internal testing infrastructure, but that is a chance we are taking.
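
A rough sketch of how the Filed vs. Confirmed gap could be tallied per month, assuming a hypothetical list of bug records with filed and confirmed dates rather than a live Bugzilla query:

  from collections import Counter
  from datetime import date

  # Hypothetical bug records: (filed_on, confirmed_on or None if still unconfirmed)
  bugs = [
      (date(2009, 5, 2),  date(2009, 5, 10)),
      (date(2009, 5, 15), None),
      (date(2009, 6, 1),  date(2009, 6, 20)),
      (date(2009, 6, 3),  None),
  ]

  filed = Counter()
  confirmed = Counter()
  for filed_on, confirmed_on in bugs:
      filed[(filed_on.year, filed_on.month)] += 1
      if confirmed_on is not None:
          confirmed[(confirmed_on.year, confirmed_on.month)] += 1

  # The gap between the two counts per period is the triage backlog
  for period in sorted(filed):
      print(period, "filed:", filed[period], "confirmed:", confirmed[period])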

NEW vs. FIXED provides the mean time to fix a bug and helps develop heuristics about the bug-fix capacity of a given group. For example, if group X took one week on average to fix a bug per head over the last 6 months and it has, say, 10 members, then its output is 10 bug fixes per week, and it should ideally plan up to 80% of that capacity per week for a given major release. The remaining 20% is spent on rework of fixes and other contingencies.
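
The capacity arithmetic in the previous paragraph can be written out explicitly; the numbers below simply restate the example (1 fix per person per week, 10 people, an 80% planning factor).

  # Worked example of the fix-capacity heuristic described above.
  people = 10                       # members in group X
  fixes_per_person_per_week = 1.0   # observed over the last 6 months
  planning_factor = 0.8             # reserve ~20% for rework and contingencies

  raw_capacity = people * fixes_per_person_per_week    # 10 fixes/week
  planned_capacity = raw_capacity * planning_factor    # 8 fixes/week

  print(f"Raw capacity:     {raw_capacity:.0f} fixes/week")
  print(f"Planned capacity: {planned_capacity:.0f} fixes/week")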

Identifying a small set of files that are generally bug-sensitive could help us define additional gating methods for code reviews of these files in the advanced stages of the release cycle.

We also need to identify the top 6-10 bug filers and draw on their bug-discovery expertise through a key beta program or similar effort.

Potential Priorities

  • Fix Capacity [P1]
  • Find Rate [P2]
  • Fix Rate [P2]
  • Cumulative Fix Capacity [P3]
  • Mean time between Fixes [P3]
  • Mean time between Failures [P3]
  • Re-Fix/Re-Open Rate [P3]
  • Non-Fix Close trends [P3]


  • Plot all graphs one below the other, grouped by year and by month of the year.
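
One possible way to produce that grouping, sketched with pandas (and matplotlib for the chart) over a hypothetical CSV of bug events that has a date column:

  import pandas as pd

  # Hypothetical export: one row per bug event with at least a "date" column
  df = pd.read_csv("bug_events.csv", parse_dates=["date"])

  # Group counts by calendar year and month so each graph covers one period
  counts = df.groupby([df["date"].dt.year, df["date"].dt.month]).size()
  counts.index.names = ["year", "month"]

  # One bar chart with periods ordered year by year, month by month
  ax = counts.plot(kind="bar", title="Bugs per year/month (hypothetical data)")
  ax.figure.savefig("bugs_by_month.png")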


Priority Matrix

  Priority   General     Regression   Security
  P1         1 week      1 week       1 week
  P2         1 month     1 month      1 month
  P3         1 quarter   1 quarter    1 quarter

Progress so far

Code Coverage

  • We have a fully automated, release-engineering-maintained process in place for generating C/C++ code coverage results.
    • The following improvements are in the planning phase
      • Splitting major test suites into smaller groups.
        • If a test case crashes a suite run, the tests after the crash never run, which adds fluctuations to the code coverage data.
        • In order to minimize the impact of crash/hang/kill effects, we would split the test suite into smaller chunks that run independently of each other. This would help localize the crash/hang/kill impact. (A minimal sketch of the chunking idea follows this list.)
  • We have a manual process in place to generate JavaScript coverage results from Firefox runs and aggregate coverage results from multiple runs on the same executable.
    • The following tasks are in progress
      • Automate the process of generating and aggregating JavaScript coverage run data.
      • Hand the process off to Release Engineering for inclusion in the regular code coverage runs.
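
To illustrate the planned test suite splitting, here is a small sketch that divides a list of test files into chunks and runs each chunk in its own process; the test list and the runner command are placeholders, not the actual harness.

  import subprocess

  def chunk(tests, size):
      """Split the test list into groups of at most `size` tests."""
      return [tests[i:i + size] for i in range(0, len(tests), size)]

  # Placeholder test list and runner command; the real suite and harness differ.
  tests = [f"test_{n}.js" for n in range(100)]

  for group in chunk(tests, 20):
      # Each chunk runs in its own process, so a crash or hang only loses
      # the coverage data for that chunk, not for the whole suite.
      result = subprocess.run(["echo", "run-suite"] + group, timeout=600)
      if result.returncode != 0:
          print("chunk failed, continuing with the next one")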

Metrics

The following metrics are already generated

  • Bugs-to-change-sets ratio by product/component. (A minimal sketch follows this list.)
  • Coverage graphs by run, by product and by component.
  • Filed vs Confirmed Bugs trend by time, by product, by component.
  • Security bugs by product, component, and time.
  • Regression bugs by product, component, and time.
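
A minimal sketch of the bugs-to-change-sets ratio, using made-up per-component counts rather than live Bugzilla or Mercurial queries:

  # Hypothetical per-component counts over the same time window
  bugs_filed = {"Layout": 140, "DOM": 95, "JS Engine": 210}
  changesets = {"Layout": 400, "DOM": 310, "JS Engine": 520}

  for component in bugs_filed:
      ratio = bugs_filed[component] / changesets[component]
      print(f"{component}: {ratio:.2f} bugs per change-set")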

Work is in progress to integrate the graphs and related data into the Pentaho Data Warehouse.

Priority of Tasks

Coverage Priorities

  • P1 Automating JS coverage data generation
  • P2 Streaming coverage data into a DB for further analytics.

This requires a public-facing webserver and DB support.
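
As a sketch of the P2 item above (feeding coverage data into a DB for further analytics), the following uses SQLite from the Python standard library; the table layout and the sample records are assumptions.

  import sqlite3

  # Assumed per-file coverage records: (run_id, file, line coverage percent)
  records = [
      ("run-2009-06-01", "layout/base/nsFrame.cpp", 71.4),
      ("run-2009-06-01", "dom/base/nsDocument.cpp", 65.9),
  ]

  conn = sqlite3.connect("coverage.db")
  conn.execute("""CREATE TABLE IF NOT EXISTS coverage (
                    run_id TEXT, file TEXT, line_pct REAL)""")
  conn.executemany("INSERT INTO coverage VALUES (?, ?, ?)", records)
  conn.commit()

  # Example analytic query: average line coverage per run
  for run_id, avg in conn.execute(
          "SELECT run_id, AVG(line_pct) FROM coverage GROUP BY run_id"):
      print(run_id, round(avg, 1))
  conn.close()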

Metrics Priorities

  • P1 Exposing current metrics charts to community using existing set-up.
  • P2 Creating the bug metrics data in the Pentaho Data Warehouse (PDWH).
  • P2 Generating graphs/charts off of Pentaho DWH.
  • P2 Exposing the Pentaho DWH to the public.