QA/TDAI/Test Assessment

Test Coverage Assessment

Determining how well our automated tests cover various components is a difficult task, either bogged down in lengthy code-coverage analysis or fraught with subjectivity. I've attempted to take a quick, objective look through Bugzilla to discover the "hot spots" where developers and QA members have decided that we need more test coverage.

We will do this simple bugzilla analysis once a quarter to see how the levels are changing as we work on filling some of these test holes in our current suites, and once we feel that we have "decent" coverage, we'll do a more thorough code coverage analysis of the test suite.

Methods

I did this by analyzing two Bugzilla indicators: the QAWanted keyword (which can mean anything from "better steps to reproduce" to "find a duplicate for this") and the in-testsuite flag, which is set specifically on bugs that need a test added to our automated test suites. I used the in-testsuite flag as the primary focus and the QAWanted keyword as a backup to verify that the "hottest" components identified by in-testsuite are not local aberrations.
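
The raw tallies behind the findings below are easy to reproduce from a saved Bugzilla query. Here is a minimal sketch, assuming the in-testsuite? query (Source 1 below) has been exported as CSV from buglist.cgi and that the export carries product and component columns; the file name and column names are placeholders rather than the exact export format.

 # Tally "in-testsuite?" bugs per component from a Bugzilla CSV export.
 # Assumes a buglist.cgi query saved as CSV whose columns include
 # "product" and "component" -- adjust the names to match the real export.
 import csv
 from collections import Counter
 
 def tally_components(csv_path):
     counts = Counter()
     with open(csv_path, newline="", encoding="utf-8") as f:
         for row in csv.DictReader(f):
             # Key on "Product/Component" so Core/Layout and Core/DOM stay distinct.
             counts[f'{row["product"]}/{row["component"]}'] += 1
     return counts
 
 if __name__ == "__main__":
     for component, n in tally_components("in-testsuite-query.csv").most_common(10):
         print(f"{component}: {n} bugs marked")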

A fuller dimension could be added to this by surveying the developers who work on the Core platform and asking them which components they feel need the most test coverage and the most critical eye turned on them. I plan to do that and append the results here.

Looking Ahead

We also considered running a code coverage tool during the Mochikit/Reftest test run, but decided that the time it would take to set up, run, and interpret the results would be better spent writing tests. Once we have some of the big "hot spots" taken care of, this more targeted and deterministic approach starts to make more sense. Right now, we desperately need better test coverage, and we have more holes than we have time to fill.

When we decide to do true code coverage analysis, we'll also need to find a way to translate the results into a set of valid test cases that need to be written in order to exercise parts of the platform that are not exercised by the existing test suite. This is left as an exercise for the future.

Findings

The top three Core components that have bugs marked as "in-testsuite?" are:

  • Core/Layout (259 bugs marked)
  • Core/DOM (113 bugs marked)
  • Core/Layout: Tables (103 bugs marked)

Source: [1]

I also broke the "in-testsuite?" bugs down by severity to see where they fall. "High" means the severity is blocker, critical, or major. "Normal" means the bugs have normal severity. And "Low" means the bugs are minor, trivial, or enhancement severity. Here are the results for the top three components (a sketch of the percentage arithmetic follows the table):

Component            Percent High   Percent Normal   Percent Low
Core/Layout          49.81%         47.49%           2.70%
Core/DOM             31.86%         65.49%           2.65%
Core/Layout: Tables  21.36%         72.82%           5.83%

Source: [2]
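
The bucketing above is plain counting: blocker/critical/major count as High, normal as Normal, and minor/trivial/enhancement as Low, each divided by the component's total. A minimal sketch of that arithmetic follows; the example input is a split back-derived from Core/Layout's published 259-bug total and percentages, purely for illustration, not counts taken from the query itself.

 # Bucket Bugzilla severities into High / Normal / Low and report percentages.
 HIGH = {"blocker", "critical", "major"}
 LOW = {"minor", "trivial", "enhancement"}
 
 def severity_percentages(severities):
     """severities: list of severity strings for one component's in-testsuite? bugs."""
     total = len(severities)
     high = sum(1 for s in severities if s in HIGH)
     low = sum(1 for s in severities if s in LOW)
     normal = total - high - low  # everything else counts as "normal"
     return {label: round(100.0 * count / total, 2)
             for label, count in (("High", high), ("Normal", normal), ("Low", low))}
 
 # Illustrative split only, back-derived from the published Core/Layout numbers.
 example = ["critical"] * 129 + ["normal"] * 123 + ["trivial"] * 7
 print(severity_percentages(example))  # {'High': 49.81, 'Normal': 47.49, 'Low': 2.7}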

Likewise, looking at the top components with bugs that have the "QAWanted" keyword set, you'll find similar patterns, which simply indicates that people generally want QA help with these types of bugs. (Note that nothing prevents QAWanted and in-testsuite from both being applied to the same bug.)

  • Layout: 297 QAWanted bugs
  • DOM: 28 QAWanted bugs
  • Layout: Tables: 91 QAWanted bugs

Source: [3]

Conclusions

It wasn't too surprising to find that the large general catch-alls for the two main features of the browser held the largest concentrations of the "in-testsuite?" bugs. I think these general catch-all categories are often targets for triage, and that may be why they have so many more "in-testsuite?" bugs than the other categories. Note that both the DOM:* and Layout:* components also held many "in-testsuite?" as well as "QAWanted" bugs, which underpinned my decision to list DOM and Layout as the top two "hot spots" where we need test coverage the most.

Reading many of the Layout and DOM bugs, I realized that most of them have valid testcases attached. I think the first thing I should do with this information is to take the test cases already attached to individual bugs in Bugzilla and get them submitted for inclusion into either the Mochikit or Reftest framework (whichever is appropriate).

This would be a quick way to make progress on broadening our regression test coverage. From there, we can then investigate writing more proactive and deeper tests to test these specific areas.
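
To make the Reftest half of that concrete: a reftest is a testcase/reference page pair plus a manifest line of the form == testcase.html reference.html. The helper below is a hypothetical sketch of batch-appending such entries to a reftest.list manifest; the file names, bug-number naming, and directory layout are assumptions for illustration, not the actual tree structure.

 # Hypothetical helper: append reftest manifest entries for testcase/reference
 # pairs harvested from Bugzilla bugs. The "== test ref" line is standard
 # reftest manifest syntax; everything else here is assumed for illustration.
 def add_reftest_entries(pairs, manifest_path="reftest.list"):
     """pairs: iterable of (bug_id, testcase_file, reference_file) tuples."""
     with open(manifest_path, "a", encoding="utf-8") as manifest:
         for bug_id, test, ref in pairs:
             manifest.write(f"# test case attached to bug {bug_id}\n")
             manifest.write(f"== {test} {ref}\n")
 
 add_reftest_entries([
     (123456, "bug123456-testcase.html", "bug123456-ref.html"),  # placeholder bug number
 ])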

Further Work

I will go ahead with my informal survey of the developers to see if the components they flag as being spots in need of testing are mirrored by the analysis of the "in-testsuite?" flag. If not, then we'll need to remind them to set that flag on the bugs and in the components where they perceive there to be problems and/or fragility in the codebase. We'll also need to investigate any new "hot spots" that result from those thoughts to see if we can perform better triage there as a QA team.

Complete Findings

I looked at a lot of bugs. I limited my search to the Core product, and then only the browser-specific Core bugs, since neither automated test harness really works well in other applications at this point. I did not find enough in-testsuite? bugs inside the Firefox product to really necessitate this level of analysis.

  • Source 1: Query for all in-testsuite? bugs in all resolutions. This was the primary source of data.
  • Source 2: I took the Bugzilla analysis, created percentages for High, Normal, and Low bugs, then sorted it based on Quantity, High, Normal, and Low.
  • Source 3: I looked at the QAWanted bugs from all the components, across all severities. The interesting thing with this was that the QAWanted numbers tended to reflect the in-testsuite? numbers, which tended to corroborate the findings. Some of the components with higher QAWanted bug numbers had only a few bugs flagged with "in-testsuite?". Some notable examples of this are Core/Plugins and Core/Style System (CSS). I have not yet determined if this is due to existing coverage of these areas (w.r.t. plugins I rather doubt it) or simply that we need deeper triage of these areas w.r.t. test suite inclusion.
  • Additionally, I wanted to understand how many of these bugs are currently "active" versus "closed" or "resolved". This matters especially for QAWanted, since QAWanted is usually set on an active bug and has less meaning for a resolved one. These are here, and you can again see that our top three components are among the high rankers in this table also.
  • For the curious, these are the "active" "in-testsuite?" bugs. I don't think, however, that a bug being "active" has any bearing on the relevance of the "in-testsuite" flag. This is just included so that you can compare a similar "in-testsuite" flag query to the QAWanted query above.
  • And finally, the intersection of all bugs that are both QAWanted and "in-testsuite?", so that we know which ones we are counting twice (a sketch of that check follows below).
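
That double-counting check is just a set intersection over bug IDs. A minimal sketch, assuming both queries have been exported as CSV with a bug_id column; the file names are placeholders.

 # Find bugs counted in both the QAWanted and the in-testsuite? queries.
 # Assumes both queries were exported as CSV with a "bug_id" column.
 import csv
 
 def bug_ids(csv_path):
     with open(csv_path, newline="", encoding="utf-8") as f:
         return {row["bug_id"] for row in csv.DictReader(f)}
 
 overlap = bug_ids("qawanted-query.csv") & bug_ids("in-testsuite-query.csv")
 print(f"{len(overlap)} bugs appear in both queries (counted twice above)")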