QA/TDAI/Code Coverage
Rationale
We need a way to measure how effective our test strategy is. We need to know how well we are actually covering the code base, and we need to determine whether there are areas of the code base that are not adequately covered by the automated tests.
We also want to analyze the coverage of the Litmus cases, and to combine that analysis with an analysis of Bugzilla to see which bug components are "hot" in terms of activity and other metrics. The hope is to generate data that points to specific locations where focus is needed on improving the automated/regression testing systems.
Our focus here is on showing where we don't have adequate test coverage. We aren't going to use this to point out where we are "awesome" at testing, as that would be a bit wrong-headed.
Things Code Coverage Won't Tell Us
- Whether the code actually works
- Whether the code is adequately tested (it only tells us whether the code is executed)
- Whether something with a high degree of coverage is well tested
Things It Can Tell Us
- It can be used as a barometer (over time) to understand whether we are expanding our tests or duplicating effort over a single area. Note that some duplication is necessary in order to fully exercise the various branches and pathways.
- It can indicate areas of the code base that are under-exercised by the current tests, and we can (over time) see whether we are making progress on extending testing to those areas (a rough sketch of this kind of run-to-run comparison follows this list).
- It can give us a short list of areas in which to begin new test development efforts.
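To make the "barometer" idea a bit more concrete, here is a rough sketch of how two dated coverage runs might be compared. It assumes each run's per-directory results have been exported to a simple CSV with the columns "directory,lines_hit,lines_found"; the file names and the CSV layout are placeholders, not a decided format.

```python
# Hypothetical sketch: compare per-directory line coverage between two runs.
# Assumes each run was summarized into a CSV with the columns
# "directory,lines_hit,lines_found"; the file names below are placeholders.
import csv

def load_summary(path):
    """Read a coverage summary CSV into {directory: fraction of lines hit}."""
    with open(path, newline="") as f:
        return {
            row["directory"]: int(row["lines_hit"]) / max(int(row["lines_found"]), 1)
            for row in csv.DictReader(f)
        }

def compare_runs(old_path, new_path):
    """Print how coverage moved for every directory seen in either run."""
    old, new = load_summary(old_path), load_summary(new_path)
    for directory in sorted(set(old) | set(new)):
        before, after = old.get(directory, 0.0), new.get(directory, 0.0)
        print(f"{directory:30s} {before:6.1%} -> {after:6.1%} ({after - before:+.1%})")

if __name__ == "__main__":
    compare_runs("coverage-old.csv", "coverage-new.csv")
```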
Bug Analysis
The idea of the bug analysis is to identify "hot" components in Bugzilla: components where there are lots of changes going in and where there seems to be a need for help. We plan to look at the following items:
- bug arrival rates - the number of new bugs filed per unit of time
- in-test-suite-? - This shows the number of bugs that people think need to be covered in the regression test suites. This often indicates something above and beyond simple unit tests, since developers are already doing a good job of putting unit tests into their patches. Caveat: this could also just indicate areas where writing test cases is a pain (but that probably means those areas are under-covered).
- in-test-suite-+ - This is the other side of the in-test-suite question: it shows us where tests are actually being written. Combined with the ? query, it gives us a picture of how a component's testing is regarded. For instance:
- high ?, high +: indicates a component with a perception of needing high amounts of test coverage
- high ?, low +: indicates a component with a perception of needing a lot of tests, but for some reason they are not being written
- low ?, high +: indicates a component with a perception of having a lot of tests written (perhaps that it is well tested, even).
- low ?, low +: indicates a component without much testing (or perhaps with a good level of unit test coverage and nothing else seems to be needed).
Combining these analyses with the bug arrival rate can either reinforce or dispute the findings of the in-test-suite queries (e.g. a component that is low ?, low + with a high bug arrival rate probably doesn't have good test coverage).
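As a rough illustration of what these queries might look like in practice (not a finished tool), the sketch below counts the ? and + flags and the recent bug arrivals for a single component through Bugzilla's REST search interface, then buckets the component into the quadrants above. The endpoint URL, the "in-testsuite" flag name, the start date, and the high/low threshold are all assumptions that would need to be adjusted to whatever queries we actually settle on.

```python
# Hypothetical sketch: count in-test-suite ? / + flags and recent bug arrivals
# for one component, then bucket it into the quadrants described above.
# The flag name ("in-testsuite"), date, and threshold are placeholder assumptions.
import requests

BUGZILLA = "https://bugzilla.mozilla.org/rest/bug"

def count_bugs(product, component, **extra):
    """Return the number of bugs matching a Bugzilla REST search."""
    params = {"product": product, "component": component,
              "include_fields": "id", "limit": 0}  # limit=0: ask for everything
    params.update(extra)
    resp = requests.get(BUGZILLA, params=params)
    resp.raise_for_status()
    return len(resp.json().get("bugs", []))

def component_stats(product, component, since="2008-01-01"):
    """Collect the three raw numbers discussed above for one component."""
    def flagged(status):
        return count_bugs(product, component,
                          f1="flagtypes.name", o1="equals",
                          v1="in-testsuite" + status)
    return {
        "asked":   flagged("?"),   # in-test-suite-?
        "written": flagged("+"),   # in-test-suite-+
        "arrived": count_bugs(product, component, creation_time=since),
    }

def quadrant(stats, threshold=20):
    """Very crude high/low bucketing; the threshold is a made-up cut-off."""
    hi = lambda n: n >= threshold
    return {
        (True,  True):  "high ?, high +: seen as needing (and getting) lots of tests",
        (True,  False): "high ?, low +: tests wanted, but not getting written",
        (False, True):  "low ?, high +: lots of tests written (perhaps well tested)",
        (False, False): "low ?, low +: little regression-test activity",
    }[(hi(stats["asked"]), hi(stats["written"]))]

if __name__ == "__main__":
    stats = component_stats("Core", "Layout")
    print(stats, "->", quadrant(stats))
```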
Code Coverage
The code coverage measurement is going to be difficult to manage. Most code coverage tools work only on Java, C, or C++, which means we won't have any visibility into the JavaScript side of the code base. We might be able to use Venkman's profiler to check specific areas and get some idea of functional coverage, but we won't get any line or branch coverage analysis for JavaScript.
We are evaluating the best tools for each main platform (Windows, Linux, and Mac). We will do one run on each platform using the best tool we can find, and if the runs are fairly similar, we may standardize on a single OS for the measurements. In any case, we cannot compare results across OSes unless we are running the same tool, and there doesn't seem to be one tool that runs on all three.
The tools we are currently evaluating:
- Windows
- CoverageMeter
- Linux
- Dynamic Code Coverage
- Windows and Linux
- Bullseye coverage
- Mac
- Still looking
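Whichever tool we end up choosing, the post-processing step is going to look roughly the same: roll per-file results up into per-directory numbers so we can produce the short list of under-exercised areas. Here is a rough sketch of that step, assuming the tool's results can be exported to (or converted into) an lcov-style .info report; a tool with its own report format would need a different parser, but the roll-up logic would stay the same.

```python
# Hypothetical sketch: roll an lcov-style .info report (SF:/LF:/LH: records)
# up into per-directory line coverage and list the least-covered directories.
# The .info input format is an assumption about how we export tool results.
import sys
from collections import defaultdict

def directory_coverage(info_path, depth=2):
    """Map each directory prefix to [lines_hit, lines_found]."""
    totals = defaultdict(lambda: [0, 0])
    current = None
    with open(info_path) as report:
        for line in report:
            line = line.strip()
            if line.startswith("SF:"):
                # Keep only the first `depth` path components, e.g. "layout/base".
                parts = line[3:].lstrip("/").split("/")
                current = "/".join(parts[:depth])
            elif line.startswith("LH:") and current:
                totals[current][0] += int(line[3:])
            elif line.startswith("LF:") and current:
                totals[current][1] += int(line[3:])
    return totals

def least_covered(totals, count=10):
    """Return the `count` directories with the lowest fraction of lines hit."""
    rows = [(hit / found if found else 0.0, name, hit, found)
            for name, (hit, found) in totals.items()]
    return sorted(rows)[:count]

if __name__ == "__main__":
    for fraction, name, hit, found in least_covered(directory_coverage(sys.argv[1])):
        print(f"{fraction:6.1%}  {name}  ({hit}/{found} lines)")
```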