QA/Quality Assessment

Quality Assessment Brainstorming

Goal

Devise a succinct way to assess the quality of a feature and/or the entire release, with information pulled from various sources.

Requirements

  • The assessment process must be easy to communicate
  • The contributing factors must be easy to communicate outward
  • Gathering the information must not take unreasonable effort
  • It must be something that can reasonably be done regularly, so we can plot quality over time.

Product of Quality Assessment

  1. An estimate of quality for that feature at that time.
  2. Preferably a confidence factor for that estimate.

Brainstorming

In this section, thoughts on how we should best do this can be gathered. Create a sub-section with your name(s) and add your thoughts there.

Geo & Juan (meeting)

Sources

The first thing we did was enumerate the various sources we could poll to gauge quality. We also rated each source on two axes:

  • Difficulty: Easy or Medium to gather from
    • Easy means it can be automated or otherwise gathered quickly
    • Medium means it must be parsed with human intervention

and...

  • Clarity: Discrete or Fuzzy data
    • Discrete means the data is clear and probably numeric
    • Fuzzy means it requires interpretation

That gave us something like this:

Quality Sources (E = Easy, M = Medium; D = Discrete, F = Fuzzy)

Source                         | Difficulty | Clarity | Notes
-------------------------------|------------|---------|---------------------------------------------------------------
Bugzilla                       | E/M        | D       | Feature-specific stats require whiteboarding (M); release stats don't (E)
QA owner experiences           | E          | F       | Feeling from usage/testing/etc.
Dev owner opinion              | E          | F       | Easy assumes good rapport
Other internal opinions        | E/M        | F       | Ease depends on how many you gather and how
Crash Stats                    | M          | D       | Very clear data, but must be triaged for relevance
Changelog entries              | M          | D       | Mostly for measuring confidence; more churn == less confidence
Input                          | M          | F       | Requires a fair amount of effort to understand trends
Litmus results                 | M          | F       | Results must be triaged
Community posts to lists, etc. | M          | F       | Not very reliable unless we ask
SUMO experience                | M          | F       | Easy to get, but must be triaged for relevance

Assessment Conclusions

Our early conclusion from that survey is that a split strategy may be the most appropriate:

Continuous numeric stats of the discrete quality sources
  • Bugzilla stats
    • Bug velocity (opened vs. closed)
    • # of open bugs
  • Crash stats
    • # of relevant crashes

All of these still require some degree of human intervention. Crash stats require triage, and Bugzilla requires active maintenance for proper classification of bugs via whiteboarding and/or components if they're to be broken down per-feature.

Still, these are the sources that provide numbers. Rather than rolling them into "one true index," simply displaying these numbers separately with a frequent update cycle will communicate a huge amount of information.

Note that crash stats may not really be a per-feature thing, and may be more relevant for the release quality as a whole. Also, if Bugzilla stats aren't broken down by feature, the process can be completely automated with minimal intervention.
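
As a rough illustration, the sketch below pulls two of those numbers (bugs opened in the last week and bugs currently open) from Bugzilla's REST search API. The product name, the "[feature:example]" whiteboard tag, and the 7-day window are placeholder assumptions, and the "closed" half of the velocity stat would need a similar query against resolved bugs.

# A rough sketch, not a finished tool: count bugs for one feature via the
# Bugzilla REST search endpoint. Product, whiteboard tag, and time window
# are placeholder assumptions.
from datetime import date, timedelta
import requests

BUGZILLA = "https://bugzilla.mozilla.org/rest/bug"

def count_bugs(**params):
    """Run a Bugzilla search and return how many bugs matched."""
    params["include_fields"] = "id"   # we only need to count the results
    resp = requests.get(BUGZILLA, params=params, timeout=30)
    resp.raise_for_status()
    return len(resp.json()["bugs"])

feature = {"product": "Firefox", "whiteboard": "[feature:example]"}  # hypothetical tag
since = (date.today() - timedelta(days=7)).isoformat()

opened_this_week = count_bugs(creation_time=since, **feature)
open_now = count_bugs(status=["UNCONFIRMED", "NEW", "ASSIGNED", "REOPENED"], **feature)

print(f"opened in last 7 days: {opened_this_week}; currently open: {open_now}")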

A low/medium/high value reported by the QA feature owner after considering the discrete and fuzzy sources.
  • The numeric stats above
  • QA owner experiences
  • Dev owner opinion
  • Other internal opinions
  • Input
  • Litmus
  • Community posts
  • SUMO

The QA owner generally has a very good idea of their feature's quality. The challenge is backing up that opinion externally and making sure it's well-informed.

By creating a checklist of standard sources to query and requesting brief documentation of the results, we can produce periodic reports that are very defensible. These could be done on a regular basis (e.g. weekly), or as needed with a deadline several days before any given release.

The feature-by-feature conclusions can be rolled into a colored matrix for easy review of a given release.
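
As an illustration of that matrix, here is a small sketch that renders per-feature ratings as a colored wikitable. The feature names, ratings, and colors are made-up examples.

# A small sketch of rolling per-feature ratings into a colored wikitable.
# Feature names, ratings, and color choices are made-up examples.
COLORS = {"high": "#c0f0c0", "medium": "#fff0b0", "low": "#f0c0c0"}

def quality_matrix(assessments):
    """assessments: iterable of (feature, quality, confidence) rated low/medium/high."""
    rows = ['{| class="wikitable"', "|-", "! Feature !! Quality !! Confidence"]
    for feature, quality, confidence in assessments:
        rows.append("|-")
        rows.append(f"| {feature} "
                    f'|| style="background:{COLORS[quality]}" | {quality} '
                    f'|| style="background:{COLORS[confidence]}" | {confidence}')
    rows.append("|}")
    return "\n".join(rows)

print(quality_matrix([("Session Restore", "high", "medium"),
                      ("Sync", "medium", "low")]))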

Confidence Conclusions

In addition, a confidence factor was discussed. The primary elements of confidence we identified are:

  • Number of recent changes to feature, especially since assessment (churn)
  • Flux in the quality assessments themselves (consistency)
  • How many sources could be queried for the report (diligence)

We could attempt to express this as a rolled-up stat. However, as this is also fuzzy, it may be simpler to keep it at "high, medium, low," with documentation of the factors above.
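
If we did want a mechanical rollup, it could be as simple as the sketch below; the thresholds are arbitrary placeholders for discussion, not agreed values.

# A minimal sketch of collapsing churn, consistency, and diligence into
# low/medium/high. Thresholds are arbitrary placeholders, not agreed values.
def confidence(churn, rating_changes, sources_queried, sources_total):
    """churn: changes to the feature since the last assessment;
    rating_changes: how often the quality rating has flipped recently;
    sources_queried/sources_total: how much of the checklist was covered."""
    points = 0
    points += 1 if churn <= 5 else 0                               # little recent churn
    points += 1 if rating_changes <= 1 else 0                      # assessments are consistent
    points += 1 if sources_queried >= 0.75 * sources_total else 0  # diligent report
    return {3: "high", 2: "medium"}.get(points, "low")

print(confidence(churn=2, rating_changes=0, sources_queried=6, sources_total=8))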

Other Conclusions

We also discussed which features need this level of treatment. Initial thoughts were that legacy features with no recent changes may not require this degree of diligence.

Confining this process to "hot" features will make it very achievable, albeit with the risk that something will be missed due to a misunderstanding of dependencies.

With that in mind, it was suggested that an automated process for scraping the changelogs and highlighting which features were touched would be very useful. Depending on the granularity of that mapping, this could be either very achievable or very high-maintenance, so it would need to be weighed carefully for cost vs. benefit.
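
As a sketch of what that scraping could look like once a path-to-feature mapping exists: the directory prefixes and feature names below are invented examples, and the list of changed paths would come from the changelog or VCS log.

# A sketch of mapping changed paths from the changelog onto features.
# The prefix-to-feature map is an invented example; its granularity is
# exactly the cost/benefit question raised above.
FEATURE_MAP = {
    "services/sync/": "Sync",
    "toolkit/components/places/": "Places",
    "browser/components/sessionstore/": "Session Restore",
}

def touched_features(changed_paths):
    """Return the set of features whose mapped directories were touched."""
    touched = set()
    for path in changed_paths:
        for prefix, feature in FEATURE_MAP.items():
            if path.startswith(prefix):
                touched.add(feature)
    return touched

print(touched_features([
    "services/sync/modules/engines.js",     # maps to Sync
    "browser/base/content/browser.js",      # not in the map; no feature flagged
]))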