QA/Data harvesting


MozBIT: Business Intelligence Tool for Quality Assurance

A QA engineer's role in a testing project involves estimating schedule and effort and verifying code base stability for a successful release. Currently, these estimates are largely subjective. Data-driven reports and metrics can significantly improve their reliability.

Our vision of a Business Intelligence tool suite (mozBIT) involves developing applications and processes that enable historical and predictive views of our various QA activities. Eventually, these will feed into a reliable and flexible rules-based decision support system for project management. Bugzilla's databases will form the core data set in Phase 1. We also plan to design the mozBIT suite to support the analytical and reporting requirements of other teams and senior management.

As a first step, we need to gather and baseline the requirements from all QA teams on this wiki page and translate them into reports (or application features). We expect this to be an iterative process, as end users will expand their requirements based on the potential demonstrated by early versions of the BI suite.

Identify requirements and metrics: We will collect the verbose requirements in this wiki page. Quantifying these requirements as metrics against existing datasets may take some work. To illustrate:

Requirement - How stable is my release?

Metric - A graph plotting the find/fixed rate tells us what stage the release is in. This is typically a bell-shaped curve; variations may indicate an unstable commit.
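As a sketch of the data behind such a graph (the bug dates and record shape here are invented for illustration; a real implementation would pull them from the Bugzilla database):

```python
from collections import Counter
from datetime import date

# Hypothetical bug records: (date_reported, date_fixed or None if still open).
bugs = [
    (date(2011, 5, 1), date(2011, 5, 3)),
    (date(2011, 5, 1), None),
    (date(2011, 5, 2), date(2011, 5, 3)),
    (date(2011, 5, 3), None),
]

# Count bugs found and bugs fixed per day.
found = Counter(reported for reported, _ in bugs)
fixed = Counter(f for _, f in bugs if f is not None)

# One row per day: (day, bugs found, bugs fixed) -- the series to chart.
days = sorted(set(found) | set(fixed))
trend = [(d, found[d], fixed[d]) for d in days]
```

Once the fixed counts catch up with (and stay level with) the found counts, the release is likely stabilizing.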

Requirement - I have a testing project with a Kohona backend. How much time will it take?

Metric - If we tag all current and past projects with relevant keywords such as backend = 'Kohona', 'regression', 'security', etc., we can easily leverage them to improve our projections in similar scenarios on future projects. We therefore need to define a flexible vocabulary of metrics that we can use to measure various properties of our projects.
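A minimal sketch of how such tags could drive an effort estimate (the project names, tags, and effort figures are all hypothetical):

```python
from statistics import mean

# Hypothetical history of past testing projects, each tagged with a
# controlled vocabulary of keywords plus the effort it actually took.
past_projects = [
    {"name": "proj-a", "tags": {"backend=Kohona", "regression"}, "weeks": 6},
    {"name": "proj-b", "tags": {"backend=Kohona", "security"}, "weeks": 8},
    {"name": "proj-c", "tags": {"frontend", "regression"}, "weeks": 3},
]

def estimate_weeks(tags, history):
    """Average effort of past projects sharing at least one tag."""
    similar = [p["weeks"] for p in history if p["tags"] & tags]
    return mean(similar) if similar else None

# A new Kohona-backend project: average the two matching past projects.
estimate = estimate_weeks({"backend=Kohona"}, past_projects)
```

A richer model would weight by tag overlap, but even simple tag matching makes past project data reusable.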

Requirement - I am starting on a new testing project. What are the best practices?

Metric - The mozBIT system will analyze bug ageing and other trends over the release cycles of similar projects and highlight the red flags. The availability of data-driven reports and metrics can thus significantly improve the reliability and consistency of our decisions.

Things to consider

  • What problems are we trying to solve with this tool?
  • What ways would you like to use the data available in Bugzilla?
  • What kind of metrics will help you plan your releases better?
  • What are some of the "must-have" features for a data harvesting tool?
  • Do you have any tools you'd like to recommend?
  • Build/buy debate?

Phase I: Requirements

  • Web based and works with the Bugzilla database [KR]
  • Find/fixed trend graphs: given a date range, chart the ratio of 'new' bugs vs. 'fixed' bugs on a daily basis. This graph helps determine when the release has stabilized [KR]
  • Buckets: ability to graphically group bugs by priority, severity, component, bug reporter, bug assignee, flags, etc. [KR]
  • Ability to compare data from different releases [KR]
  • Ability to compare data from different projects (e.g. AMO, SUMO) [KR]
  • Tabular reporting [KR]
  • Average age of a bug in each component [KR]
  • Rate/percentage of bugs being reopened [SD]
  • Admin ability to add new data sets [KR]
  • Ability to aggregate data for commits/impact from GitHub/Mercurial [KR]
  • Regression bugs per release/project [KR]
  • Options to export to CSV/JSON and save the graph as an image
  • Average age of bugs in unconfirmed status [MK]
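The "buckets" requirement above amounts to grouping flattened bug rows by any one field. A minimal sketch, with made-up bug rows standing in for real Bugzilla data:

```python
from collections import Counter

# Hypothetical flattened bug rows, as they might come out of the Bugzilla DB.
bugs = [
    {"component": "General", "severity": "critical", "priority": "P1"},
    {"component": "General", "severity": "normal",   "priority": "P3"},
    {"component": "UI",      "severity": "critical", "priority": "P2"},
]

def bucket(bugs, field):
    """Group bugs into buckets by one field; the counts feed a chart."""
    return Counter(b[field] for b in bugs)

by_component = bucket(bugs, "component")
by_severity = bucket(bugs, "severity")
```

Because the field name is a parameter, the same function covers every grouping the requirement lists (priority, severity, component, reporter, assignee, flags).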



Before we can come up with a dashboard that provides all the required features completely, we need to lay some technological groundwork.

Flags & Keywords

A lot of the requested features rely on the information that is stored as flags and keywords with the various bugs.

  • Flags are needed to associate a bug to a product release.
  • Keywords are needed to identify regressions.

Currently that information is *not* present in the Bugzilla data warehouse. Each field is a set of multiple values that are added and removed independently over time, which makes it hard for our dashboards to select bugs based on them.
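To see why these fields are hard to query: a bug's keyword set on a given date can only be recovered by replaying its add/remove history up to that date. A sketch, with an invented event log in place of real Bugzilla activity data:

```python
from datetime import date

# Hypothetical keyword change events for one bug: (date, action, value).
history = [
    (date(2011, 1, 5), "add", "regression"),
    (date(2011, 2, 1), "add", "qawanted"),
    (date(2011, 3, 9), "remove", "regression"),
]

def keywords_on(history, as_of):
    """Replay add/remove events to get the keyword set on a given date."""
    current = set()
    for when, action, value in sorted(history):
        if when > as_of:
            break
        if action == "add":
            current.add(value)
        else:
            current.discard(value)
    return current
```

Doing this replay per bug at query time is exactly what the dashboards cannot afford, which is why the computed sets need to land in the data warehouse.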


Incidentally, Michael Kurze's internship project is to get the keywords and flags into the Bugzilla DW in a way that the dashboard can query.

To that end, the Metrics team is planning to roll out a new DW infrastructure based on a content repository (LilyCMS), which is built on top of the HBase datastore and the Solr indexer. This should enable us to query the Bugzilla data in a much more powerful fashion, both for the features listed here and for other teams.

Progress Card

Check the progress on each of the features here

Risk Assessment

LilyCMS is a very new software project (released July 2010), but it is backed by a company that has been developing content repository software for quite some time.

These are the major implementation steps that have to be completed to get the first actual dashboard feature in:

  • Set up Lily so that it understands Bugzilla bugs
    • should not be a problem
  • Modify the current Bugzilla ETL (extract, transform, load) to output results into Lily
    • simple in theory, but it could become very complicated as the technology is very young; relatively high risk here
  • Extend the ETL to include keywords and flags
    • very difficult with current DW, but should be straightforward with LilyCMS.
  • Set up Solr to index keywords and flags.
    • also rather straightforward
  • Make the Bugzilla dashboard talk to hbase and solr, to get the data that is actually needed
    • the dashboard can use hbase data anyway, not so sure about accessing solr: moderate risk
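At its core, the ETL extension in the steps above means flattening each raw bug into a record that carries its keywords and flags, so the indexer can answer queries on them directly. A minimal sketch of that transform step; all field names here are hypothetical, and the real ETL would map onto Lily record types and Solr index fields:

```python
def transform(raw_bug):
    """Flatten one raw Bugzilla bug into a record the new DW could index.

    The input/output field names are illustrative, not the actual schema.
    """
    return {
        "bug_id": raw_bug["id"],
        "component": raw_bug["component"],
        # Multi-valued fields kept as lists so the indexer can answer
        # "all bugs with keyword X" without replaying change history.
        "keywords": sorted(raw_bug.get("keywords", [])),
        "flags": sorted(f["name"] + f["status"] for f in raw_bug.get("flags", [])),
    }

record = transform({
    "id": 12345,
    "component": "General",
    "keywords": ["regression"],
    "flags": [{"name": "blocking2.0", "status": "+"}],
})
```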

This means that we have quite a lot to do to get the first dashboard feature implemented, while all the following features would simply reuse the same groundwork.

Meeting Notes