Auto-tools/Projects/Signal From Noise/StatusNovember2012


State of Performance Testing: November 2012

While the Signal from Noise project is not yet complete, there have been considerable improvements to Talos and the supporting infrastructure in the preceding year.

State of Talos: November 2012

The following areas have been improved in Talos as part of the SfN project:

Several contributors have also participated in Talos development. \o/ Their contributions have ranged from good first bug fixes to over-arching rewrites of parts of the software. Thanks go out to all the folks who volunteered their time to help out here.

There are several remaining areas where the Talos software should be improved such as:

State of Datazilla: November 2012

Datazilla manages Talos data with three distinct database schemas: talos_objectstore_1, talos_perftest_1, and pushlog_hgmozilla_1. The objectstore contains a single table designed to store JSON objects. These objects contain a set of untreated replicate values for every page in a given Talos test suite. They are indexed in a separate schema called talos_perftest_1. In addition to indexing test data and reference data (product type, platform information, test suite/page names), the index also stores associated metrics data. This includes the results of Welch's one-sided t-test, the application of the false discovery rate procedure, and exponentially smoothed means and standard deviations. The application of metrics is treated generically in the schema, so any number of statistical treatments of the raw data can be supported in the future. The pushlog_hgmozilla_1 schema maintains an ordered list of pushes that is used to compare consecutive pushes to one another; this is necessary because the raw JSON data generated by Talos in production is received asynchronously, not necessarily in the push order that occurred in the repository. All of the database schemas can be found here: .
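To make the objectstore's role concrete, here is a sketch of what one stored JSON blob might look like. The field names, page names, and values are illustrative assumptions, not the actual Datazilla payload format; the point is simply that raw per-page replicates are stored untreated as JSON and indexed later.

```python
import json

# Hypothetical shape of one objectstore blob: untreated per-page replicates
# for a single Talos suite run. Field names here are illustrative only.
payload = {
    "test_build": {"name": "Firefox", "revision": "a8e8fc8bb0b7"},
    "testrun": {"suite": "tp5", "date": 1352246400},
    "results": {
        "example-page-1": [520.0, 514.2, 519.9, 511.3, 516.7],
        "example-page-2": [801.1, 795.4, 799.0, 803.6, 797.2],
    },
}

blob = json.dumps(payload)   # stored as-is in the objectstore table
restored = json.loads(blob)  # later indexed into talos_perftest_1
print(len(restored["results"]["example-page-1"]))  # replicates per page
```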

The user interface for Datazilla was initially designed to drill down into and examine the raw data associated with a Talos test. This was helpful in Q1-Q2 2012 in determining what needed to be done, but it does not address performance regression detection, which is the issue most relevant to developers and sheriffs. A new user interface was designed and implemented in Q4 to display the results of the new metrics treatment.

State of Statistics: November 2012

The Mozilla Metrics team worked as part of Signal from Noise to audit our performance statistical methodology and to help develop better models. Metrics looked at the following issues:

  • Determining the source(s) of variation in the data: After looking at the data from running experiments, Metrics identified two main sources of variation. First, aggregating all the test pages into a single number was hiding true signal from noise, as the pageload times for the 100 pages were very different. Second, the way Talos data was being collected before Q1 2012 introduced a large variation within the replicates of each test page.
  • Non-normal distributions: Several non-normal distributions were found among the Talos data sets, including multi-modal distributions. One cause of multimodality was the aggregation of pages with very different pageload times, reflecting the different characteristics of the pages we are testing in tp5. Hence, it is crucial to move to page-centric testing rather than aggregated testing.
  • Determining the number of observations per test page: It is crucial to strike a good balance between the machine time consumed by a Talos test and having enough replicates for statistical viability of the test results. The optimal number of replicates per test page for statistical testing is about 30 (J. Devore, Probability & Statistics for Engineering & the Sciences, 8th ed., p. 226). However, due to time constraints, we decided to collect 25 replicates (still a big improvement over the previous 10 replicates, though not optimal).
  • Quality of data: Some pages show systematic patterns which may indicate a problem with the data being collected (possibly due to hardware, software, validity of the test pages, etc.). This should be investigated to ensure that the data we collect correctly represents what we are trying to measure.
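The aggregation problem described above can be illustrated with a small sketch (the replicate values are invented): each page's replicates have a small spread relative to their mean, so a per-page regression is visible, but pooling pages with very different pageload times yields a spread dominated by the page difference rather than by run-to-run noise.

```python
import statistics

# Invented replicates for two pages with very different pageload times (ms),
# mimicking the page-to-page spread seen in tp5.
fast_page = [102.0, 104.5, 101.2, 103.8, 102.9]
slow_page = [910.4, 905.1, 912.7, 908.8, 911.0]

# Per-page: small standard deviations, so real shifts stand out.
print(round(statistics.stdev(fast_page), 1))
print(round(statistics.stdev(slow_page), 1))

# Aggregated: the pooled spread reflects the bimodal page mixture,
# swamping any genuine regression on a single page.
pooled = fast_page + slow_page
print(round(statistics.stdev(pooled), 1))
```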

Datazilla utilizes these improved statistical methodologies: Welch's t-test, the FDR procedure, and exponential smoothing. A datazilla-metrics repository has been created; it is a Python package that implements statistical methods useful for Datazilla.
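As a sketch of those three treatments (the function names and signatures below are mine, not the datazilla-metrics API), the core computations look roughly like this; turning the t statistic into a one-sided p-value additionally requires a t-distribution CDF, which a stats library would supply:

```python
import math

def welch_t(xs, ys):
    """Welch's t statistic for two independent samples with unequal variances."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    # Positive t: the second sample (e.g. the new push) is slower on average.
    return (my - mx) / math.sqrt(vx / nx + vy / ny)

def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of tests rejected while controlling the false discovery rate."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        # Largest rank whose p-value clears the stepped-up threshold.
        if pvalues[i] <= rank / m * alpha:
            k = rank
    return sorted(order[:k])

def smoothed_mean(values, alpha=0.3):
    """Exponentially smoothed mean over a push-ordered series of page means."""
    s = values[0]
    for v in values[1:]:
        s = alpha * v + (1 - alpha) * s
    return s
```

For example, `benjamini_hochberg([0.001, 0.2, 0.01, 0.8])` rejects the tests at indices 0 and 2 at alpha = 0.05, while the 0.2 and 0.8 p-values survive the correction.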

Performance Testing Roadmap: 2013

It is a goal for 2013 to finish up the loose ends for Talos, Datazilla, and Signal from Noise in general:


In the last year, we've dug into every part of the performance testing automation at Mozilla. We have analyzed the test harness, the reporting tools, and the statistical soundness of the results that were being generated. Over the course of that year, we used what we learned to make the Talos framework easier to maintain, easier to run, simpler to set up, easier to test on try, and less error prone.

We have created Datazilla, an extensible system for storing and retrieving all our performance metrics from Talos and any future performance automation. We have rebooted our performance statistical analysis and created statistically viable, per-push regression/improvement detection. We have made all these systems easier to use and more open so that any contributor anywhere can take a look at our code and even experiment with new methods of statistical analysis on our performance data.

But we're not finished yet. There are more fixes to be done to the Talos framework itself, and the most critical piece of the infrastructure move still has to take place: we have to shift to using Datazilla in production and deprecate our use of Graphserver for new versions of Firefox. As we do that, we can clean out the remaining cruft in the Talos test framework and focus our efforts on new, groundbreaking performance automation. Stay tuned. Or better yet, get involved: