= Signal From Noise =


Making sense of the Talos results  


[[Auto-tools/Projects/SfN_2013Q2|Specific goals 2013Q2]]


== Overview ==


Historically we have had an 'acceptable' range of fluctuation in our Talos numbers. Our methods of managing and tracking those numbers have all centered on running a test multiple times and generating a single number that we can track over time. That works well for long-term tracking, but when you look at what the number represents and why it fluctuates, there is a lot of room for error.


We want to do a better job of generating our single tracking number. We also want to revisit the way we are testing things and make sure we are running the right tests, with the correct number of iterations, to get a reliable data point. Most likely this involves looking at every page that we have and tracking each page individually, not as a small piece of a larger set of pages.


Most of this project is outlined well on the [https://wiki.mozilla.org/Metrics/Talos_Investigation Talos Investigation] page.
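
As a back-of-the-envelope check on iteration counts, we can bootstrap the spread of the median from one page's raw run times (the thesartorialist.blogspot.com sample in the Data section below); if that spread is large compared to the regressions we want to catch, we are not running enough iterations. This is a minimal illustrative sketch in Python (standard library only; not part of Talos):

 import random
 import statistics
 
 def bootstrap_median_spread(samples, resamples=1000):
     """Estimate how much the median wobbles under resampling."""
     medians = []
     for _ in range(resamples):
         resample = [random.choice(samples) for _ in samples]
         medians.append(statistics.median(resample))
     return statistics.stdev(medians)
 
 # Raw page-load times (ms) from the tp5 NOISE sample below.
 loads = [951, 852, 920, 865, 809, 858, 851, 1135, 836, 837]
 print(statistics.median(loads), bootstrap_median_spread(loads))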


== Goals ==


Original goals:
* define what is signal and what is noise
* understand the distribution of numbers and have confidence that our representation is meaningful


Appended goals:
* ensure our tests are testing the right things
* report performance regression results per push, posted to tbpl in real time
* provide tools to quickly investigate performance regressions, compare changesets/branches, and identify trends


== Drivers ==
* Datazilla: jeads
* Talos: jhammel, jmaher
* Metrics: christina (as needed)


== Meetings ==
Meetings are every other [http://arewemeetingyet.com/Los%20Angeles/2013-12-19/11:00/b/Signal%20From%20Noise Thursday at 11AM Pacific Time] in Joel Maher's Vidyo room.

* Notes from the status [https://wiki.mozilla.org/Auto-tools/Projects/Signal_From_Noise/Meetings meetings] are now on the [https://etherpad.mozilla.org/SignalFromNoise etherpad].
* The Datazilla project is holding [[Auto-tools/Projects/Datazilla/Meetings|Datazilla focus group meetings]] with interested developers to judge our progress toward fixing use cases that developers and tree sheriffs care about.
 
== Status ==
* Performance Testing Reports:
** November 2011 - [[Auto-tools/Projects/Signal_From_Noise/StatusNovember2011]]
** November 2012 - [[Auto-tools/Projects/Signal_From_Noise/StatusNovember2012]]
** 2012 Execution - [[Auto-tools/Projects/Signal_From_Noise/Execution2012]]
 
== Bugs  ==
* Signal from Noise bugs are marked with the [https://bugzilla.mozilla.org/buglist.cgi?resolution=---&status_whiteboard_type=allwordssubstr&query_format=advanced&status_whiteboard=%5bSfN%5d SfN whiteboard entry]
* Fixed signal from noise bugs: https://bugzilla.mozilla.org/buglist.cgi?list_id=5017261;resolution=FIXED;status_whiteboard_type=allwordssubstr;query_format=advanced;status_whiteboard=[SfN]
 
== UI (Datazilla) ==
 
* original mockups: [[Media:TalosSignalFromNoiseMocks.pdf]].
 
=== Use Cases  ===
 
Currently:
 
*Firefox developer:
**push patch to mozilla-central, expect green talos results
***all results are green unless a test fails to complete, in which case it is red
**notification to dev.tree-management indicates a regression
***developer goes to graphs-new and looks at the (test, platform, branch) graph
***maybe compares to other platforms or branches
 
*Talos developer
**adds new feature to talos with expected change in numbers
**run the change side by side as a new test name for 1 week
**browse to graphs-new to view new_test vs old_test to look at raw data points over a few days on each platform
 
Proposal 1 (assuming 1% deviation):
 
*Firefox developer
**push patch to mozilla-central, expect green talos results
***if the number is outside of 2% from the gold standard, the run turns orange (see the sketch after this list)
***orange run has link on tbpl to graph server
***graph server shows a quick line of historical data and other platforms
***then a focused section showing what the gold standard is and what that run produced
***it would be nice to also see the numbers from the previous 5 runs, as well as all other platforms
**no need for notification mails to dev.tree-management since this is managed in tbpl
**FLAW: if Firefox legitimately shifts the number (up or down), how do we establish the new standard?
***maybe the web interface can have a way to change the number on the fly and file a bug/comment for the adjustment
 
*Talos developer
**adds new feature to talos with expected change in numbers
**while pushing, add an entry to the graph server with the new expected number
**no need for side by side since we are just comparing to a known standard number
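
Below is a minimal sketch of the proposed coloring rule; the function name and the threshold parameter are illustrative assumptions, not an existing tbpl or graph server API:

 def classify_run(value, gold_standard, tolerance=0.02):
     """Proposed tbpl coloring: orange when a run's metric deviates
     more than `tolerance` (e.g. 2%) from the known gold standard."""
     deviation = abs(value - gold_standard) / gold_standard
     return "orange" if deviation > tolerance else "green"
 
 # Hypothetical gold standard of 852 ms for a page set:
 print(classify_run(875.0, 852.0))  # ~2.7% off -> orange
 print(classify_run(860.0, 852.0))  # ~0.9% off -> green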
 
=== Data  ===
 
Graph Server (current):
 
*data from tests (tp5 sample)  
**noisy output on console (coming from pageloader); a parsing sketch follows this block
 
 NOISE: |i|pagename|median|mean|min|max|runs|
 NOISE: |0;thesartorialist.blogspot.com;852;864.3333333333334;809;1135;951;852;920;865;809;858;851;1135;836;837
 NOISE: |1;cakewrecks.blogspot.com;264;266.55555555555554;252;651;651;263;252;273;264;275;260;292;268;252
 
**data sent to the graph server
 
 0,852.00,thesartorialist.blogspot.com
 1,264.00,cakewrecks.blogspot.com
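
The NOISE lines parse mechanically: after the leading "NOISE: |" the fields are ;-separated, giving index, page name, median, mean, min, and max, followed by the raw run values. A hypothetical parsing sketch, assuming exactly the format shown above:

 def parse_noise_line(line):
     """Parse one pageloader NOISE data line into its fields."""
     body = line.split("NOISE: |", 1)[1]
     fields = body.split(";")
     return {
         "index": int(fields[0]),
         "page": fields[1],
         "median": float(fields[2]),
         "mean": float(fields[3]),
         "min": float(fields[4]),
         "max": float(fields[5]),
         "runs": [float(x) for x in fields[6:]],
     }
 
 line = "NOISE: |0;thesartorialist.blogspot.com;852;864.3333333333334;809;1135;951;852;920;865;809;858;851;1135;836;837"
 print(parse_noise_line(line)["runs"])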
 
Right now we are sending the median value (without the highest value in the set) to the graph server for each page. On the graph server, we [http://hg.mozilla.org/graphs/file/1237d38a299b/server/pyfomatic/collect.py#l208 calculate our metric] for tp5 by averaging all the uploaded median values except for the max value.
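
Concretely, for thesartorialist.blogspot.com the raw runs are 951, 852, 920, 865, 809, 858, 851, 1135, 836, 837; dropping the single highest value (1135) and taking the median of the rest gives 852, which matches the uploaded "0,852.00,..." line above. Here is a sketch of both halves of the calculation, following the description here rather than the exact collect.py code:

 import statistics
 
 def page_value(runs):
     """Per-page value uploaded to the graph server: the median
     after discarding the single highest run."""
     return statistics.median(sorted(runs)[:-1])
 
 def tp5_metric(page_values):
     """Graph-server tp5 metric: the mean of the per-page values,
     excluding the largest one."""
     return statistics.mean(sorted(page_values)[:-1])
 
 runs = [951, 852, 920, 865, 809, 858, 851, 1135, 836, 837]
 print(page_value(runs))             # 852 -> matches the uploaded value
 print(tp5_metric([852.0, 264.0]))   # 264.0 -> the max page value is excluded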
 
* '''TODO''': define the perf counters that we collect and upload.
 
Datazilla:
* volume - TODO
* storage - TODO
* format - TODO
 
== Links  ==
 
* https://wiki.mozilla.org/Metrics/Talos_Investigation
* https://groups.google.com/forum/#!topic/mozilla.dev.planning/nxR6tcDmZWQ
* https://groups.google.com/forum/#!msg/mozilla.dev.platform/kXUFafYInWs/XRCsrapUUGAJ
* https://github.com/salamand/ESTalosPull : the script to pull the talos data out of Elasticsearch
* https://github.com/salamand/PulseLogHarvester : the log harvester that helped pull logs directly from Pulse
* https://wiki.mozilla.org/Perfomatic#Architecture
* {{bug|706912}}
* http://groups.google.com/group/mozilla.dev.tree-management/topics?lnk=srg&pli=1 : where regression emails go
* http://people.mozilla.org/~jmaher/sxs/sxs.html
* http://k0s.org/mozilla/blog/20120131164249
* https://etherpad.mozilla.org/graphserver-next
* https://plus.google.com/u/0/108996039294665965197/posts/8GyqMEZHHVR : on bimodality
* http://www-plan.cs.colorado.edu/diwan/asplos09.pdf
* http://www.jerrydallal.com/LHSP/LHSP.HTM
* http://www.jerrydallal.com/LHSP/npar.htm
* http://datazilla.readthedocs.org/en/latest/
* https://github.com/mozilla/datazilla-metrics
* https://wiki.mozilla.org/images/d/dd/Talos_Statistical_Analysis_Writeup.pdf
* http://www.stat.purdue.edu/~doerge/BIOINFORM.D/FALL06/Benjamini%20and%20Y%20FDR.pdf
* https://wiki.mozilla.org/images/3/38/Tp5_Good_Pages.pdf : The Good Pages
* http://people.mozilla.org/~ctalbert/TalosPlots/rowmajor_change/index.html
* https://wiki.mozilla.org/images/2/2c/Plots.pdf : 40 pagecycles time series plots
* https://wiki.mozilla.org/images/e/e0/40_page_loads.pdf : 40 pagecycles time series plots
* http://plasma.cs.umass.edu/emery/stabilizer : Stabilizer, a compiler and runtime system that enables statistically rigorous performance evaluation
 
=== Thesis  ===
 
* https://wiki.mozilla.org/images/c/c0/Larres-thesis.pdf, a mirror of http://majutsushi.net/stuff/thesis.pdf
