Auto-tools/Projects/Signal From Noise
= Signal From Noise =


Making sense of the Talos results


This is a joint project among the A-team, Releng, Webdev, and Metrics.


== Overview ==


Historically we have had an 'acceptable' range of fluctuation in our Talos numbers. Our methods of managing and tracking the numbers have all centered on running a test multiple times and generating a single number that we can track over time. This is great for long-term tracking, but when looking at what that number represents and why it fluctuates there is a lot of room for error.


We want to do a better job of generating our single tracking number. We also want to revisit the way we are testing things and make sure we are running the right tests and the correct number of iterations to get a reliable data point. Most likely this involves looking at every page that we have and tracking that page individually, not as a small piece of a larger set of pages.
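To illustrate why per-page tracking matters, here is a small sketch with entirely hypothetical numbers showing how a serious regression in one page can disappear into a suite-wide average:

```python
# Hypothetical per-page medians (ms) for a pageload suite, before and
# after a patch. page-d regresses by about 26%.
baseline = {"page-a": 850, "page-b": 265, "page-c": 410, "page-d": 95}
patched = {"page-a": 855, "page-b": 260, "page-c": 405, "page-d": 120}

avg_base = sum(baseline.values()) / len(baseline)
avg_new = sum(patched.values()) / len(patched)

# The suite-wide average moves by only ~1%, well inside a historical
# 'acceptable' range of fluctuation.
overall_change = (avg_new - avg_base) / avg_base * 100

# Tracking each page individually makes the regression obvious.
per_page_change = {
    page: (patched[page] - baseline[page]) / baseline[page] * 100
    for page in baseline
}

print(f"overall: {overall_change:+.1f}%")
for page, delta in sorted(per_page_change.items()):
    print(f"{page}: {delta:+.1f}%")
```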


=== Goals ===


* define what is signal and what is noise
* understand the distribution of numbers and have confidence that our representation is meaningful


=== Bugs ===


Signal from Noise bugs are marked with the [https://bugzilla.mozilla.org/buglist.cgi?resolution=---&status_whiteboard_type=allwordssubstr&query_format=advanced&status_whiteboard=%5bSfN%5d SfN whiteboard entry].


== Drivers ==


* side by side staging: jmaher
* graphserver: jeads + BYK
* pageloader and other tools: jhammel


== Background ==


Most of this project is outlined well on the [https://wiki.mozilla.org/Metrics/Talos_Investigation Talos Investigation] page.


== Meetings ==
Meetings are every [http://www.timeanddate.com/worldclock/fixedtime.html?iso=20120308T11&p1=900 Thursday at 11AM Pacific Time].


[https://wiki.mozilla.org/Auto-tools/Projects/Signal_From_Noise/Meetings Here is our meeting page - take a look for more details and notes from previous meetings.]


=== Datazilla Meetings ===
The Datazilla project is holding focus group meetings with interested developers to judge our progress toward fixing use cases that developers and tree sheriffs care about.


See our [[Auto-tools/Projects/Datazilla/Meetings|Datazilla Meeting Page]] for information and notes from those.


== Action Items ==


The goal by March is to:
* Have the tools (pageloader, talos, graphserver) retooled so we can research new tests and run tests in a more reliable fashion
* Implement and roll out tdhtml using the new toolchain
* Have a process in place for adding new tests and pagesets into the tool set


There are general estimates of time throughout; these are just placeholders. While the development time might be 2 hours, there are 2 days budgeted for it. Basically this accounts for time to develop, test, and document your patch; time for the reviewer to review, plus any back and forth; and finally time for staging and coordinating a deployment of a new talos.zip.


All in all there are 82 estimated work days to achieve success this next quarter. These 82 days do not include core development of the graph server, but they do include meeting, reviewing, and helping the UI folks with the graph server.


=== Milestone 1 ===
25 work days


* discard the first iteration of a page load (2 days to get landed, then SxS for a rollout - good practice) (jhammel - it would be nice to know *why* the difference)
* add options to pageloader for alternative page loading and measurements. make it more flexible as to how to load pages (the order, etc) 1 week
* add options to talos configuration to support new pageloader requirements.  2 days (jhammel - I wouldn't mind taking this)
* create a v1 of the dhtml test using new methodology. 1 week
* work with rhelmer and jeads to start discussion of what data we want. 2 weeks
** samples of work that :slewchuk did, mixed with initial data from dhtml results
* Initial version of database requirements to host new data. 3 days
* Blog frequently about progress and goals. Get the word out, get feedback, cultivate knowledge.


=== Milestone 2 ===
20 days of work


* Validate tdhtml data with metrics. 2 days
* Generate single 'metric' to track tdhtml as we currently do. 2 days
* Ensure core database and input methods for data are deployed. 2 days
* Start rolling out on branches with side by side staging. 4 days
* Beta version of UI live for initial data from the branches. 1 week
* Start investigating tsvg and a11y for optimal sampling sizes and accuracy. 1 week
* Continue to blog and post to newsgroups. n/a


=== Milestone 3 ===
24 work days


* Continue rolling out tdhtml to other branches. 4 days
* Enhance tools like compare-talos and regression-finder to work with new tdhtml. 1 week
* Write analysis toolchain for investigating new tests and pages (i.e. the work we do on tsvg and a11y should be automated). 1 week
* Integrate analysis toolchain into existing tools as much as possible. 1 week
* Version 1.0 of the new UI should be available. Multiple views on the same data as well as drill down from given data point or time window. 1 week


=== Milestone 3.14 (bonus work if all goes well) ===  
13 work days


* Define requirements for a Version 2.0 of the new UI. 2 days
* Start rolling out tsvg and a11y. 3 days
* Start investigating tp5 (or maybe it is time for tp6 and we start there). 1 week
* Enhance the compare-talos toolchain to show differences from a try server run to the baseline (easier talos development as well as Firefox development). 3 days


== Related Work ==


We need to be considerate of other projects and try to coordinate as much as possible.


* mozbase
** we will be fixing up talos to use mozprocess, mozprofile, mozrunner.  This doesn't intersect with SfN work, but if we are doing a large staging run this would be beneficial to bundle together. staging
* mozharness
** again, no impact on this project. staging,SxS
* python 2.4->2.6+
** no real impact on this project. staging,SxS
* jetpack talos
** most likely some changes to talos, primarily focused on ts, maybe some graphserver work required
* AMO maintenance
** no impact on this project
* OSX RSS from pageloader
** small talos and config tweaks for tp5.  staging,SxS


== Possible Reshuffling ==


Most of the other work requires staging and side by side (SxS) running to ensure we don't fudge the numbers.
* Can our toolchain make the side by side easier and less painful? (jhammel - this would be a good thing to blog about)


We won't be modifying talos proper much which means that the work in these other projects shouldn't affect SfN. 
* Will we be comfortable doubling our work in staging and SxS? (jhammel - we should probably peg more carefully to versions of mozbase software)


== Contacts ==
* ateam:  BYK, jeads, jhammel, jmaher
* metrics: christina
* releng:  armenzg ??
* webdev:  rhelmer


== UI Prototype ==


The prototype user interface can be reached on Mozilla-MPT by adding the following line to your /etc/hosts file:


10.8.73.31      datazilla


and then directing your browser to:


/datazilla/views


The source code for datazilla can be found at https://github.com/jeads/datazilla


=== Mockups ===
A set of user interface mockups can be found at [[Media:TalosSignalFromNoiseMocks.pdf]]. This document presents a collection of ideas for extending the graphs-new interface to manage different types of data with multiple visualization strategies.


=== Use Cases ===


Currently:
* Firefox developer:
** push patch to mozilla-central, expect green talos results
*** all results are green unless test fails to complete, then it is red
** notification to dev.tree-management indicates a regression
*** developer goes to graphs-new and looks at the (test, platform, branch) graph
*** maybe compares to other platforms or branches


* Talos developer
** adds new feature to talos with expected change in numbers
** run change side by side as a new test name for 1 week
** browse to graphs-new to view new_test vs old_test to look at raw data points over a few days on each platform


Proposed 1 (assuming 1% deviation):
* Firefox developer
** push patch to mozilla-central, expect green talos results
*** if the number is outside of 2% from the gold standard, the run turns orange
*** orange run has link on tbpl to graph server
*** graph server has a quick line of historical data and other platforms
*** then a focused section of what the gold standard is and what that run produced
*** it would be nice to see what the previous 5 runs had in terms of numbers, as well as all other platforms
** no need for notification mails to dev.tree-management since this is managed in tbpl
** FLAW: if Firefox adjusts the standard number (up or down), how do we declare it the new standard?
*** maybe the web interface can have a way to change the number on the fly and put a bug/comment for the adjustment


* Talos developer
** adds new feature to talos with expected change in numbers
** while pushing add an entry to the graph server of the new expected number
** no need for side by side since we are just comparing to a known standard number
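The proposed gold-standard check above could be sketched as follows. This is a minimal illustration with a hypothetical helper name; the real threshold and the gold-standard bookkeeping would live on the graph server / tbpl side, and the 2% value comes from the use case above:

```python
def classify_run(value, gold_standard, threshold_pct=2.0):
    """Return 'green' if the run is within threshold_pct of the gold
    standard for this (test, platform, branch), else 'orange'.

    Hypothetical helper illustrating the proposed flow only.
    """
    deviation = abs(value - gold_standard) / gold_standard * 100
    return "green" if deviation <= threshold_pct else "orange"

# A run 1% off the standard stays green; a run 3% off turns orange
# and would link from tbpl back to the graph server for drill-down.
print(classify_run(404.0, 400.0))  # green (1.0% off)
print(classify_run(412.0, 400.0))  # orange (3.0% off)
```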


=== Data ===


Current:
* data from tests (tp5 sample)
** noisy output on test console (coming from pageloader)
 NOISE: |i|pagename|median|mean|min|max|runs|
 NOISE: |0;thesartorialist.blogspot.com;852;864.3333333333334;809;1135;951;852;920;865;809;858;851;1135;836;837
 NOISE: |1;cakewrecks.blogspot.com;264;266.55555555555554;252;651;651;263;252;273;264;275;260;292;268;252


** data sent to the graph server
 0,852.00,thesartorialist.blogspot.com
 1,264.00,cakewrecks.blogspot.com


Right now we are sending the median value (computed without the highest value in the set) to the graph server for each page. On the graph server, we [http://hg.mozilla.org/graphs/file/1237d38a299b/server/pyfomatic/collect.py#l208 calculate our metric] for tp5 by averaging all the uploaded median values except for the max value.
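As a sketch of that calculation, assuming our reading of the NOISE line format above (index;pagename;median;mean;min;max followed by the individual runs) — collect.py remains the authoritative version:

```python
import statistics

# One raw NOISE line from the pageloader output shown above:
line = ("0;thesartorialist.blogspot.com;852;864.3333333333334;809;1135;"
        "951;852;920;865;809;858;851;1135;836;837")

fields = line.split(";")
page = fields[1]
runs = [float(v) for v in fields[6:]]  # individual replicates

# Per-page value sent to the graph server: the median of the runs
# with the single highest value discarded.
runs_no_max = sorted(runs)[:-1]
page_value = statistics.median(runs_no_max)
print(page, page_value)  # matches the 852.00 uploaded above


def suite_metric(page_values):
    """Suite-level tp5 metric: average the uploaded per-page medians,
    again excluding the max value (per collect.py, as described)."""
    vals = sorted(page_values)[:-1]
    return sum(vals) / len(vals)
```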


*TODO* define the perf counters that we collect and upload.


* how it is stored
* volume

Proposed:

* data from tests
* how it is stored
* volume

[[Auto-tools/Projects/Signal From Noise/JSON Ingestion]]
 
== Data ==

* http://people.mozilla.org/~jmaher/sfn/column/
* http://people.mozilla.org/~jmaher/sfn/row/

== Links ==

* https://wiki.mozilla.org/Metrics/Talos_Investigation
* http://shawnwilsher.com/archives/tag/regression
* https://groups.google.com/forum/#!topic/mozilla.dev.planning/nxR6tcDmZWQ
* The script to pull the talos data out of Elasticsearch: https://github.com/salamand/ESTalosPull
* The log harvester that helped pull logs directly from Pulse: https://github.com/salamand/PulseLogHarvester
* https://wiki.mozilla.org/Perfomatic#Architecture
* {{bug|706912}}
* Where regression emails go: http://groups.google.com/group/mozilla.dev.tree-management/topics?lnk=srg&pli=1
* http://people.mozilla.org/~jmaher/sxs/sxs.html
* http://k0s.org/mozilla/blog/20120131164249
* https://etherpad.mozilla.org/graphserver-next

=== Thesis ===

* https://wiki.mozilla.org/images/c/c0/Larres-thesis.pdf, a mirror of http://majutsushi.net/stuff/thesis.pdf