Signal From Noise

Making sense of the Talos results

Overview

Historically we have accepted a certain range of fluctuation in our Talos numbers. Our methods of managing and tracking the numbers have all centered on running a test multiple times and generating a single number that we can track over time. This is great for long-term tracking, but when looking at what that number represents and why it fluctuates, there is a lot of room for error.

We want to do a better job of generating our single tracking number. We also want to revisit the way we are testing things and make sure we are running the right tests and the correct number of iterations to get a reliable data point. Most likely this involves looking at every page that we have and tracking that page individually, not as a small piece of a larger set of pages.

Background

Most of this project is outlined well on the [Talos Investigation] page.

Action Items

The goals by March are:

Have the tools (pageloader, talos, graphserver) retooled so we can research new tests and run tests in a more reliable fashion
Implement and roll out tdhtml using the new toolchain
Have a process in place for adding new tests and pagesets into the tool set

Milestone 1

discard the first iteration of a page load
add options to pageloader for alternative page loading and measurements
add options to talos configuration to support new pageloader requirements
create a v1 of the dhtml test using new methodology
work with rhelmer and jeads to start discussion of what data we want
- samples of work that :slewchuk did, mixed with initial data from dhtml results
Initial version of database requirements to host new data
Blog frequently about progress and goals
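
The first item in this milestone (discarding the first iteration of a page load) can be sketched roughly as follows. This is only an illustration and not pageloader code: the function name, the page names, and the load times are hypothetical, and using the median of the remaining samples is an assumption rather than a decision recorded on this page.

```python
from statistics import median


def summarize_page(samples, discard=1):
    """Drop the first `discard` load times and return the median of the rest.

    Hypothetical helper: the choice of median is an assumption, not the
    agreed-upon Talos metric.
    """
    kept = samples[discard:]
    if not kept:
        raise ValueError("not enough samples after discarding warm-up iterations")
    return median(kept)


# Made-up load times in milliseconds; the first load of each page is slower
# than the rest, which is the kind of outlier the discard is meant to remove.
loads = {
    "page_a.html": [410.0, 122.0, 118.0, 125.0, 120.0],
    "page_b.html": [95.0, 61.0, 64.0, 60.0, 63.0],
}

# Track each page individually instead of folding it into one set-wide number.
per_page = {page: summarize_page(times) for page, times in loads.items()}
```

Keeping the per-page values separate is what makes it possible to see which page moved when the single tracking number changes.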

Milestone 2

Validate tdhtml data with metrics
Generate single 'metric' to track tdhtml as we currently do
Ensure core database and input methods for data are deployed
Start rolling out on branches with side by side staging
Beta version of UI live for initial data from the branches
Start investigating tsvg and a11y for optimal sampling sizes and accuracy
Continue to blog and post to newsgroups
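
One possible way to approach the sampling-size question for tsvg and a11y is to check how many iterations it takes before the relative standard error of the mean drops below a chosen threshold. The sketch below is an assumption about how that investigation could be done, not a description of the analysis actually used; the function names and the 1% target are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def relative_standard_error(samples):
    """Standard error of the mean, expressed as a fraction of the mean."""
    return stdev(samples) / sqrt(len(samples)) / mean(samples)


def iterations_needed(samples, target=0.01, minimum=3):
    """Return the smallest prefix length whose relative standard error is
    at or below `target`, or None if the recorded samples never get there.

    `target` and `minimum` are illustrative defaults, not project settings.
    """
    for n in range(minimum, len(samples) + 1):
        if relative_standard_error(samples[:n]) <= target:
            return n
    return None
```

A result of None for a page suggests that either more iterations are needed than were recorded or the page is too noisy to be a useful data point at that threshold.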

Milestone 3

Continue rolling out tdhtml to other branches
Enhance tools like compare-talos and regression-finder to work with new tdhtml
Write analysis toolchain for investigating new tests and pages (i.e. the work we do on tsvg and a11y should be automated)
Integrate analysis toolchain into existing tools as much as possible
Version 1.0 of the new UI should be available, with multiple views on the same data as well as drill-down from a given data point or time window

Milestone 3.14 (bonus work if all goes well)

Define requirements for a Version 2.0 of the new UI
start rolling out tsvg and a11y
start investigating tp5 (or maybe it is time for tp6 and we start there)
enhance the compare-talos toolchain to show differences from a try server run to the baseline (easier talos development as well as firefox development)
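
The try-server comparison in the last item could look something like the sketch below. It is not the compare-talos implementation: the function name and the data layout (a mapping from page name to a list of load times) are hypothetical, and the per-page median is assumed as the summary value.

```python
from statistics import median


def compare_runs(baseline, candidate):
    """Return the percent change per page from baseline to candidate.

    Both arguments are assumed to map a page name to a list of load times.
    Only pages present in both runs are compared. Positive values mean the
    candidate run was slower.
    """
    report = {}
    for page in sorted(baseline.keys() & candidate.keys()):
        base = median(baseline[page])
        new = median(candidate[page])
        report[page] = 100.0 * (new - base) / base
    return report
```

Reporting the change per page rather than as one aggregate keeps a regression on a single page from being averaged away by the rest of the set.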

Related Work

We need to be considerate of other projects and try to coordinate as much as possible.

mozbase
- we will be fixing up talos to use mozprocess, mozprofile, and mozrunner. This doesn't intersect with SfN work, but if we are doing a large staging run, it would be beneficial to bundle the two together. staging
mozharness
- again, no impact on this project. staging,SxS
python 2.4->2.6+
- no real impact on this project. staging,SxS
jetpack talos
- most likely some changes to talos, primarily focused on ts, maybe some graphserver work required
AMO maintenance
- no impact on this project
OSX RSS from pageloader
- small talos and config tweaks for tp5. staging,SxS

Possible Reshuffling

Most of the other work requires staging and side by side (SxS) running to ensure we don't fudge the numbers.

Can our toolchain make the side by side easier and less painful?

We won't be modifying talos proper much, which means that the work in these other projects shouldn't affect SfN.

Will we be comfortable doubling our work in staging and SxS?
