Signal From Noise

Making sense of the Talos results

Overview

Historically we have had an 'acceptable' range of fluctuation in our Talos numbers. Our methods of managing and tracking the numbers have all centered on running a test multiple times and generating a single number that we can track over time. This is great for long-term tracking, but when looking at what that number represents and why it fluctuates, there is a lot of room for error.

We want to do a better job of generating our single tracking number. We also want to revisit the way we are testing things and make sure we are running the right tests and the correct number of iterations to get a reliable data point. Most likely this involves looking at every page that we have and tracking that page individually, not as a small piece of a larger set of pages.

Background

Most of this project is outlined well on the [Talos Investigation] page.

Action Items

The goals by March are:

  • Have the tools (pageloader, talos, graphserver) retooled so we can research new tests and run tests in a more reliable fashion
  • Implement and roll out tdhtml using the new toolchain
  • Have a process in place for adding new tests and pagesets into the tool set

Milestone 1

  • discard the first iteration of a page load
  • add options to pageloader for alternative page loading and measurements
  • add options to talos configuration to support new pageloader requirements
  • create a v1 of the dhtml test using new methodology
  • work with rhelmer and jeads to start discussion of what data we want
    • samples of work that :slewchuk did, mixed with initial data from dhtml results
  • Initial version of database requirements to host new data
  • Blog frequently about progress and goals
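
The first Milestone 1 item, discarding the first iteration of a page load, can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the page names, the sample timings, and the choice of the median as the per-page summary are all hypothetical and are not taken from pageloader or talos.

```python
from statistics import median


def summarize_page(load_times_ms, discard=1):
    """Drop the first `discard` iterations and return the median of the rest.

    Hypothetical helper: the median is an assumed summary, not
    necessarily what talos reports.
    """
    kept = load_times_ms[discard:]
    if not kept:
        raise ValueError("need more iterations than the number discarded")
    return median(kept)


def summarize_pages(raw):
    """Summarize every page on its own, keyed by page name."""
    return {page: summarize_page(times) for page, times in raw.items()}


# Made-up timings: the first load of each page is much slower than the rest.
raw_results = {
    "page_a.html": [310.0, 120.0, 118.0, 125.0, 121.0],
    "page_b.html": [95.0, 60.0, 64.0, 61.0],
}
per_page = summarize_pages(raw_results)
```

Keeping one value per page, instead of folding everything into a single number right away, matches the idea of tracking each page individually.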

Milestone 2

  • Validate tdhtml data with metrics
  • Generate single 'metric' to track tdhtml as we currently do
  • Ensure core database and input methods for data are deployed
  • Start rolling out on branches with side by side staging
  • Beta version of UI live for initial data from the branches
  • Start investigating tsvg and a11y for optimal sampling sizes and accuracy
  • Continue to blog and post to newsgroups
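
As a sketch of what generating a single 'metric' from per-page values could look like, the snippet below collapses per-page summaries into one number. The geometric mean is an assumption chosen for illustration (it keeps one slow page from dominating), and `single_metric` is a hypothetical name; this page does not say which formula will be used.

```python
import math


def single_metric(per_page):
    """Collapse per-page summaries into one number via the geometric mean.

    Hypothetical: the real tracking number may be computed differently.
    """
    values = list(per_page.values())
    if not values:
        raise ValueError("no pages to summarize")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Made-up per-page values in milliseconds.
example_metric = single_metric({"page_a.html": 100.0, "page_b.html": 400.0})
```

Whichever formula is chosen, it would need to be validated against the existing numbers during the side by side staging runs.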

Milestone 3

  • Continue rolling out tdhtml to other branches
  • Enhance tools like compare-talos and regression-finder to work with new tdhtml
  • Write analysis toolchain for investigating new tests and pages (i.e. the work we do on tsvg and a11y should be automated)
  • Integrate analysis toolchain into existing tools as much as possible
  • Version 1.0 of the new UI should be available. It should offer multiple views on the same data as well as drill-down from a given data point or time window

Milestone 3.14 (bonus work if all goes well)

  • Define requirements for a Version 2.0 of the new UI
  • start rolling out tsvg and a11y
  • start investigating tp5 (or maybe it is time for tp6 and we start there)
  • enhance the compare-talos toolchain to show differences from a try server run to the baseline (easier talos development as well as Firefox development)
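
The try-server-versus-baseline comparison in the last item could, in its simplest form, report a per-page percentage difference. The sketch below is a hypothetical illustration and does not reflect how compare-talos actually works; `compare_runs` and the sample values are invented.

```python
def compare_runs(baseline, candidate):
    """Return the percent change per page for pages present in both runs.

    Positive values mean the candidate (e.g. a try run) is slower.
    Hypothetical helper, not part of compare-talos.
    """
    report = {}
    for page in sorted(baseline.keys() & candidate.keys()):
        base = baseline[page]
        if base == 0:
            continue  # cannot express a relative change against zero
        report[page] = (candidate[page] - base) / base * 100.0
    return report


# Made-up per-page values in milliseconds.
baseline_run = {"page_a.html": 100.0, "page_b.html": 200.0}
try_run = {"page_a.html": 110.0, "page_b.html": 190.0, "page_c.html": 5.0}
diff = compare_runs(baseline_run, try_run)
```

Pages that appear in only one run are skipped here; a real tool would probably want to flag them instead.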

Related Work

We need to be considerate of other projects and try to coordinate as much as possible.

  • mozbase
    • we will be fixing up talos to use mozprocess, mozprofile, and mozrunner. This doesn't intersect with SfN work, but if we are doing a large staging run it would be beneficial to bundle the two together. (requires staging)
  • mozharness
    • again, no impact on this project. (requires staging, SxS)
  • python 2.4->2.6+
    • no real impact on this project. (requires staging, SxS)
  • jetpack talos
    • most likely some changes to talos, primarily focused on ts; maybe some graphserver work required
  • AMO maintenance
    • no impact on this project
  • OSX RSS from pageloader
    • small talos and config tweaks for tp5. (requires staging, SxS)

Possible Reshuffling

Most of the other work requires staging and side by side (SxS) running to ensure we don't fudge the numbers.

  • Can our toolchain make the side by side easier and less painful?

We won't be modifying talos proper much, which means that the work in these other projects shouldn't affect SfN.

  • Will we be comfortable doubling our work in staging and SxS?