Auto-tools/Projects/Datazilla/Meetings/2012-05-08

These are raw notes off the etherpad

Agenda

  • Quick Intros - who you are and what you work on
    • Datazilla team
      • Jonathan - jeads (working on datazilla)
      • Cameron - camd (working on datazilla)
      • Carl - carljm (working on datazilla)
    • Joel - jmaher (works on talos, leads signal from noise project)
    • Ed - edmorley (works on automating sheriff tools, brings sheriffing use cases to the new system)
    • Marco - mak (works on browser front end)
    • Justin - jlebar (works on all kinds of back end performance stuff for platform, b2g, mobile etc)
    • Clint - ctalbert (coordinator and note-taker).
  • Meeting logistics - can we meet every other week at this time?
    • Do this every two weeks, but we will likely change the vidyo room to ctalbert's

Overview of current site - 15mins

  • Found many problems with tracking only the mean as we do currently in talos.
    • Obscures the actual problems with regressions, etc.
    • Our original goal was to capture all the raw data - including the various raw data from actual loads of test pages (talos loads each page x times, we want to capture all those times instead of just the top level mean as we do on graphs.m.o today)
    • jeads does a demo of how the current drill-down works, from the mean/std-deviation overview all the way down to individual page load measurements.
      • The idea was to give the drill-down to identify what page regressed a "tp5 number".
        • It would make more sense to take a mean for each page in the pageset rather than a single mean across all the pages in one pageset. This way you could rapidly identify which page in a page set is demonstrating a regression during a talos run (a minimal sketch of the difference follows this list).
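
As a minimal sketch of that per-page idea (Python, with made-up numbers rather than real talos data), a per-page mean surfaces a single-page regression that the aggregate mean dilutes:

  from statistics import mean

  # Hypothetical replicate load times (ms) per page in one talos run.
  run = {
      "page_a": [210, 212, 208, 211],
      "page_b": [95, 97, 96, 94],
      "page_c": [300, 610, 605, 612],  # page_c regressed
  }

  # What graphs.m.o tracks today: one mean over every replicate.
  print("aggregate mean:", round(mean(t for ts in run.values() for t in ts), 1))

  # What the drill-down proposes: a mean per page, so the offending
  # page stands out instead of being averaged away.
  for page, times in sorted(run.items()):
      print(page, "mean:", round(mean(times), 1))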

A few TODO Items (planned but not implemented yet)

  • Ability to select different time ranges
  • Zoom levels on axes
  • Showcase the same functionality that the current graph server has

Collections

  • Can put together a set of data panels to address a particular use case - it's essentially a way to tie different queries against the database together in different ways. This way we can build a "set" that illustrates a way to carve into the data.
  • Should be a way we can easily extend the existing UI to address new use cases (a hypothetical collection definition follows this list)
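
Datazilla's actual collection format isn't spelled out in these notes; the following Python sketch is only a hypothetical illustration of the idea - a named set of panels, each bound to a different query and set of parameters. All keys and query names here are invented, not Datazilla's real schema:

  # Hypothetical "collection": a named set of data panels, each tied
  # to a different query against the same datastore.
  inbound_collection = {
      "name": "mozilla-inbound overview",
      "panels": [
          {"title": "tp5 per-page means", "query": "per_page_means",
           "params": {"tree": "mozilla-inbound", "test": "tp5"}},
          {"title": "tsvg run summary", "query": "test_run_summary",
           "params": {"tree": "mozilla-inbound", "test": "tsvg"}},
      ],
  }

  def render(collection):
      """Run each panel's query and hand the results to the UI (stubbed)."""
      for panel in collection["panels"]:
          print("would run", panel["query"], "with", panel["params"])

  render(inbound_collection)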

Use Cases Discussions

  • Would like to combine trees - or create collection to pull certain trees together (like inbound)
  • Try server runs should be a separate thing - they would have to be a heat map or something, because you won't have a graph: there is no history for try server data. You also have to figure out how to average the data points.
    • We don't have the tools to filter the signal from the noise if we blow up the page set to 100 different pages - ctalbert notes that we actually do have the Metrics team working with us on figuring out how to identify these regressions on a per-page basis - that's one of our goals for Q2
  • Is the X-axis always time? Or can we change it to changesets and space the changesets out equally? After you push a few changesets you want to see variation over changesets instead of over time (the use case is getting a disposable branch and pushing changesets on different days; the timeframe has no value here, only the changesets do, but the measures may end up really packed or spread out, creating bogus-looking graphs that are hard to discern).
    • That's a bug with the current graph server - because it orders its data by the time the build finishes; if it were ordered by changesets then it would make more sense (newer builds sometimes finish before older builds). A sketch of the ordering difference follows.
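
A quick Python sketch of that ordering difference (the field names are hypothetical, not graphserver's schema); plotting against push order also gives the equally spaced x-axis asked for above:

  # Results arrive in build-finish order; an old build finishing late
  # scrambles the x-axis. Sorting by push order fixes it.
  results = [
      {"changeset": "c3", "push_id": 3, "finished": "12:40", "value": 305},
      {"changeset": "c1", "push_id": 1, "finished": "12:55", "value": 300},
      {"changeset": "c2", "push_id": 2, "finished": "12:10", "value": 302},
  ]

  by_time = sorted(results, key=lambda r: r["finished"])
  by_push = sorted(results, key=lambda r: r["push_id"])

  print([r["changeset"] for r in by_time])  # ['c2', 'c3', 'c1'] - misleading
  print([r["changeset"] for r in by_push])  # ['c1', 'c2', 'c3'] - correct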

3 important use cases:

  • Compare Talos Use Case - it's completely broken right now and needs to be fixed
  • What the current graph server does - it has issues, and its existing bugs need to be fixed (jlebar will post those bugs into this etherpad)
  • Integration with TBPL - if you push to try, the TBPL letters should turn orange if the tests regress.

Other TBPL integration issues:

  • Being able to search by changeset - that would be incredibly useful and is something we're missing from the current system.
  • It would be nice to have links from TBPL that took you to graphserver and highlighted the exact point of your change.
  • The actual result will be "yes/no/maybe", which is what makes this hard - it could be that you need to run more tests in order to properly identify a true regression - might need a repeat run on try to get additional data points (one possible shape for this is sketched after this list).
  • We could also implement TBPL navigation on the graphserver for the tests (use the same letter abbreviations for instance) - lower priority
  • [edmorley] In order to get usable numbers out of compare-talos, one has to manually retrigger a number of talos runs on the try run and its control/reference changeset. There is currently no way to see from the compare-talos UI how many of these retriggers have been performed (or have completed vs. are still pending), and as such how reliable the % regressions being shown may be - so you have to keep switching back to TBPL to see the builds/pending.
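
One possible shape for that yes/no/maybe call is a two-sample Welch's t-test that answers "maybe" when there are too few replicates to decide. The thresholds below are illustrative assumptions, not anything agreed on in this meeting:

  from scipy import stats

  def regression_verdict(base, try_run, min_runs=5, alpha=0.01):
      # Too few data points: ask for retriggers rather than guessing.
      if min(len(base), len(try_run)) < min_runs:
          return "maybe"
      t, p = stats.ttest_ind(base, try_run, equal_var=False)
      slower = sum(try_run) / len(try_run) > sum(base) / len(base)
      return "yes" if (p < alpha and slower) else "no"

  print(regression_verdict([300, 302, 301], [330, 328, 331]))  # "maybe"
  print(regression_verdict([300, 302, 301, 299, 303],
                           [330, 328, 331, 329, 332]))         # "yes"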

There is a build API (e.g. https://build.mozilla.org/buildapi/self-serve) that can be used to perform the retriggers; a rough scripting sketch follows below.

    • Do we want to require retriggers? Is there a suggested amount we would want to require?
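
As a rough sketch of scripting those retriggers against self-serve - the endpoint path, payload, and auth scheme below are assumptions for illustration, so check the API's own docs before relying on them:

  import requests

  BUILDAPI = "https://build.mozilla.org/buildapi/self-serve"

  def retrigger(branch, revision, count, auth):
      # Hypothetical endpoint shape: one POST per extra run requested.
      for _ in range(count):
          resp = requests.post("%s/%s/rev/%s" % (BUILDAPI, branch, revision),
                               auth=auth)
          resp.raise_for_status()

  # e.g. retrigger("try", "abcdef123456", count=5, auth=("user", "pass"))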

Feedback and Use Cases - the rest of the time

  • Compare talos - it's really hard to find the old history of the builds so that you can see how your current push compares to the previous 10 changesets. That should be automatic, not something people have to do manually - i.e. the ability to look at a configurable range of changesets back in time.
    • Specifically - I push to try; that try push has a base version on some public tree. Logically, what we want to know is whether my new changes affect the benchmark. So you want to compare the results from my push to the results from that base revision in mozilla-central. But that base revision only has one round of tests - so you need to go back through the parents of the base revision so that you can see the old data from the past 20 parent revisions of my "try" push's base revision (which is always hard to figure out). A sketch of automating that walk-back follows.
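
A sketch of automating that walk-back with hg.mozilla.org's json-pushes pushlog API; the exact parameter behaviour assumed here should be verified against the pushlog docs:

  import requests

  PUSHLOG = "https://hg.mozilla.org/mozilla-central/json-pushes"

  def previous_pushes(base_rev, count=20):
      # 1. Find the push containing the try push's base revision.
      pushes = requests.get(PUSHLOG, params={"changeset": base_rev}).json()
      push_id = max(int(k) for k in pushes)
      # 2. Fetch the `count` pushes before it; their tip changesets are
      #    the history to compare the try results against.
      older = requests.get(PUSHLOG, params={"startID": push_id - count - 1,
                                            "endID": push_id}).json()
      return [p["changesets"][-1] for p in older.values()]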

Separate tools that communicate, each doing a well-defined job - we want to keep small tools that work well together. Allow for pluggability so that other tools can plug in here (and create separate entry points for other people to extend; see the registry sketch below). The current tools we have - graphserver.m.o and compare-talos - are not well maintained: there are old bugs in them and known, difficult design problems, and those are just two out of the ten tools being proposed. The concern is that we're spreading ourselves quite thin with tools that we may not actually need, when we need to make the tools we have do what we need. We actually attempted to use the existing graph server rather than re-write it, but when we tried to use the current graph system to track the raw values, we couldn't do it in that system - and that started this project. We also found that all our variance in talos tests came from the page loader tests - tp5, tdhtml, tsvg, etc. - and that led us here. The thinking here is that it would be more valuable to get one part of the graph server system right, and do that first.
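
These notes don't pin down a plug-in mechanism; as one minimal illustration of the "separate entry points" idea, here is a small registry that tools add themselves to (everything in it is a hypothetical sketch):

  TOOLS = {}

  def register(name):
      # Tools self-register so the core needs no knowledge of them.
      def decorator(fn):
          TOOLS[name] = fn
          return fn
      return decorator

  @register("per_page_means")
  def per_page_means(data):
      return {page: sum(ts) / len(ts) for page, ts in data.items()}

  def run_tool(name, data):
      return TOOLS[name](data)

  print(run_tool("per_page_means", {"page_a": [210, 212], "page_b": [95, 97]}))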

jlebar's list of current graphserver bugs

  • Points are sorted by time, not topological order (bug 688534)
  • Can't go to (and highlight) specific changeset (bug 659302)
  • Out of hand list of trees/platforms/tests. (bug 696202)
  • No TBPL integration (e.g. bug 736872)
  • Can't easily change start/end of date range (bug 659305)
  • URL contains invalid chars, breaks bugzilla linkification (bug 672139)
  • Color and z-order of lines is non-deterministic, so you can't say "look at the blue line" (bug 667911)
  • Doesn't notice when you modify URL (bug 659291)
  • But I've stopped filing :( <-- the good news is, there's plenty of room to iterate.  :)
  • What are people's thoughts on the main entry point? The current entry point is getting the email on dev.tree-management and going in that way.
  • We could keep the existing email system, but one thing the current overview page is good for is letting you see across all the pages at once.
  • Emails are now being sent to the patch author and pusher as well, but we need better detection of, and better remedies for, false positives.
  • I think we need to think about the entry point.
  • Entry on TBPL to a specific entry point.

Some prior art: http://areweslimyet.com