Goal 2013 Q2
Replace graph server functionality with Datazilla for all desktop Talos tests
Breakdown and End Game
There are many moving parts here. The end game is still to turn Talos jobs orange on TBPL when a regression is detected on a push. Because of TBPL's architecture, we may be gated on Treeherder (TBPL v2) to actually do that. Even without TBPL v2 deployed, though, we can replace the graph server workflow (and enhance and improve it) with Datazilla. That is what we aim to do this quarter for our desktop Talos tests.
Approach and Milestones
- TODO: Add Bugzilla bug queries here by attaching the proper whiteboard flags to the relevant bugs that are filed.
Milestone 1 - 2 weeks (ending April 12)
[Bug List: ]
- [DONE] Create email list for results from Datazilla
- [ON HOLD] (waiting on ekyle availability) Wire emailing to the list above into the Talos emailer (analyze_talos.py) now
- [DONE] Create schema and data service for generating the relevant time course data for comparisons of:
- Same page perf to same page perf over pushes and/or time (identifying noise in page)
- Same page perf to same page perf on different platforms over pushes and/or time (platform differences)
- Page set perf to page set perf on same platform over pushes and/or time (what graph server now shows)
- Page set perf to page set perf on different platforms over pushes and/or time (what graph server now shows with some poking)
- Data set ingestion rates per test per platform (ensuring we are getting the data)
- Page and pageset perf per machine per platform (identifying specific machine anomalies)
- [MISSED] Begin analyzing data to optimize statistics for regression detection on each desktop Talos test (we are in good shape for tp5o; Ts, Tsvg* and others still need investigation)
- Understand if we need per-page optimizations as well for pages within pagesets (we likely do).
- [DONE] Analyze why data is intermittently missing from certain platform/tree combinations.
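To make the regression-detection item above concrete, here is a minimal sketch of one possible per-push comparison. It assumes Welch's t-test on replicate timings from two pushes, that higher values mean slower (worse) performance, and a cutoff of 3.0; the function names, the threshold, and the choice of test are illustrative assumptions, not necessarily the statistic Datazilla uses.

```python
import math
from statistics import mean, variance


def welch_t(before, after):
    """Return (t, degrees_of_freedom) for two samples of replicate timings.

    Each sample needs at least two replicates so a variance can be computed.
    """
    n1, n2 = len(before), len(after)
    v1 = variance(before) / n1  # squared standard error of the "before" mean
    v2 = variance(after) / n2   # squared standard error of the "after" mean
    diff = mean(after) - mean(before)
    if v1 + v2 == 0:
        # No noise at all: any difference is infinitely significant.
        t = 0.0 if diff == 0 else math.copysign(math.inf, diff)
        return t, float(n1 + n2 - 2)
    t = diff / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def is_regression(before, after, t_threshold=3.0):
    """Flag a push as a regression when timings rose significantly.

    t_threshold is a hypothetical cutoff; per-test (and likely per-page)
    tuning of this value is exactly what the analysis above is meant to find.
    """
    t, _ = welch_t(before, after)
    return t > t_threshold
```

For example, `is_regression([100, 102, 98, 101, 99], [110, 112, 108, 111, 109])` compares five replicates from the previous push with five from the new one. Noisier tests would need a different threshold or more replicates, which is why each test needs its own analysis.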
Milestone 2 - 2 weeks (May 2 - May 16)
[Bug List: ]
- [MISSED] Develop and deploy UI for time course data
- Deploy changes to statistical calculations based on the analysis from test data
- [DONE] Fix the issues that cause missing data from certain platform/tree combinations
- Fix all desktop tests so that they are reporting to Datazilla
Milestone 3 - 2 weeks May 20
[Bug List: ]
- [ON TRACK] (depends on UI for time course) Backfill schema with data for all projects and deploy to production
- [ON TRACK] Deploy time course to production
- [DROPPED] Work out how to get Try reporting to Datazilla by using mozilla-central as a base for the pushlog (this covers the 80% use case)
- [DROPPED] Deploy Try integration as a data source.
- [AT RISK] Begin analyzing data to optimize statistics for regression detection on each desktop Talos test (we are in good shape for tp5o; Ts, Tsvg* and others still need investigation)
Milestone 4 - 2 weeks (planned)
[Bug List: ]
- Analyze the data and automatic regression emails coming in and verify that:
- They track the regression emails from graph server/analyze_talos.py (Datazilla should be more sensitive than that method)
- They are not alerting on false positives
- Use the tools for sheriffing Talos regressions and tweak them as needed (find/fix)
- Continue analyzing the incoming regressions and the tools for sheriffing Talos regressions, tweaking as needed (find/fix)
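One way to run the verification in this milestone is to compare the two alert streams directly. The sketch below assumes each alert can be reduced to a (test, platform, revision) tuple; that key, and the function and field names, are hypothetical and chosen only for illustration.

```python
def compare_alerts(datazilla_alerts, graph_server_alerts):
    """Split alerts into those both systems raised and those only one raised.

    Each alert is assumed to be a hashable (test, platform, revision) tuple.
    """
    dz = set(datazilla_alerts)
    gs = set(graph_server_alerts)
    return {
        # Datazilla is tracking graph server on these.
        "both": sorted(dz & gs),
        # Should be empty if Datazilla is at least as sensitive.
        "missed_by_datazilla": sorted(gs - dz),
        # Either extra sensitivity or false positives; needs manual triage.
        "datazilla_only": sorted(dz - gs),
    }
```

Anything under `missed_by_datazilla` points to a sensitivity gap, while `datazilla_only` entries are the candidates to review for false positives.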