This page summarizes the work that the A-Team has put into addressing developer workflow/efficiency concerns in 2015Q2, as presented at the Whistler all-hands, with some additional projects that we won't have time to review. We also describe some cool new work coming up in the latter half of 2015.
During the second half of 2015, we will focus on reducing the noise and improving the signal in the way we handle intermittents. This will result in notifications being meaningful and actionable, and the overall drag on developer efficiency will be reduced.
The various pieces of this are:
- Turn off bugmail for intermittents, and instead, periodically post a summary of bug stats. The period will probably vary depending on the frequency of failure.
- Treeherder will start tracking the rate of intermittent test failures, and bugs will only be filed for intermittents that reach a certain threshold. Lower-incidence intermittents will never make it into Bugzilla, but will still be viewable in tools such as OrangeFactor.
- Failing jobs on try runs will be automatically retriggered, within certain thresholds. This will let developers skip the time-consuming manual retrigger-and-wait cycles they often use today to determine whether failed jobs are intermittent.
- We will implement auto-starring of oranges in Treeherder. Initially this will likely apply to simple job failures (those that don't cause a cascade of other failures), with edge cases handled later. This will reduce the sheriff workload and make interpreting try runs much easier. It will also enable other cool things, like smarter automatic retriggering on try runs and trunk branches, and provide data that can be used to seed automatic bisection bots.
- Experiment with bots to vet new tests and changes to tests (to verify that these don't introduce intermittents) and to bisect new, frequent intermittents.
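The threshold idea in the list above can be sketched as follows. This is a minimal illustration, not Treeherder's actual implementation: the class name, the field names, and both threshold values are assumptions made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class IntermittentTracker:
    """Counts runs and failures per test and decides when a bug is warranted.

    Everything here is hypothetical: the thresholds and the data model are
    placeholders, not values or structures used by Treeherder.
    """
    min_failures: int = 5    # assumed absolute floor before filing
    min_rate: float = 0.02   # assumed minimum failures-per-run rate
    runs: dict = field(default_factory=lambda: defaultdict(int))
    failures: dict = field(default_factory=lambda: defaultdict(int))

    def record(self, test, failed):
        """Record one run of `test` and whether it failed."""
        self.runs[test] += 1
        if failed:
            self.failures[test] += 1

    def rate(self, test):
        """Return the observed failure rate for `test` (0.0 if never run)."""
        runs = self.runs[test]
        return self.failures[test] / runs if runs else 0.0

    def should_file_bug(self, test):
        """File only when both the count and the rate reach their thresholds."""
        return (self.failures[test] >= self.min_failures
                and self.rate(test) >= self.min_rate)
```

Requiring both a minimum count and a minimum rate is one possible design: it keeps a single failure in a rarely run test from producing a bug, while low-incidence failures remain countable for reporting elsewhere.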
Link to Whistler presentation: http://slides.com/jgriffin/deck#/
Tree closures for a number of the integration branches are currently on the rise, as can be seen on Futurama. Some of the current closures can be attributed to how long tests take to run (normalising test run length would solve this).
For one of the months, the tree closures were attributed to the following causes:
| Reason tree was closed (test failure or bustage) | Count |
| Failing test, no try run done | 7 |
| Failing test, try breadth of run not sufficient | 3 |
| Bustage, no try run done | 5 |
| Bustage, try breadth of run not sufficient | 3 |
Slides available at http://oss.theautomatedtester.co.uk/Presentations/dev-sanity/#/
Running tests from tests.zip
A new mach bootstrap environment was created specifically for running in the context of a tests.zip. This will allow us to write mach commands that will get packaged as part of the package-tests make target. Developers downloading tests.zip (or packaging it from a local build) can now run |mach help| to perform test package related tasks (most likely running tests without building locally).
Currently only the mochitest mach command has been implemented, but other harnesses may follow shortly depending on resources and demand. Because there is no |mozbuild| module to work with, the mach command cannot automatically detect the Firefox binary; it must be passed in via --appname. For the same reason, the mach command cannot resolve tests or automatically detect which flavors and subsuites exist in a particular directory, as the in-tree version of the script does.
Usage: mach mochitest --appname path/to/firefox [path/to/test/dir/or/file]
Test selection on try
Select tests within a test job to run on try. This is achieved by extending try syntax to support paths and tags as arguments. The result is shorter turnaround times for feedback on particular tests, and less machine time spent running tests that are unlikely to fail as a result of a particular change.
This is primarily exposed through a new mach command, |mach try|, that takes paths and/or tags as arguments, calculates an appropriate try syntax based on the tests present in the tree, and pushes the result to try. The implementation was discussed in bug 1149670. Details can be found by running |mach help try|.
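The selection step described above can be pictured roughly as follows. This is only a sketch under stated assumptions: the TESTS list is a made-up stand-in for the test metadata in the tree, select_tests is a hypothetical helper rather than part of |mach try|, and treating paths and tags as a union is an assumption about the matching rule.

```python
from collections import defaultdict

# Hypothetical stand-in for the test metadata found in the tree.
TESTS = [
    {"path": "dom/base/test/test_a.html", "flavor": "mochitest", "tags": {"dom"}},
    {"path": "dom/base/test/chrome/test_b.xul", "flavor": "chrome", "tags": set()},
    {"path": "layout/style/test/test_c.html", "flavor": "mochitest", "tags": {"css"}},
    {"path": "layout/reftests/bugs/1.html", "flavor": "reftest", "tags": {"css"}},
]


def select_tests(tests, paths=(), tags=()):
    """Group the tests matching any given path or tag by flavor.

    A test matches a path if it is that file or lives underneath that
    directory. Matching on "path OR tag" is an assumption of this sketch.
    """
    wanted_tags = set(tags)
    selected = defaultdict(list)
    for test in tests:
        under_path = any(
            test["path"] == p or test["path"].startswith(p.rstrip("/") + "/")
            for p in paths
        )
        has_tag = bool(wanted_tags & test["tags"])
        if under_path or has_tag:
            selected[test["flavor"]].append(test["path"])
    return {flavor: sorted(found) for flavor, found in selected.items()}
```

For example, select_tests(TESTS, paths=["dom/base"]) yields one mochitest and one chrome test, so only those two flavors would need to be requested on try. Turning that result into an actual try syntax string is left out here.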
Automatic retrigger of oranges on try
Retrigger failed jobs on try automatically. This is a pulse service that listens for failed test jobs and triggers each one two additional times. The idea is to reduce time spent determining whether a particular failure on try is an intermittent issue or a persistent failure introduced by a specific change. A common response to orange jobs on try is to retrigger them to determine if the failure persists, and although it results in more test jobs on try, the service is intended to save developer time by replacing this manual process with an automated one.
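The core decision logic of such a service might look something like the sketch below. It leaves out the Pulse listener and the actual retrigger request entirely; the job dictionary layout, the "testfailed" result string, and both function names are assumptions for illustration only.

```python
RETRIGGER_COUNT = 2  # per the text: each failed job is triggered two more times


def retriggers_needed(job, seen):
    """Return how many extra runs to request for a finished try job.

    `job` is a hypothetical dict with "revision", "name" and "result" keys.
    `seen` remembers (revision, name) pairs already handled, so that the
    retriggered runs do not themselves cause further retriggers.
    """
    key = (job["revision"], job["name"])
    if job["result"] != "testfailed" or key in seen:
        return 0
    seen.add(key)
    return RETRIGGER_COUNT


def classify(results):
    """Interpret the outcomes of the original run plus its retriggers."""
    failed = [r for r in results if r == "testfailed"]
    if not failed:
        return "passing"
    if len(failed) == len(results):
        return "likely persistent"
    return "likely intermittent"
```

With three runs in hand, a developer (or a tool) can read the outcome directly: classify(["testfailed", "success", "success"]) returns "likely intermittent", while three failures in a row point at the change being tested.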
A related feature implemented by this service is the ability to automatically run every test job on a push a certain number of times. This is a long-requested feature intended for those investigating or proving a fix for an intermittent failure, as well as those establishing a baseline for comparing Talos results across revisions.
The rollout of the service is ongoing and tracked in bug 1163698 (details on the implementation can be found there as well). As of this writing it's enabled for try pushes initiated by the autoland user.
Allow scheduling missing platforms on a try push
When developers push to try, they have to select a list of platforms in their commit message. If they don't specify the right set of platforms, they need to push to try again, wasting machine resources and time. They can now use Mozilla CI tools (aka mozci) to trigger the missing platforms instead.
In the future, an improved experience will be achieved through "Case scenario 8: Developer needs to add missing platforms/jobs for a Try push".
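To make the "missing platforms" idea concrete, here is a small sketch that works out which platforms a try push did not request. It does not use mozci and does not trigger anything; the helper names are hypothetical, and it assumes the commit message carries the platforms as a comma-separated value after -p in the try syntax.

```python
import argparse
import shlex


def platforms_in_try_message(message):
    """Extract the platforms named after -p in a try commit message."""
    _, _, syntax = message.partition("try:")
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-p", dest="platform", default="")
    # Other try options (-b, -u, -t, ...) are ignored by this sketch.
    args, _unknown = parser.parse_known_args(shlex.split(syntax))
    return {p for p in args.platform.split(",") if p}


def missing_platforms(message, wanted):
    """Return the wanted platforms that the push did not ask for."""
    return set(wanted) - platforms_in_try_message(message)
```

The resulting set is what a developer would then ask a tool such as mozci to schedule, rather than pushing to try a second time.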
Unified interface for running tests
Every test harness has its own set of arguments and quirks. It's also not entirely obvious which suite has tests for feature X, and it takes some guesswork to figure out how to test a particular feature thoroughly. We are slowly moving away from a "suite centric" approach to a "feature centric" one. The end goal is for a developer to say which part of the code they want to test, and for the automation to figure out what it needs to do to accomplish that. Developers shouldn't have to know or care that it's necessary to run mochitest chrome with --subsuite push to test a given component.
This is already partly implemented by the |mach mochitest| command, which figures out all the flavors and subsuites that exist in a given directory, and by the |mach test| command, which finds which test harnesses have tests in the given directory. There is still a lot of work to do here, though. |mach test| doesn't support every suite, and it doesn't work with b2g. The output from running multiple suites one after another doesn't get summarized nicely. |mach test| also doesn't support the full set of command line options available when running the various suites directly, mostly because the harnesses don't share the same names for their CLI arguments. The end goal is for most developers to use |mach test path/to/dir| to run their tests, no matter the suite, flavor or platform.
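One way to picture the "feature centric" goal, including a combined summary across suites, is the sketch below. It is not how |mach test| is implemented: SuiteResult, run_for_path, summarize, and the idea of a runner returning None when it has no tests under a path are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SuiteResult:
    suite: str
    passed: int
    failed: int


def run_for_path(path, runners):
    """Run every suite that has tests under `path`.

    `runners` maps a suite name to a callable taking the path and returning
    a SuiteResult, or None when that suite has no tests there (hypothetical
    convention for this sketch).
    """
    results = []
    for runner in runners.values():
        result = runner(path)
        if result is not None:
            results.append(result)
    return results


def summarize(results):
    """Produce one combined report instead of one report per suite."""
    lines = [f"{r.suite}: {r.passed} passed, {r.failed} failed" for r in results]
    total_passed = sum(r.passed for r in results)
    total_failed = sum(r.failed for r in results)
    lines.append(f"total: {total_passed} passed, {total_failed} failed")
    return "\n".join(lines)
```

The developer supplies only a path; which suites run, and how their output is merged, is decided by the automation.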