Auto-tools/Projects/Stockwell/Ideas
- Actions that might reduce OrangeFactor (OF)
- on-going triage of frequent intermittent failures
- check if test recently modified
- find regressing changeset
- find an owner
- identify and call out platform correlations, interesting logs, or other info that might encourage resolution of the bug
- make it easier to find regressions
- longer ActiveData history
- failures-per-run stats
- OF tracks failures and pushes but not successful test runs
- ActiveData provides failures-per-test-run, but lacks long-term history
- find long-term neglected intermittent failures
- identify on-going bugs that don't exceed the 30 failures/week threshold but contribute to high OF over time
- publicize best practices for tests
- race reduction strategy
- develop and publicize more best practices
- lint rules?
- logging/diagnostics to make it easier to recognize and debug races in tests
- enable eslint on more test files - bug 1357557
- test verification - bug 1357513
- test verification of changed files on checkin
- test verification for mach
- test verification for backfill
- easier backouts
- backout button on Treeherder
- make intermittent fail count a searchable parameter in Bugzilla
- ideally, failures would be searchable over a range of dates, but simple 1-day and 7-day counts would already be valuable
- an actual failure rate (failures/run) would also be useful
- component intermittent triage dashboard
- provide a tool for component owners to find intermittent failures requiring attention
- ownership clarity for meta-bugs
- who is responsible for intermittent failures on browser startup and shutdown, leaks, hangs, and other failures not closely associated with an identifiable subset of tests?
- strategy for meta-bugs
- ...since they usually cannot be resolved by skipping any single test
- avoid failures from long-running tests/jobs
- monitor test/job durations to identify those that are approaching timeout thresholds
- autoclassification
- to reduce impact of intermittents
- run fewer tests
- could we reduce redundancy, consolidate tests?
- do we need to run everything on all platforms?
- quarantine new/modified tests
- put intermittently failing tests in "purgatory"
- evolve triage/skipping policy
- https://groups.google.com/forum/#!topic/mozilla.dev.platform/EeN78p_2PGw
- need to get frequently-failing tests disabled or fixed ASAP
- track disabled test counts
- ideally by component/owner
- ...to check that we haven't disabled the world for intermittent failures
- improve one-click loaner
- ...easier to reproduce/fix failures
- get more people fixing bugs
- get more resources?
- policy/nudges?
- reduce infrastructure failures
- ...somehow, since they often have big impact
- reproduce failures in rr
- provide automatic feedback for try pushes:
- did my try push cause unexpected failures? ...an autoclassify issue?
- Completed
- Defined intermittent failure and the classification of high-priority bugs
- Updated "OrangeFactor robot" bug comments to emphasize high priority bugs
- Updated Bugzilla component info in .mozbuild for all/most test files
- Created Neglected Oranges tool
- Created 'mach test-info'
- Easy retriggering with more diagnostics -- bug 1322433
- Quick-and-dirty study of classes of causes of intermittent failures
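
Three of the ideas above (a failures-per-run rate, finding long-term neglected intermittents, and watching for tests that approach their timeout) are simple enough to sketch in a few lines of Python. This is only an illustration under stated assumptions: the `BugHistory` record, the `min_weeks`, `min_total` and `margin` parameters, and the input shapes are all hypothetical and do not correspond to any OrangeFactor, ActiveData, or Treeherder API. Only the 30/week threshold comes from the list.

```python
from dataclasses import dataclass
from typing import Dict, List

# From the list above: bugs over 30 failures/week already get triage attention.
WEEKLY_THRESHOLD = 30


@dataclass
class BugHistory:
    """Hypothetical record: one intermittent-failure bug and its weekly counts."""
    bug_id: int
    weekly_failures: List[int]  # failures per week, oldest first


def failure_rate(failures: int, runs: int) -> float:
    """Failures per test run; 0.0 when the test never ran."""
    return failures / runs if runs else 0.0


def find_neglected(bugs: List[BugHistory],
                   min_weeks: int = 8,
                   min_total: int = 100) -> List[int]:
    """Return ids of bugs that never exceed the weekly threshold
    but still accumulate many failures over a long period.

    min_weeks and min_total are illustrative cut-offs, not policy.
    """
    neglected = []
    for bug in bugs:
        weeks = bug.weekly_failures
        if len(weeks) < min_weeks:
            continue  # not enough history to call it long-term
        if max(weeks) > WEEKLY_THRESHOLD:
            continue  # already visible to regular triage
        if sum(weeks) >= min_total:
            neglected.append(bug.bug_id)
    return neglected


def near_timeout(durations: Dict[str, List[float]],
                 timeout: float,
                 margin: float = 0.8) -> List[str]:
    """Return names of tests/jobs whose longest observed duration
    reaches at least `margin` (a fraction) of the timeout."""
    return sorted(name for name, secs in durations.items()
                  if secs and max(secs) >= margin * timeout)
```

For example, a bug with a steady 12 failures per week for ten weeks never crosses the weekly threshold, yet `find_neglected` would report it because its total (120) passes the illustrative `min_total` cut-off.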