Auto-tools/Projects/Stockwell/Ideas

Actions which might reduce OF (OrangeFactor)
- ongoing triage of frequent intermittent failures
  - check whether the test was recently modified
  - find the regressing changeset
  - find an owner
  - identify and call out platform correlations, interesting logs, or other info that might encourage resolution of the bug
- make it easier to find regressions
- longer ActiveData history
- fail-per-run stats
  - OF tracks failures and pushes but not successful test runs
  - ActiveData provides failures-per-test-run, but lacks long-term history
- find long-term neglected intermittent failures
  - identify ongoing bugs that don't exceed the 30/week threshold but contribute to high OF over time
- publicize best practices for tests
  - https://developer.mozilla.org/en-US/docs/Mozilla/QA/Avoiding_intermittent_oranges
- race reduction strategy
  - develop and publicize more best practices
  - lint rules?
  - logging/diagnostics to make it easier to recognize and debug races in tests
- enable eslint on more test files - bug 1357557
- test verification - bug 1357513
  - test verification of changed files on checkin
  - test verification for mach
  - test verification for backfill
- easier backouts
  - backout button on Treeherder
- make intermittent failure count a searchable parameter in Bugzilla
  - ideally, it would be possible to search for failures within a range of dates, but simple 1-day and 7-day counts would already be valuable
  - the actual failure rate (failures/run) would be useful too
- component intermittent triage dashboard
  - provide a tool for component owners to find intermittent failures requiring attention
- ownership clarity for meta-bugs
  - who is responsible for intermittent failures on browser startup and shutdown, leaks, hangs, and other failures not closely associated with an identifiable subset of tests?
- strategy for meta-bugs
  - ...since they usually cannot be resolved by skipping any single test
- avoid failures from long-running tests/jobs
  - monitor test/job durations to identify those that are approaching timeout thresholds
- autoclassification
  - to reduce impact of intermittents
- run fewer tests
  - could we reduce redundancy, consolidate tests?
  - do we need to run everything on all platforms?
- quarantine new/modified tests
- put intermittently failing tests in "purgatory"
- evolve triage/skipping policy
  - https://groups.google.com/forum/#!topic/mozilla.dev.platform/EeN78p_2PGw
  - need to get frequently-failing tests disabled or fixed ASAP
- track disabled test counts
  - ideally by component/owner
  - ...to check that we haven't disabled the world for intermittent failures
- improve one-click loaner
  - ...easier to reproduce/fix failures
- get more people fixing bugs
  - get more resources?
  - policy/nudges?
- reduce infrastructure failures
  - ...somehow, since they often have a big impact
- reproduce failures in rr
- provide automatic feedback for try pushes:
  - did my try push cause unexpected failures? ...an autoclassify issue?
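
Two of the ideas above, the failure rate (failures/run) and the search for long-term neglected intermittents, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `WeeklyCount`, `failure_rate`, `is_neglected`, and the `min_total` cutoff are hypothetical names and values, not part of OrangeFactor, ActiveData, or Bugzilla. Only the 30/week threshold comes from the list above, and the per-week run counts are assumed to be available from some other source, since OF does not track successful runs.

```python
from dataclasses import dataclass
from typing import List

WEEKLY_THRESHOLD = 30  # the 30/week threshold mentioned in the list above


@dataclass
class WeeklyCount:
    """Hypothetical per-bug record for one week."""
    failures: int  # failures recorded against the bug that week
    runs: int      # test runs that week (assumed available; OF lacks this)


def failure_rate(failures: int, runs: int) -> float:
    """Return failures per run, or 0.0 when there were no runs."""
    return failures / runs if runs else 0.0


def is_neglected(history: List[WeeklyCount], min_total: int = 100) -> bool:
    """Return True when no week exceeds the weekly threshold, yet the
    failures summed over the whole history reach min_total (an arbitrary,
    hypothetical cutoff)."""
    if not history:
        return False
    never_exceeds = all(w.failures <= WEEKLY_THRESHOLD for w in history)
    total = sum(w.failures for w in history)
    return never_exceeds and total >= min_total


# Six quiet weeks of 20 failures each: never over 30/week, 120 in total.
quiet_bug = [WeeklyCount(failures=20, runs=1000) for _ in range(6)]
# A bug with one week above the threshold would already be caught by triage.
loud_bug = quiet_bug + [WeeklyCount(failures=45, runs=1000)]
```

With this sample data, `is_neglected(quiet_bug)` is true and `is_neglected(loud_bug)` is false, because the second bug already crosses the weekly threshold and would be picked up by regular triage. A real tool would need to choose the history window and the total cutoff from actual failure data.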

Completed
- Defined intermittent failure and the classification of high priority bugs
- Updated "OrangeFactor robot" bug comments to emphasize high priority bugs
- Updated Bugzilla component info in .mozbuild for all/most test files
- Created Neglected Oranges tool
- Created 'mach test-info'
- Easy retriggering with more diagnostics -- bug 1322433
- Quick-and-dirty study of classes of causes of intermittent failures
  - https://wiki.mozilla.org/EngineeringProductivity/Projects/Stockwell/Meetings/2017-05-09
