From MozillaWiki

Our process for managing and addressing intermittent failures of WebExtensions tests.

Identifying important intermittents

Thanks to the stockwell project, any intermittent deemed to need our attention is flagged with a whiteboard tag containing the word stockwell. Three of these tags are of interest to us:

  • stockwell needswork
    • These are failures that have occurred more than x times within a 7-day period, where x is currently 30. A member of our team will be needinfo’d on the bug when this whiteboard tag is added, and we should endeavour to triage the bug and assign it to someone.
    • Bugzilla query
  • stockwell disable-recommended
    • These are bugs that stayed in the stockwell needswork state for a given period of time (usually 2 weeks) without being fixed. The test will be disabled soon, so these should also be addressed as soon as possible.
    • Bugzilla query
  • stockwell disabled
    • These are tests that have been disabled because they failed intermittently too often. Ideally we should investigate these, fix the intermittency and re-enable the test, thereby regaining test coverage, but they are a lower priority than the other two categories above.
    • Bugzilla query
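
The needswork threshold described above can be sketched in a few lines of Python. This is only an illustration: the function name, the constants, and the reading of "within a 7 day period" as "the 7 days before a reference time" are assumptions, not a description of how stockwell itself counts failures.

```python
from datetime import datetime, timedelta

# "x" in the text above; 30 is described as the current value.
NEEDSWORK_THRESHOLD = 30
# Assumed interpretation: the 7 days leading up to a reference time.
WINDOW = timedelta(days=7)


def exceeds_needswork_threshold(failure_times, now):
    """Return True if more than NEEDSWORK_THRESHOLD failures fall in the window.

    failure_times: iterable of datetime objects, one per recorded failure.
    now: the reference datetime that ends the 7-day window.
    """
    window_start = now - WINDOW
    recent = [t for t in failure_times if window_start < t <= now]
    return len(recent) > NEEDSWORK_THRESHOLD


if __name__ == "__main__":
    now = datetime(2024, 1, 8)
    yesterday = now - timedelta(days=1)
    print(exceeds_needswork_threshold([yesterday] * 31, now))
    print(exceeds_needswork_threshold([yesterday] * 30, now))
```

Note that the comparison is strict, matching the wording "more than x times": exactly 30 failures in the window would not trigger the tag under this reading.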

Triaging Intermittents

As described above, bugs with stockwell disable-recommended in the whiteboard should be treated as the top priority, so that we do not lose test coverage when the test is disabled. Bugs with stockwell needswork in the whiteboard come next, and bugs with stockwell disabled in the whiteboard have the lowest priority.

Ideally, any bugs that appear in our triage list with either stockwell disable-recommended or stockwell needswork in the whiteboard should be assigned a priority of at least 3 and be assigned to someone.
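
As a rough illustration of this ordering, the sketch below sorts a list of bugs by their stockwell whiteboard tag. The bug representation (dicts with "id" and "whiteboard" keys), the function names, and the plain substring matching are assumptions made for the example; the exact format of the whiteboard field in Bugzilla may differ.

```python
# Triage order taken from the text above; a lower number is triaged first.
STOCKWELL_TRIAGE_ORDER = {
    "stockwell disable-recommended": 0,
    "stockwell needswork": 1,
    "stockwell disabled": 2,
}


def triage_rank(whiteboard):
    """Return the triage rank for a whiteboard string, or None if untagged."""
    matches = [
        rank
        for tag, rank in STOCKWELL_TRIAGE_ORDER.items()
        if tag in whiteboard
    ]
    # If several tags are present, the most urgent one wins.
    return min(matches) if matches else None


def sort_for_triage(bugs):
    """Return the stockwell-tagged bugs, most urgent first.

    bugs: list of dicts with hypothetical keys "id" and "whiteboard".
    Bugs without a stockwell tag are dropped from the result.
    """
    ranked = [(triage_rank(bug["whiteboard"]), bug) for bug in bugs]
    ranked = [(rank, bug) for rank, bug in ranked if rank is not None]
    return [bug for rank, bug in sorted(ranked, key=lambda pair: pair[0])]


if __name__ == "__main__":
    example = [
        {"id": 1, "whiteboard": "[stockwell disabled]"},
        {"id": 2, "whiteboard": ""},
        {"id": 3, "whiteboard": "[stockwell needswork]"},
        {"id": 4, "whiteboard": "[stockwell disable-recommended]"},
    ]
    print([bug["id"] for bug in sort_for_triage(example)])
```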

Re-enabling Tests

Someone on the team should periodically review the bugs with stockwell disabled in the whiteboard to determine which ones, if any, should be given priority for investigation and fixing. The following criteria should be considered:

  • Any tests that are disabled on all platforms should be given top priority.
  • Those should be followed by tests that are disabled on Windows.
  • Any tests disabled only on non-Windows platforms should be given the lowest priority.
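
The criteria above amount to a three-level ranking, which could be sketched as follows. The platform names ("windows", "linux", "mac") and the function name are hypothetical, and the function assumes the test is disabled on at least one platform.

```python
def reenable_rank(disabled_on, all_platforms):
    """Rank a disabled test for re-enabling work; lower means more urgent.

    0: disabled on every platform
    1: disabled on Windows (but not everywhere)
    2: disabled only on non-Windows platforms

    disabled_on: platforms where the test is currently disabled.
    all_platforms: every platform the test would normally run on.
    """
    disabled = set(disabled_on)
    if disabled >= set(all_platforms):
        return 0
    if "windows" in disabled:
        return 1
    return 2


if __name__ == "__main__":
    platforms = ["windows", "linux", "mac"]
    print(reenable_rank(["windows", "linux", "mac"], platforms))
    print(reenable_rank(["windows"], platforms))
    print(reenable_rank(["linux"], platforms))
```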


To judge how well we are managing intermittents, we can look at the Orange Factor, which is the number of failures divided by the number of pushes over a specific period of time. We can generate this information using a query similar to this.
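
For reference, the Orange Factor arithmetic is a single division. The sketch below uses a hypothetical function name and made-up input numbers; it only illustrates the definition given above, not how the real figures are gathered.

```python
def orange_factor(failures, pushes):
    """Return failures divided by pushes for one reporting period."""
    if pushes <= 0:
        raise ValueError("pushes must be a positive number")
    return failures / pushes


if __name__ == "__main__":
    # Made-up example: 45 intermittent failures across 150 pushes.
    print(orange_factor(45, 150))
```

A lower value means fewer intermittent failures per push, so comparing the figure across periods of similar length gives a rough sense of whether things are improving.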