Auto-tools/Projects/Stockwell
Overview
- The name Stockwell comes from one of the A*Team characters: https://en.wikipedia.org/wiki/List_of_The_A-Team_characters
Mozilla has a problem with intermittent failures, commonly known as "Oranges". They occur at an ever-increasing rate, and over the years many tools have been built to make the volume of these failures easier to manage. Stockwell will introduce a series of changes to Mozilla's engineering workflow to provide a sustainable, longer-term solution to these failures. In effect, we will turn intermittent failures into actionable items that every developer feels empowered and responsible for improving.
Team
Meetings
Meetings are held fortnightly on Tuesdays at 8:30am PDT
Information and notes are on the meeting wiki
Problem
Goal of the project
Reduce intermittent failures to fewer than 5 per push.
We will build tools, processes, and relationships to change the way we view testing and automation. When a failure occurs, we need to understand how realistic it is to fix and, where possible, give developers the tools to fix it. If a fix is not realistic, we need to reduce the failure's visibility and keep track of it in case it becomes severe and warrants a more serious investment of time.
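To make the "fewer than 5 per push" target concrete, here is a minimal sketch of how it could be checked. It assumes one plausible reading of the goal (an average across pushes) and a hypothetical input: a plain list holding the number of intermittent failures classified on each push. It does not use any real Treeherder or Orange Factor API, and the sample numbers are illustrative only.

```python
from typing import Sequence

# Target from the project goal: fewer than 5 intermittent failures per push.
TARGET_PER_PUSH = 5


def failures_per_push(failure_counts: Sequence[int]) -> float:
    """Average number of intermittent failures per push.

    failure_counts is a hypothetical sequence with one entry per push,
    each entry being the number of intermittent ("orange") failures on that push.
    """
    if not failure_counts:
        raise ValueError("need data for at least one push")
    return sum(failure_counts) / len(failure_counts)


def meets_goal(failure_counts: Sequence[int]) -> bool:
    """True if the average is strictly below the target."""
    return failures_per_push(failure_counts) < TARGET_PER_PUSH


if __name__ == "__main__":
    sample = [3, 7, 4, 2, 6]  # illustrative numbers, not real data
    print(failures_per_push(sample))  # average of the sample pushes
    print(meets_goal(sample))
```

Reading the goal as a per-push maximum instead would mean replacing the average with `max(failure_counts)`; which reading the project intends is not stated here.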
Non Goals
We will not be fixing every intermittent, nor will we be disabling all tests.
Dependencies / Who will use this
- sheriffs - they typically identify new intermittents
- autostar - will be automatically categorizing intermittents
- developers - will be fixing intermittents
- testduty - a new role (short term or long term) to triage intermittents and keep the Orange Factor down
Design / Approach
- See a running list of ideas and sub-projects at https://wiki.mozilla.org/Auto-tools/Projects/Stockwell/Ideas
Milestones and Dates
- December 10th, 2016 - deliver plan to Mozilla Developers including plan for Q1, metrics to track
- Q1, 2017 - implement plan, continue experiments
- Q2, 2017 - repeat Q1, deliver report and 2017 Q3/Q4 plan in San Francisco to Mozilla developers
Triage
Triage was one of the first things we did, and it is still required for success. Its scope will change over time as we adjust processes, expectations, tools, and robots.
- Orange Factor Robot - comments on bugs, sets whiteboard tags for optimizing triage
- Triage workflow and notes
- FAQ - start here with questions
- Gathering Data - tips for finding useful data
- Auto-tools/Projects/Stockwell/disable-recommended - procedure for handling disable-recommended bugs
- Auto-tools/Projects/Stockwell/backfill-retrigger - procedure for finding the patch that caused an intermittent
- Auto-tools/Projects/Stockwell/test-verify - run a single test over multiple revisions
Getting Involved
We don't have a clear list of bugs, but when we do, they will show up here. If we determine we have mentored bugs, they will be easily discoverable here as well.