Triaging Stockwell-related intermittent failures is important to the success of Stockwell and to the health of the tree. An eventual goal of Stockwell is to have all triage handled by component owners, not the Stockwell team. Many teams currently triage their own intermittent failures, but we have found that their fix rate is lower than when someone outside the team highlights the issues.
On this page I want to outline the triage process and how to triage effectively. Please take a few minutes to read the flow chart outlining the data and decisions involved in triage:
A few things matter most when it comes to triage:
- Your goal is to make the bug actionable
- Narrowing down the problem (when it started, where it fails) helps make the bug actionable
- Finding the right person: either the existing assignee or a frequent commenter on the bug (every bug also has a triage owner)
- Following up regularly so there are no surprises
- Recognizing similar failures in other bugs and relating them helps fix many problems at once
The focus of triage is to gather information and find the right person to work on the bug. Disabling tests, testing fixes locally or on the try server, fixing bugs, and retriggering jobs are outside the scope of triage (although all of those tasks are very useful).