= Finding the root cause =
There are many possible reasons for an alert, and several scenarios to be aware of:
* Backout - usually happens within 1 week and causes a regression/improvement similar to the one from the original change.
* PGO/non-PGO - some errors are PGO only and might be a side effect of PGO. We only ship PGO builds, so PGO alerts are the most important.
* Test/infrastructure change - once in a while we make big changes to our tests or infrastructure, and these affect our test results.
* Coalesced - we don't run every job on every platform on every push, so sometimes an alert points to a set of changes instead of a single push.
* Regular regression - the normal case, where we get an alert and see it merge from branch to branch.
== Backout ==
Backouts happen every day, but the backouts that generate performance regressions are the ones that add noise to the system.
Here is an example of a backout which affected many tests:
[http://alertmanager.allizom.org:8080/alerts.html?rev=d4c4897f9ffb AlertManager link] [http://alertmanager.allizom.org:8080/alerts.html?rev=5cc96a763c3f related coalesced]
This example is interesting because one change was quickly identified as the correct one, but one job was coalesced. The coalescing is easy to detect because the suspected [http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=f4eea7e2f94b&tochange=5cc96a763c3f changeset] is a range rather than a single push. That range includes our backed out changeset, and the graph shows the backout pattern. In addition, this alert is on Windows 8, the same platform that showed a regression on the backout. We therefore have high confidence in mapping this coalesced alert to the backout as its root cause.
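The range check described above can be sketched in a few lines of Python. This is only an illustration, not part of AlertManager: the function names are made up for this page, and the list of revisions inside the range is assumed to have been collected separately (for example by reading the pushlog page), so the list in the usage lines is a placeholder.

```python
from urllib.parse import urlparse, parse_qs

# Pushlog link taken from the coalesced alert in the example above.
PUSHLOG_URL = (
    "http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml"
    "?fromchange=f4eea7e2f94b&tochange=5cc96a763c3f"
)


def pushlog_range(url):
    """Return (fromchange, tochange) from a pushlog URL, or None if either is missing."""
    query = parse_qs(urlparse(url).query)
    start = query.get("fromchange", [None])[0]
    end = query.get("tochange", [None])[0]
    if start is None or end is None:
        return None
    return start, end


def is_coalesced(url):
    """Treat an alert as coalesced when its suspected changeset is a range of pushes."""
    rng = pushlog_range(url)
    return rng is not None and rng[0] != rng[1]


def range_contains(revisions_in_range, suspect):
    """Compare 12-character short hashes to see whether suspect is inside the range."""
    return any(rev[:12] == suspect[:12] for rev in revisions_in_range)


# Hypothetical usage: the revision list is an assumption and would normally
# be read from the pushlog for the range.
revisions = ["f4eea7e2f94b", "d4c4897f9ffb", "5cc96a763c3f"]
print(is_coalesced(PUSHLOG_URL))
print(range_contains(revisions, "d4c4897f9ffb"))
```

If the link is a range and the backed out changeset is inside it, the coalesced alert is a good candidate to map to the backout. The graph and the platform still need to be checked by hand.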
= Verifying an alert =