June 2016 update to the ''What is an alert'' and ''Investigating the alert'' sections.
This is a manual process that needs to be done for every alert. We need to:
* Look at the graph and determine the original branch, date, and revision where the alert occurred.
* Look at Treeherder and determine if we have all the data.
* Retrigger jobs if needed (the more [https://wiki.mozilla.org/Buildbot/Talos/Sheriffing/Noise_FAQ#What_is_Noise noise], the more retriggers).
* Once you have more data, look at it in [https://treeherder.mozilla.org/perf.html#/comparechooser compare view] to see if other tests/platforms have changed.
* Add all related alerts you find to the alert summary with the reassign button.
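"More noise, more retriggers" can be made concrete with the textbook sample-size formula for a two-sample z-test, n = 2((z<sub>1-α/2</sub> + z<sub>power</sub>)σ/δ)², where σ is the run-to-run noise and δ is the shift you want to detect. The sketch below is a generic statistical rule of thumb, not a Talos or Perfherder policy; the function name `runs_needed` and its default α and power are assumptions for illustration.

```python
import math
from statistics import NormalDist


def runs_needed(noise_stddev: float, shift: float,
                alpha: float = 0.05, power: float = 0.8) -> int:
    """Rough number of runs per revision needed to detect `shift`.

    Hypothetical helper: uses the standard two-sample z-test formula
    n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta) ** 2.
    """
    if noise_stddev < 0 or shift <= 0:
        raise ValueError("noise_stddev must be >= 0 and shift must be > 0")
    z = NormalDist()                     # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_power = z.inv_cdf(power)           # desired probability of detection
    n = 2 * ((z_alpha + z_power) * noise_stddev / shift) ** 2
    return max(1, math.ceil(n))          # always at least one run
```

For example, when the noise is as large as the shift you are chasing, this suggests about 16 runs on each side; doubling the noise roughly quadruples the number of retriggers.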
== Determining the root cause from Perfherder ==
When you view a single alert and click on the graph link, Perfherder automatically shows multiple branches for the given test/platform. This helps you determine the root branch. It is best to [https://wiki.mozilla.org/Buildbot/Talos/Sheriffing/GraphServer_FAQ#Zooming zoom] in and out to verify where the regression is.
While this isn't always clear, most of the time it is easy to see another alert on a different branch and mark the current one as downstream if needed.
In rare cases we do not generate an alert on the original branch. Then we want to manually create an alert there and mark the alert you were originally looking at as downstream of the new one.
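The "find the root branch, mark the rest as downstream" step can be sketched as picking the alert whose push landed first. This is a minimal sketch assuming a hypothetical `Alert` record with an id, branch, and push time; it is not Perfherder's data model or API, and in practice the graph still needs a human check.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class Alert:
    # Hypothetical shape for illustration; Perfherder's real alerts differ.
    alert_id: int
    branch: str
    push_time: datetime


def assign_downstream(alerts: List[Alert]) -> Dict[int, int]:
    """Map each downstream alert id to the id of the presumed root alert.

    Heuristic: for one test/platform, the alert whose push landed first is
    treated as being on the original branch; every other alert is marked
    as downstream of it.
    """
    if not alerts:
        return {}
    root = min(alerts, key=lambda a: a.push_time)
    return {a.alert_id: root.alert_id
            for a in alerts if a.alert_id != root.alert_id}
```

If the earliest branch has no alert at all, this heuristic picks the wrong root, which is exactly the rare case where an alert has to be created manually.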
== Determining if we have all the data from Treeherder ==
Since an alert only suggests the original changeset, I always open the graph view, zoom in to a narrow window, then open a later test job (from the link shown when clicking on a data point). Then I filter the Treeherder view down and show the next 10 jobs. This gives you a range of pushes in which to spot coalescing and retriggers, and lets you fill in the holes by retriggering and scheduling jobs.
Here we are looking for a few things:
* Do we have data for the revisions before and after the revision we have identified as regressing? If not, we should consider filling in the missing data.
* Is our revision, or the revision before or after it, a merge? If so, we should retrigger to ensure that we are not investigating a merged changeset. If we are on a merged changeset, we need to go to the original branch and bisect.
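The two checks above can be sketched as computing the range of suspect pushes: anything after the last earlier push with data, up to and including the alerting push. This assumes a hypothetical `Push` record (revision, whether the test reported data, whether it is a merge) listed oldest first; none of these names come from Treeherder.

```python
from dataclasses import dataclass
from typing import List, NamedTuple


@dataclass
class Push:
    # Hypothetical record for one push on a branch, listed oldest first.
    revision: str
    has_data: bool          # did this test/platform report a result here?
    is_merge: bool = False  # is this push a merge from another branch?


class Suspects(NamedTuple):
    candidates: List[Push]  # pushes that could contain the regression
    contains_merge: bool    # True -> go to the original branch and bisect
    has_data_after: bool    # False -> schedule the job on the next push too


def suspect_range(pushes: List[Push], alert_index: int) -> Suspects:
    """Work out which pushes could have caused the alert at alert_index.

    Pushes without data inside the range are holes to fill by
    retriggering or scheduling jobs.
    """
    if not pushes[alert_index].has_data:
        raise ValueError("the alerting push should have a data point")
    start = 0  # no earlier data at all: every earlier push is a suspect
    for i in range(alert_index - 1, -1, -1):
        if pushes[i].has_data:
            start = i + 1
            break
    candidates = pushes[start:alert_index + 1]
    contains_merge = any(p.is_merge for p in candidates)
    has_data_after = (alert_index + 1 < len(pushes)
                      and pushes[alert_index + 1].has_data)
    return Suspects(candidates, contains_merge, has_data_after)
```

When `candidates` holds more than one push, the data is incomplete; once every push in the range has data, the range shrinks to the single regressing revision.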
This is important because we then have enough evidence to show that the regression is sustained through retriggers and over time. If you suspect alerts on other tests/platforms, please retrigger those as well.
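One way to express "sustained through retriggers and over time" is to require that both the retriggered runs on the suspect push and the runs on later pushes differ from the baseline, in the same direction. The sketch below uses Welch's t statistic from the Python standard library; the function names and the threshold of 3.0 are arbitrary assumptions for illustration, not the statistic or cutoff Perfherder uses.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def welch_t(before: Sequence[float], after: Sequence[float]) -> float:
    """Welch's t statistic for mean(after) - mean(before)."""
    if len(before) < 2 or len(after) < 2:
        raise ValueError("need at least two runs on each side; retrigger more")
    diff = mean(after) - mean(before)
    se = math.sqrt(stdev(before) ** 2 / len(before)
                   + stdev(after) ** 2 / len(after))
    if se == 0:
        # No noise at all: any difference counts as significant.
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se


def looks_sustained(before: Sequence[float], retriggers: Sequence[float],
                    later: Sequence[float], threshold: float = 3.0) -> bool:
    """Hypothetical check: the shift shows up in the retriggers on the
    suspect push AND on later pushes, in the same direction."""
    t_now = welch_t(before, retriggers)
    t_later = welch_t(before, later)
    same_direction = (t_now > 0) == (t_later > 0)
    return abs(t_now) >= threshold and abs(t_later) >= threshold and same_direction
```

A shift that appears in the retriggers but disappears on later pushes fails this check, which usually points to noise or an intermittent infrastructure issue rather than a real regression.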
== Determining the scope of the regression from Perfherder ==
Once you have found the regressing revision, you can validate the other platforms by [https://wiki.mozilla.org/Buildbot/Talos/Sheriffing/GraphServer_FAQ#Adding_additional_Data_Points adding additional data sets] to the graph. It is best to zoom out a bit here, as the regression might appear a few revisions off on different platforms due to [https://wiki.mozilla.org/Buildbot/Talos/Sheriffing/Tree_FAQ#What_is_coalescing coalescing].
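Why zooming out helps can be sketched with a simple mean-shift split: for each platform, find the point that best separates "before" from "after", then compare that push with the regressing push you already identified. This assumes each platform's data is a hypothetical list of `(push_index, value)` pairs sorted by push, containing only pushes where the job actually ran; the split method is a basic change-point heuristic, not Perfherder's alert algorithm.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

Points = List[Tuple[int, float]]  # (push_index, value), sorted by push_index


def first_regressed_push(points: Points, min_side: int = 2) -> Optional[int]:
    """Push index of the first point after the largest mean shift.

    Returns None if there are too few points to split.
    """
    values = [v for _, v in points]
    best_k, best_gap = None, -1.0
    for k in range(min_side, len(values) - min_side + 1):
        gap = abs(mean(values[k:]) - mean(values[:k]))
        if gap > best_gap:
            best_k, best_gap = k, gap
    return None if best_k is None else points[best_k][0]


def offsets_from_reference(series: Dict[str, Points],
                           reference_push: int) -> Dict[str, Optional[int]]:
    """How many pushes each platform's shift lands after the reference push."""
    result: Dict[str, Optional[int]] = {}
    for platform, points in series.items():
        push = first_regressed_push(points)
        result[platform] = None if push is None else push - reference_push
    return result
```

A platform whose jobs were coalesced only gets its first "after" data point on a later push, so a small positive offset is expected there and does not by itself mean a different root cause.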
== Cases to watch out for ==
* [https://wiki.mozilla.org/Buildbot/Talos/Sheriffing/Tree_FAQ#What_is_a_backout backout] (usually within a week, causing a similar regression/improvement)
* [https://wiki.mozilla.org/Buildbot/Talos/Sheriffing/Tree_FAQ#What_is_PGO pgo/nonpgo] (some regressions are PGO-only and might be a side effect of PGO). We only ship PGO builds, so these are the most important.
* test/infrastructure change - once in a while we change big things about our tests or infrastructure, and this affects our tests (we need bugs to document these)
* [https://wiki.mozilla.org/Buildbot/Talos/Sheriffing/Tree_FAQ#What_is_a_merge Merged] - sometimes the root cause looks to be a merge; this is normally a side effect of [https://wiki.mozilla.org/Buildbot/Talos/Sheriffing/Tree_FAQ#What_is_coalescing coalescing].
* [https://wiki.mozilla.org/Buildbot/Talos/Sheriffing/Tree_FAQ#What_is_coalescing Coalesced] - this is when we don't run every job on every platform on every push, and sometimes we have a set of changes