(mention old page) |
(update links) |
||
| Line 5: | Line 5: | ||
= What is an alert = | = What is an alert = | ||
As of January 2016, alerts are generated in [https://treeherder.mozilla.org/perf.html#/alerts?status=0&framework=1 Perfherder]. These are generated by programmatically verifying there is a sustained regression over time ([https://wiki.mozilla.org/ | As of January 2016, alerts are generated in [https://treeherder.mozilla.org/perf.html#/alerts?status=0&framework=1 Perfherder]. These are generated by programmatically verifying there is a sustained regression over time ([https://wiki.mozilla.org/Performance_sheriffing/Talos/Sheriffing/Noise_FAQ#Why_do_we_need_12_future_data_points original data point + 12 future data points]). | ||
There is an alert summary outlining the alerts which match the same set of revisions. For the summary there are a few pieces of information: | There is an alert summary outlining the alerts which match the same set of revisions. For the summary there are a few pieces of information: | ||
* Title (which is a good bug title if filing one for a regression): | * Title (which is a good bug title if filing one for a regression): | ||
** [https://wiki.mozilla.org/ | ** [https://wiki.mozilla.org/Performance_sheriffing/Talos/Sheriffing/Tree_FAQ#Branch_names_and_confusion branch] | ||
** % regressed, this is a range of the regressions (not improvements) | ** % regressed, this is a range of the regressions (not improvements) | ||
** the [https://wiki.mozilla.org/ | ** the [https://wiki.mozilla.org/Performance_sheriffing/Talos/Tests tests] which have regressed | ||
** the platforms we see this regression on | ** the platforms we see this regression on | ||
* date of the suspect revision push | * date of the suspect revision push | ||
| Line 19: | Line 19: | ||
Below the summary is a list of alerts; each alert references: | Below the summary is a list of alerts; each alert references: | ||
* [https://wiki.mozilla.org/ | * [https://wiki.mozilla.org/Performance_sheriffing/Talos/Tests Test name] | ||
* platform (including build type, such as opt, pgo) | * platform (including build type, such as opt, pgo) | ||
* old score (median score of the previous 12 commits) | * old score (median score of the previous 12 commits) | ||
* new score (median score of the future 12 commits) | * new score (median score of the future 12 commits) | ||
* [https://wiki.mozilla.org/ | * [https://wiki.mozilla.org/Performance_sheriffing/Talos/Sheriffing/Alert_FAQ#Why_does_Alert_Manager_print_-xx.25 % change / values] | ||
* bar chart to show severity, green = improvement, red = regression | * bar chart to show severity, green = improvement, red = regression | ||
* Confidence value (from the t-test code) | * Confidence value (from the t-test code) | ||
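The fields listed above can be illustrated with a minimal sketch. It assumes a lower-is-better test and uses a Welch-style t statistic as a stand-in for the confidence value; the function name <code>summarize_alert</code> and the returned keys are hypothetical, and Perfherder's actual t-test code may differ.

```python
from math import sqrt
from statistics import mean, median, variance

WINDOW = 12  # data points considered on each side of the suspect push


def summarize_alert(before, after, lower_is_better=True):
    """Compare the 12 data points before a push with the 12 after it.

    `before` and `after` are lists of scores in push order.
    Assumes the old median is non-zero and each window has >= 2 points.
    """
    b = before[-WINDOW:]
    a = after[:WINDOW]

    old = median(b)  # "old score"
    new = median(a)  # "new score"
    pct_change = (new - old) / old * 100.0

    # Welch-style t statistic (an assumption, not Perfherder's exact formula).
    stderr = sqrt(variance(b) / len(b) + variance(a) / len(a))
    t_value = abs(mean(a) - mean(b)) / stderr if stderr else float("inf")

    worse = new > old if lower_is_better else new < old
    return {
        "old": old,
        "new": new,
        "pct_change": pct_change,
        "confidence": t_value,
        "is_regression": worse,
    }
```

A larger <code>confidence</code> means the two windows are less likely to differ by noise alone; noisy tests need more retriggers before the value is meaningful.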
| Line 33: | Line 33: | ||
* Look at the graph and determine the original branch, date, revision where the alert occurred | * Look at the graph and determine the original branch, date, revision where the alert occurred | ||
* Look at Treeherder and determine if we have all the data. | * Look at Treeherder and determine if we have all the data. | ||
* Retrigger jobs if needed (more [https://wiki.mozilla.org/ | * Retrigger jobs if needed (more [https://wiki.mozilla.org/Performance_sheriffing/Talos/Sheriffing/Noise_FAQ#What_is_Noise noise], more retriggers) | ||
* Once you have more data, look at the data in [https://treeherder.mozilla.org/perf.html#/comparechooser compare view] to see if other tests/platforms have changed | * Once you have more data, look at the data in [https://treeherder.mozilla.org/perf.html#/comparechooser compare view] to see if other tests/platforms have changed | ||
* Add all related alerts you see to the summary with the reassign button | * Add all related alerts you see to the summary with the reassign button | ||
== Determining the root cause from Perfherder == | == Determining the root cause from Perfherder == | ||
When viewing a single alert and clicking on the graph link, Perfherder automatically shows multiple branches for the given test/platform. This helps you determine the root branch. It is best to [https://wiki.mozilla.org/ | When viewing a single alert and clicking on the graph link, Perfherder automatically shows multiple branches for the given test/platform. This helps you determine the root branch. It is best to [https://wiki.mozilla.org/Performance_sheriffing/Talos/Sheriffing/Perfherder_FAQ#Zooming zoom] in and out to verify where the regression is. | ||
While this isn't always clear, most of the time it is easy to see another alert on a different branch and mark the current one as a downstream if needed. | While this isn't always clear, most of the time it is easy to see another alert on a different branch and mark the current one as a downstream if needed. | ||
| Line 72: | Line 72: | ||
== Determining the scope of the regression from Perfherder == | == Determining the scope of the regression from Perfherder == | ||
Once you have the spot, you can validate the other platforms by [https://wiki.mozilla.org/ | Once you have the spot, you can validate the other platforms by [https://wiki.mozilla.org/Performance_sheriffing/Talos/Sheriffing/Perfherder_FAQ#Adding_additional_data_points adding additional data sets] to the graph. It is best here to zoom out a bit as the regression might be a few revisions off on different platforms due to [https://wiki.mozilla.org/Performance_sheriffing/Talos/Sheriffing/Tree_FAQ#What_is_coalescing coalescing]. | ||
== Cases to watch out for == | == Cases to watch out for == | ||
There are many reasons for an alert and different scenarios to be aware of: | There are many reasons for an alert and different scenarios to be aware of: | ||
* [https://wiki.mozilla.org/ | * [https://wiki.mozilla.org/Performance_sheriffing/Talos/Sheriffing/Tree_FAQ#What_is_a_backout backout] (usually within 1 week causing a similar regression/improvement) | ||
* [https://wiki.mozilla.org/ | * [https://wiki.mozilla.org/Performance_sheriffing/Talos/Sheriffing/Tree_FAQ#What_is_PGO pgo/nonpgo] (some errors are pgo only and might be a side effect of pgo). We only ship PGO, so these are the most important. | ||
* test/infrastructure change - once in a while we change big things about our tests or infrastructure and it affects our tests (we need bugs to document these) | * test/infrastructure change - once in a while we change big things about our tests or infrastructure and it affects our tests (we need bugs to document these) | ||
* [https://wiki.mozilla.org/ | * [https://wiki.mozilla.org/Performance_sheriffing/Talos/Sheriffing/Tree_FAQ#What_is_a_merge Merged] - sometimes the root cause looks to be a merge; this is normally a side effect of [https://wiki.mozilla.org/Performance_sheriffing/Talos/Sheriffing/Tree_FAQ#What_is_coalescing Coalescing]. | ||
* [https://wiki.mozilla.org/ | * [https://wiki.mozilla.org/Performance_sheriffing/Talos/Sheriffing/Tree_FAQ#What_is_coalescing Coalesced] - this is when we don't run every job on every platform on every push, so sometimes a regression can only be narrowed down to a set of changes | ||
* Regular regression - the normal case where we get an alert and we see it merge from branch to branch | * Regular regression - the normal case where we get an alert and we see it merge from branch to branch | ||
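The coalesced case above can be sketched as follows. This is a minimal illustration, assuming a lower-is-better test and a hand-picked <code>threshold</code>; the function name <code>suspect_range</code> and the data layout are hypothetical and not part of Perfherder.

```python
def suspect_range(pushes, threshold):
    """Return the revisions that could have introduced a regression.

    `pushes` is a list of (revision, value) tuples in push order, where
    value is None when the job was coalesced (not run) on that push.
    A measured value above `threshold` counts as regressed.
    """
    last_good = None  # index of the most recent push measured as good
    for i, (_rev, value) in enumerate(pushes):
        if value is None:
            continue  # no data for this push; it stays a suspect
        if value > threshold:
            start = 0 if last_good is None else last_good + 1
            # Every push after the last good one, up to and including
            # the first bad one, is a candidate.
            return [rev for rev, _ in pushes[start:i + 1]]
        last_good = i
    return []  # no regressed data point found
```

When the returned list holds more than one revision, retriggering the job on the pushes without data narrows the range.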
| Line 86: | Line 86: | ||
Every release of Firefox we create a tracking bug (e.g. {{bug|1386631}} - Firefox 57) which we use to associate all regressions found during that release. The reason for this is twofold: | Every release of Firefox we create a tracking bug (e.g. {{bug|1386631}} - Firefox 57) which we use to associate all regressions found during that release. The reason for this is twofold: | ||
* We can go to one spot and see what regressions we have for reference on new bugs or to follow up. | * We can go to one spot and see what regressions we have for reference on new bugs or to follow up. | ||
* When we [https://wiki.mozilla.org/ | * When we [https://wiki.mozilla.org/Performance_sheriffing/Talos/Sheriffing/Tree_FAQ#What_is_an_uplift uplift] it is important to see which alerts we are expecting | ||
These bugs just contain a set of links to other bugs, no conversation is needed. | These bugs just contain a set of links to other bugs, no conversation is needed. | ||
| Line 97: | Line 97: | ||
Here are some things to check/verify when filing a bug: | Here are some things to check/verify when filing a bug: | ||
* Product/Component - this should be the same as the bug which is the root cause, if >1 bug, file in [https://bugzilla.mozilla.org/enter_bug.cgi?product=Testing&component=Talos Talos] | * Product/Component - this should be the same as the bug which is the root cause, if >1 bug, file in [https://bugzilla.mozilla.org/enter_bug.cgi?product=Testing&component=Talos Talos] | ||
* Dependent/Block bugs - For a new bug, add the [https://wiki.mozilla.org/ | * Dependent/Block bugs - For a new bug, add the [https://wiki.mozilla.org/Performance_sheriffing/Talos/Sheriffing#Tracking_bugs tracking bug] (for the current version) and root cause bug(s) as blocking this bug | ||
* CC list - cc patch author(s), reviewer(s) and owner of the tests as documented on the [https://wiki.mozilla.org/ | * CC list - cc patch author(s), reviewer(s) and owner of the tests as documented on the [https://wiki.mozilla.org/Performance_sheriffing/Talos/Tests Talos tests wiki]; if we have >1 bug, we should cc everyone who worked on those bugs so we can all pitch in and answer questions faster | ||
* Summary of bug should have a check to make sure the revision is accurate | * Summary of bug should have a check to make sure the revision is accurate | ||
* The description is auto suggested as well, please verify the revision here | * The description is auto suggested as well, please verify the revision here | ||
As a note, the generated description refers the patch author to [https://wiki.mozilla.org/ | As a note, the generated description refers the patch author to [https://wiki.mozilla.org/Performance_sheriffing/Talos/RegressionBugsHandling guidelines and expectations] for them about how and when to respond. | ||
Once a bug is filed it is a good idea to do a few things in another comment: | Once a bug is filed it is a good idea to do a few things in another comment: | ||
| Line 113: | Line 113: | ||
== Merge Day - Uplifts == | == Merge Day - Uplifts == | ||
Every 6 weeks we do an [https://wiki.mozilla.org/ | Every 6 weeks we do an [https://wiki.mozilla.org/Performance_sheriffing/Talos/Sheriffing/Tree_FAQ#What_is_an_uplift uplift]. These typically result in [https://elvis314.wordpress.com/2014/12/12/tracking-firefox-performance-as-we-uplift-the-volume-of-alerts-we-get/ dozens of alerts] for each uplift. | ||
The job here is to triage alerts as we usually do, except in this case we have a much larger volume of alerts. One thing here is we have alerts from the upstream branch. Take for example when we uplift Mozilla-Central to Mozilla-Beta. We have a tracking bug for each release, and there is a list of bugs (keep in mind some are resolved as wontfix). In a perfect world (half the time) we can match up the alerts that are showing up on Mozilla-Beta with the bugs that have already been filed. The job here is to verify and add bugs to keep track of what is there. | The job here is to triage alerts as we usually do, except in this case we have a much larger volume of alerts. One thing here is we have alerts from the upstream branch. Take for example when we uplift Mozilla-Central to Mozilla-Beta. We have a tracking bug for each release, and there is a list of bugs (keep in mind some are resolved as wontfix). In a perfect world (half the time) we can match up the alerts that are showing up on Mozilla-Beta with the bugs that have already been filed. The job here is to verify and add bugs to keep track of what is there. | ||
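Matching uplift alerts against already-filed bugs can be sketched like this. It is a minimal illustration, assuming alerts are keyed by (test, platform) and that the bugs under the tracking bug have been collected by hand into a dictionary; the function name <code>match_uplift_alerts</code> and both data structures are hypothetical.

```python
def match_uplift_alerts(beta_alerts, tracked_bugs):
    """Split uplift alerts into those with a known bug and those without.

    `beta_alerts` is a list of (test, platform) tuples seen on the
    downstream branch. `tracked_bugs` maps a bug id to the set of
    (test, platform) tuples that bug already covers.
    """
    # Invert the mapping: (test, platform) -> list of bug ids.
    known = {}
    for bug_id, keys in tracked_bugs.items():
        for key in keys:
            known.setdefault(key, []).append(bug_id)

    matched, unmatched = {}, []
    for alert in beta_alerts:
        if alert in known:
            matched[alert] = sorted(known[alert])
        else:
            unmatched.append(alert)  # needs investigation or a new bug
    return matched, unmatched
```

Alerts left in the unmatched list are the ones that still need to be verified and, if real, filed and linked to the tracking bug.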
| Line 125: | Line 125: | ||
= Additional Resources = | = Additional Resources = | ||
* [https://wiki.mozilla.org/ | * [https://wiki.mozilla.org/Performance_sheriffing/Talos/Sheriffing/Alert_FAQ Alert FAQ] | ||
* [https://wiki.mozilla.org/ | * [https://wiki.mozilla.org/Performance_sheriffing/Talos/Sheriffing/Noise_FAQ Noise FAQ] | ||
* [https://wiki.mozilla.org/ | * [https://wiki.mozilla.org/Performance_sheriffing/Talos/Sheriffing/Perfherder_FAQ Perfherder FAQ] | ||
* [https://wiki.mozilla.org/ | * [https://wiki.mozilla.org/Performance_sheriffing/Talos/Sheriffing/Tree_FAQ Tree FAQ] | ||
* [https://wiki.mozilla.org/ | * [https://wiki.mozilla.org/Performance_sheriffing/Talos/Sheriffing duplicated & updated from old page] | ||