= Perfherder =
== What is Perfherder ==
[https://treeherder.mozilla.org/perf.html#/graphs Perfherder] is a tool that takes data points from log files and graphs them over time. Primarily this is used for performance data from [[TestEngineering/Performance/Talos|Talos]], but also from [[AWSY/Tests|AWSY]], build_metrics, [[EngineeringProductivity/Autophone|Autophone]] and platform_microbenchmarks. All these are test harnesses and you can find more about them [[TestEngineering/Performance/Sheriffing/Alerts|here]].
The code for Perfherder can be found inside the Treeherder repository [https://github.com/mozilla/treeherder/ here].
== Viewing details on a graph ==
When viewing Perfherder graph details, in many cases it is obvious where the regression is. If you hover over the data points (without clicking them), you can see some raw data values.
Keep in mind that the graph does not show whether there is missing data or a range of changesets.
== Zooming ==
Perfherder graphs let you adjust the date range from a drop-down box. We default to 14 days, but you can change it to the last 1/2/7/14/30/90/365 days from the UI drop-down.
[[File:Ph_Zooming.jpg]]
== Adding additional data points ==
One feature of Perfherder graphs is the ability to add up to 7 sets of data points at once and compare them on the same graph. In fact, when clicking through to a graph from an alert, we do this automatically by adding multiple branches at once.
[[File:Ph_Addmoredata.jpg]]
== Muting additional data points ==
Once you become familiar with the graphs, it is a common use case to have [[TestEngineering/Performance/Sheriffing/Perfherder_FAQ#Adding_additional_data_points|multiple data points]] on the graph at a time. This results in a lot of confusing data points if you are trying to zoom in and investigate the values for a given data point.
Common practice is to load up a bunch of related series and mute/unmute them to verify revisions, dates, etc. for a visible regression.
= Tree =
== Branch names and confusion ==
We have a variety of branches at Mozilla; here are the main ones that we see alerts on:
* Mozilla-Inbound (PGO, Non-PGO)
A final note: Mozilla-Beta is a branch where little development takes place. The volume is really low and alerts come 5 days (or more) later. It is important to address Mozilla-Beta alerts ASAP because that is what we are shipping to customers.
== What is coalescing? ==
Coalescing is a term we use for how we schedule jobs to run on a given machine. When load is high, jobs are placed in a queue, and the longer the queue gets, the more jobs we skip over. This allows us to get results on more recent changesets faster.
Note the two pushes that have no data (circled in red). If the regression happened around here, we might want to backfill those two jobs so we can be sure we are looking at the push which caused the regression instead of more than one push.
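To make the idea concrete, here is a minimal sketch of coalescing (hypothetical function and variable names, not the actual scheduler code): when the pending queue for a machine grows past a limit, the older pushes are skipped and only the newest ones run, leaving gaps that later need backfilling.

<syntaxhighlight lang="python">
from collections import deque

def run_pending_jobs(queue, max_backlog=3):
    """Toy model of coalescing: when too many jobs are queued for a
    machine, skip the oldest ones and only run the most recent pushes.
    Skipped pushes end up with no data until someone backfills them."""
    skipped = []
    while len(queue) > max_backlog:
        skipped.append(queue.popleft())  # oldest pushes are skipped
    for push in queue:
        print(f"running tests for push {push}")
    return skipped  # candidates for later backfilling

# Five pushes land while the machine is busy; only the newest three get
# results, so pushes 101 and 102 must be backfilled to narrow a regression.
leftover = run_pending_jobs(deque([101, 102, 103, 104, 105]))
print("skipped (backfill these):", leftover)
</syntaxhighlight>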
== What is an uplift? ==
Every [[RapidRelease/Calendar|6 weeks]] we release a new version of Firefox. When we do that, the code which developers check into the nightly branch gets uplifted (think of this as a large [[TestEngineering/Performance/Sheriffing/Tree_FAQ#What_is_a_merge|merge]]) to the Beta branch. Now all the code, features, and Talos regressions are on Beta.
This affects the Performance Sheriffs because we will get a big pile of alerts for Mozilla-Beta. These need to be addressed rapidly. Luckily, almost all the regressions seen on Mozilla-Beta will already have been tracked on Mozilla-Inbound or Autoland.
== What is a merge? ==
Many times each day we merge code from the integration branches into the main branch and back. This is a common process in large projects. At Mozilla, this means that the majority of the code for Firefox is checked into Mozilla-Inbound and Autoland, then merged into Mozilla-Central (also referred to as Firefox), and once merged, it gets merged back into the other branches. If you want to read more about this merge procedure, here are [[Sheriffing/How_To/Merges|the details]].
* note: we do not generate alerts for the Firefox (Mozilla-Central) branch.
== What is a backout? ==
Many times we back out or hotfix code because it is causing a build failure or unittest failure. The [[Sheriffing/Sheriff_Duty|Sheriff team]] handles this process in general, and backouts/hotfixes are usually done within 3 hours of the original landing (i.e. we won't have [[TestEngineering/Performance/Sheriffing/Noise_FAQ#Why_do_we_need_12_future_data_points|12 future changesets]]). As you can imagine, we could get an alert 6 hours later, go to look at the graph, and see there is no regression; instead there is a temporary spike for a few data points.
When looking for a backout on Treeherder, the commit messages all mention the backout:
[[File:Backout_tree.png]]
[[File:Backout_graph.png]]
== What is PGO? ==
PGO is [https://developer.mozilla.org/en-US/docs/Building_with_Profile-Guided_Optimization Profile Guided Optimization], where we do a build, run it to collect metrics, and optimize based on the output of those metrics. We only release PGO builds, and for the integration branches we do these periodically (every 6 hours) or as needed. For Mozilla-Central we follow the same pattern. As the builds take considerably longer (2+ times as long), we don't do this for every commit on our integration branches.
* PGO alerts will probably have different regression percentages, but the overall list of platforms/tests for a given revision will be almost identical
= Alerts =
== What alerts are displayed in Alert Manager? ==
[https://treeherder.mozilla.org/perf.html#/alerts Perfherder Alerts] defaults to [[TestEngineering/Performance/Sheriffing/Alerts|multiple types of alerts]] that are untriaged. It is a goal to keep these lists empty! You can view alerts that are improvements or in any other state (i.e. investigating, fixed, etc.) by using the drop-down at the top of the page.
== Do we care about all alerts/tests? ==
Yes, we do. Some tests are more commonly invalid, mostly due to the noise in the tests. We also adjust the threshold per test: the default is 2%, but for Dromaeo it is 5%.
If we consider a test too noisy, we consider removing it entirely.
Lastly, we should prioritize alerts on the Mozilla-Beta branch since those affect more people.
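As an illustration of how per-test thresholds might be applied, here is a sketch (the 2% default and 5% Dromaeo values come from the text above, but the names and structure are made up, not Perfherder's actual code):

<syntaxhighlight lang="python">
# Sketch: decide whether a change is big enough to alert on.
ALERT_THRESHOLDS = {"default": 2.0, "dromaeo": 5.0}  # percent; illustrative

def should_alert(test_name, old_avg, new_avg):
    pct_change = abs(new_avg - old_avg) / old_avg * 100
    threshold = ALERT_THRESHOLDS.get(test_name, ALERT_THRESHOLDS["default"])
    return pct_change >= threshold

print(should_alert("tp5", 100.0, 103.0))      # 3% change -> True (>= 2%)
print(should_alert("dromaeo", 100.0, 103.0))  # 3% change -> False (< 5%)
</syntaxhighlight>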
== What does a regression look like on the graph? ==
On almost all of our tests, we are measuring time. This means that the lower the score, the better. Whenever the graph increases in value, that is a regression.
[[File:Reverse_regression.png]]
== Why does Alert Manager print -xx% ==
An alert will either be a regression or an improvement. The alerts we show by default are regressions only. It is important to know the severity of an alert: for example, a 3% regression is important to understand, but a 30% regression probably needs to be fixed ASAP. This is annotated as XX% in the UI. There is no + or - to indicate improvement or regression; this is an absolute number. Use the bar graph to the side to determine which type of alert it is.
NOTE: for the reverse tests we take that into account, so the bar graph will know to look in the correct direction.
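A small sketch of that magnitude/direction split (illustrative names only; the lower_is_better flag mirrors the note about reverse tests above):

<syntaxhighlight lang="python">
def summarize_alert(old_avg, new_avg, lower_is_better=True):
    """Report the unsigned percentage shown in the UI, plus the
    direction the bar graph would indicate."""
    pct = abs(new_avg - old_avg) / old_avg * 100  # absolute, no +/- sign
    got_worse = (new_avg > old_avg) if lower_is_better else (new_avg < old_avg)
    return f"{pct:.1f}% {'regression' if got_worse else 'improvement'}"

print(summarize_alert(100.0, 130.0))                         # 30.0% regression
print(summarize_alert(100.0, 130.0, lower_is_better=False))  # 30.0% improvement
</syntaxhighlight>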
= Noise =
== What is noise? ==
Generally a test reports values that fall in a range rather than a single consistent value. The larger the range of 'normal' results, the more noise we have.
[[File:Noisy graph.png|Noisy graph]]
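One simple way to put a number on noise (a sketch for illustration, not the metric Perfherder itself uses) is the spread of recent results relative to their mean:

<syntaxhighlight lang="python">
import statistics

def noise_level(values):
    # Coefficient of variation: stddev as a percentage of the mean.
    return statistics.stdev(values) / statistics.mean(values) * 100

quiet = [100, 101, 100, 99, 100, 101]
noisy = [100, 115, 92, 108, 88, 112]
print(f"quiet series: ~{noise_level(quiet):.1f}% spread")
print(f"noisy series: ~{noise_level(noisy):.1f}% spread")
</syntaxhighlight>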
== What are low value tests? ==
In the context of noise, low value means that the regression magnitude is too small relative to the noise of the test, so it's pretty hard to tell which particular bug/commit caused it; we can only narrow it down to a range.
<br />
<br />
[[File:Noisy low value graph.png.png|Noisy low value graph]]
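A rough way to express this (a sketch; the 2x cutoff is an arbitrary illustration, not a Perfherder setting) is to compare the size of the change against the noise of the series itself:

<syntaxhighlight lang="python">
import statistics

def is_low_value(before, after, min_ratio=2.0):
    """A change is 'low value' when its magnitude is small compared to
    the noise, so it cannot be pinned to a single bug/commit."""
    delta = abs(statistics.mean(after) - statistics.mean(before))
    noise = statistics.stdev(before)
    return delta < min_ratio * noise  # change is drowned out by noise

before = [100, 104, 97, 102, 99, 103]  # noisy baseline
after = [101, 105, 99, 104, 100, 103]  # tiny shift, within the noise
print(is_low_value(before, after))  # True
</syntaxhighlight>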
== Why can we not trust a single data point? ==
This is a problem we have dealt with for years with no perfect answer. Some reasons we do know are:
* the test is noisy due to timing, disk I/O, etc.
The short answer is we don't know, and we have to work within the constraints we do know.
== Why do we need 12 future data points? ==
We are re-evaluating our assertions here, but the more data points we have, the more confidence we have in the analysis of the raw data to point out a specific change.
This causes problems when we land code on Mozilla-Beta, where it can take 10 days to get 12 data points. We sometimes rerun tests; just retriggering a job will provide more data points to help us generate an alert if needed.
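As a rough sketch of why more future data points help (this is not Perfherder's actual detector, which lives in the Treeherder repository), compare the window before a push against the growing window after it with a t-like score; the score sharpens as data points accumulate:

<syntaxhighlight lang="python">
import statistics

def t_score(before, after):
    # Welch-style statistic: mean difference over the combined
    # standard error; a larger score means higher confidence.
    mb, ma = statistics.mean(before), statistics.mean(after)
    vb, va = statistics.variance(before), statistics.variance(after)
    return abs(ma - mb) / (vb / len(before) + va / len(after)) ** 0.5

before = [100, 102, 98, 101, 99, 103, 100, 102, 99, 101, 100, 98]
after = [105, 107, 103, 106, 104, 108, 105, 107, 104, 106, 105, 103]
for n in (3, 6, 12):  # more future data points -> higher confidence
    print(f"{n:2d} future points: t ~ {t_score(before, after[:n]):.1f}")
</syntaxhighlight>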
== Can't we do smarter analysis to reduce noise? ==
Yes, we can. We have other projects, and a [https://wiki.mozilla.org/images/c/c0/Larres-thesis.pdf master's thesis] has been written on this subject. The reality is we will still need future data points to show a trend, and depending on the source of data we will need different algorithms to analyze it.
== Duplicate / new alerts ==
One problem with [[TestEngineering/Performance/Sheriffing/Tree_FAQ#What_is_coalescing|coalescing]] is that we sometimes generate an original alert on a range of changes; then, when we fill in the data (backfilling/retriggering), we generate new alerts. This causes confusion when looking at the alerts.
In Alert Manager it is good to acknowledge the alert and use the reassign or downstream actions. This helps us keep track of alerts across branches whenever we need to investigate in the future.
== Weekends ==
On weekends (Saturday/Sunday) and many holidays, we find that the volume of pushes is much smaller. This results in far fewer tests being run. For many tests, especially the noisier ones, we find that the few data points we collect on a [https://elvis314.wordpress.com/2014/10/30/a-case-of-the-weekends/ weekend are much less noisy] (either falling to the top or bottom of the noise range).
This affects the Talos Sheriff because on Monday, when our volume of pushes picks up, we get a larger range of values. Due to the way we calculate a regression, this means we see a shift in our expected range on Monday. Usually these alerts are generated Monday evening/Tuesday morning. They are typically small regressions (<3%) and on the noisier tests.
== Multi Modal ==
Many tests are bi-modal or multi-modal. This means that they have a consistent set of values, but 2 or 3 of them. Instead of a bunch of scattered values between the low and the high, you will see 2 values: the lower one and the higher one.
This affects the alerts and results because sometimes we get a series of results that are less modal than the original; of course this generates an alert, and a day later you will probably see that we are back to the historical x-modal pattern. Some of this is affected by the weekends.
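A quick sketch of how you might spot multi-modality in a series (for illustration only, not a Perfherder feature): bucket the values and see how many distinct buckets hold the data.

<syntaxhighlight lang="python">
from collections import Counter

def modes(values, bucket_size=5):
    # Round each value to the nearest bucket and count the clusters.
    return Counter(round(v / bucket_size) * bucket_size for v in values).most_common()

bimodal = [100, 120, 101, 119, 99, 121, 100, 120, 102, 118]
print(modes(bimodal))  # two clusters near 100 and 120, nothing in between
</syntaxhighlight>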
== Random noise ==
Random noise happens all the time. In fact, our unittests fail 2-3% of the time with a test or two that randomly fails.
This doesn't affect Talos alerts as much, but keep in mind that if you cannot determine a trend for an alerted regression and have done a lot of retriggers, then it is probably not worth the effort to find the root cause.
= Bugzilla =
== How do I identify the current Firefox release meta-bug? ==
To easily track all the regressions opened, a meta-bug is created for every Firefox release that depends on the open regression bugs.
[[File:Advanced search.png|Advanced search]]<br />
[[File:Firefox metabugs.png|1200px|Firefox metabugs]]
== How do I search for an already open regression? ==
Sometimes Treeherder includes alerts related to a test in the same summary, and sometimes it doesn't. To make sure that the regression you found doesn't already have a bug open, you have to search the current Firefox release meta-bug for open regressions with a summary similar to the summary of your alert. Usually, if the test name matches, it might be what you're looking for. But be careful: a matching test name doesn't guarantee it is the same regression. You need to check it thoroughly.<br />
These situations appear because a regression shows up first on one repo (e.g. autoland) and it takes a few days until the causing commit gets merged to the other repos (inbound, beta, central).
<br />
== How do I follow up on regressions opened by me? ==
You can follow up on all the open regression bugs you created by searching in [https://bugzilla.mozilla.org/query.cgi?format=advanced Advanced search] for bugs with:
<br />