Telemetry/Alerts: Difference between revisions

(First partial draft)
 
(minor editing for content)
Many Telemetry probes were created to show performance trends over time. Sudden changes happening in Nightly could be the sign of an unintentional performance regression, so we introduced a system to automatically detect and alert developers about such changes.


==What it is==
Telemetry Alerts comes in two pieces: Cerberus the Detector and Medusa the Front-end.


===Cerberus===


===Medusa===
Medusa is in charge of emailing people when distributions change and of serving the website https://alerts.telemetry.mozilla.org, which contains pertinent information about each detected regression.


Medusa also checks for expiring histograms and sends emails notifying of their expiry.


==What it can do==
Telemetry Alerts is very good at identifying sudden changes in the shapes of normalized distributions of Telemetry probes. If you can see the distribution of [https://mzl.la/2vdMRax GC_MS] shift from one day to the next, then Cerberus likely can too.
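As a rough illustration of what "a change in the shape of a normalized distribution" means, the following sketch compares two days of bucket counts. It is not Cerberus's actual code: the Hellinger distance, the <code>THRESHOLD</code> value, the function names, and the sample counts are all assumptions chosen for the example.

```python
import math

# Hypothetical cutoff for this sketch only; not a value taken from Cerberus.
THRESHOLD = 0.1


def normalize(counts):
    """Turn raw bucket counts into proportions that sum to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram has no samples")
    return [c / total for c in counts]


def hellinger_distance(p, q):
    """Hellinger distance between two normalized histograms (0 = identical)."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp at zero so floating-point rounding cannot produce a negative root.
    return math.sqrt(max(0.0, 1.0 - bc))


def shape_changed(yesterday, today, threshold=THRESHOLD):
    """Report whether the normalized shape moved by more than the threshold."""
    return hellinger_distance(normalize(yesterday), normalize(today)) > threshold


# Made-up bucket counts for two consecutive days of a probe.
day1 = [5, 20, 50, 20, 5]
day2 = [5, 10, 25, 40, 20]

print(shape_changed(day1, day1))  # same shape
print(shape_changed(day1, day2))  # mass moved toward the higher buckets
```

Because both days are normalized before they are compared, only the relative proportions in each bucket matter here, not how many samples were submitted.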


==What it can't do==
Telemetry Alerts cannot see sudden shifts in volume. It is also very easily fooled if a change happens gradually over a long period of time or doesn't fundamentally alter the shape of the probe's histogram.
So if you have a probe like [https://mzl.la/2vdiuRx SCALARS_BROWSER.ENGAGEMENT.MAX_CONCURRENT_TAB_COUNT], Cerberus won't notice if:
* The number of pings reporting this value dropped by half, but the remaining pings reported the same spread of numbers
* The value increases very slowly over time (which I'd expect it to do given how good Session Restore is these days)
* We suddenly received twice as many pings from 200-tab subsessions (the dominance of 1-tab pings would likely keep the overall shape of the distribution from changing enough for Cerberus to pick up on it)
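The three cases above can be sketched with made-up numbers. As before, this is only an illustration under assumptions: the Hellinger distance, the bucket layout, the counts, and the <code>THRESHOLD</code> value are hypothetical and are not taken from Cerberus.

```python
import math

THRESHOLD = 0.1  # hypothetical cutoff, for illustration only


def normalize(counts):
    """Turn raw bucket counts into proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def distance(a, b):
    """Hellinger distance between the normalized shapes of two histograms."""
    bc = sum(math.sqrt(x * y) for x, y in zip(normalize(a), normalize(b)))
    return math.sqrt(max(0.0, 1.0 - bc))


# Invented tab-count buckets: 1 tab, 2-10, 11-50, 51-199, 200+.
baseline = [9000, 800, 150, 40, 10]

# Case 1: half as many pings, same spread. Normalizing erases the change.
halved = [c // 2 for c in baseline]

# Case 2: each day 100 pings move from the first bucket to the second.
drift = [[9000 - 100 * k, 800 + 100 * k, 150, 40, 10] for k in range(31)]
daily_steps = [distance(drift[k], drift[k + 1]) for k in range(30)]

# Case 3: twice as many pings in the 200+ bucket, swamped by the 1-tab pings.
doubled_tail = [9000, 800, 150, 40, 20]

print("halved volume:", distance(baseline, halved))
print("largest daily step:", max(daily_steps))
print("first vs last day of drift:", distance(drift[0], drift[-1]))
print("doubled 200+ bucket:", distance(baseline, doubled_tail))
```

In this sketch every day-over-day comparison stays well under the assumed threshold, even though the first and last days of the slow drift differ by more than it. A detector that only compares adjacent days would therefore stay silent in all three cases.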


==Telemetry Alert Emails==
One of the main ways humans interact with Telemetry Alerts is through the emails sent by Medusa.