Performance/Telemetry Regression Alerts

Overview

We have over 1,000 Telemetry probes, so we need an automated way to monitor them for regressions. Noise is a major challenge, even more so than with Talos data, because Telemetry data is collected from a wide variety of computers, configurations and workloads. We require a reliable means of detecting regressions, improvements and other changes in a measurement's distribution.

Design

The current prototype uses telemetry.js to fetch the histograms for the build-ids of the past couple of months. The histograms are passed to a Python job that aggregates them by platform and channel and runs a regression algorithm for each metric. The algorithm computes the Bhattacharyya distance between the histogram of the current build-id and the histogram of each of the past N build-ids. A regression is reported when two conditions hold: the variance of those N distances is small enough, and the distance between the histograms of the current build-id and the previous build-id is above a cutoff value K. Histograms that don't have enough data are filtered out. The cutoff values are determined empirically from the data and from past known regressions.
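The detection rule described above can be sketched in Python roughly as follows. This is a minimal illustration, not the prototype's actual code: it assumes the textbook definition of the Bhattacharyya distance, D = -ln(sum_i sqrt(p_i * q_i)), over normalized bucket counts, and the function names and the default values for `min_samples`, `max_variance` and `cutoff` (the K above) are hypothetical placeholders rather than the empirically determined cutoffs.

```python
import math


def normalize(counts):
    """Turn raw bucket counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram has no samples")
    return [c / total for c in counts]


def bhattacharyya_distance(p_counts, q_counts):
    """Textbook Bhattacharyya distance between two histograms
    that share the same bucket layout."""
    if len(p_counts) != len(q_counts):
        raise ValueError("histograms must have the same number of buckets")
    p = normalize(p_counts)
    q = normalize(q_counts)
    # Bhattacharyya coefficient: 1.0 for identical distributions,
    # 0.0 for distributions with no overlapping buckets.
    bc = min(sum(math.sqrt(a * b) for a, b in zip(p, q)), 1.0)
    return math.inf if bc == 0 else max(0.0, -math.log(bc))


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def is_regression(current, history, min_samples=1000,
                  max_variance=0.001, cutoff=0.05):
    """Apply the two-condition rule to one metric.

    `history` holds the histograms of the past N build-ids, oldest
    first, so the last entry is the previous build-id. All threshold
    defaults are hypothetical, chosen only for this example.
    """
    # Filter out histograms that don't have enough data.
    if sum(current) < min_samples:
        return False
    usable = [h for h in history if sum(h) >= min_samples]
    if not usable:
        return False

    distances = [bhattacharyya_distance(current, h) for h in usable]
    stable = variance(distances) <= max_variance   # condition 1
    shifted = distances[-1] > cutoff               # condition 2
    return stable and shifted


# Five past builds with the same distribution, then a shifted build.
history = [[100, 200, 400, 200, 100] for _ in range(5)]
current = [400, 200, 100, 200, 100]
print(is_regression(current, history))
print(is_regression(history[0], history))
```

In this sketch a build is flagged only when it sits at a consistent distance from all of the recent builds and also differs from the immediately preceding one, which is one way to read the variance condition as a guard against noisy histograms.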

On our dataset, the Bhattacharyya distance has performed significantly better than the Pearson correlation, a Chi-Square test, a Mann-Whitney test or a one-class Support Vector Machine.

People

Roberto Vitillo: stats work
Mark Reid: Telemetry server-side changes
Avi Halachmi
Vladan Djeric
External contributors are welcome; contact vdjeric@mozilla.com

Meeting notes

Tracking bug: bug 1031011

June 25, 2014: Initial meeting to discuss the approach & requirements
