Performance sheriffing/Alerts

From MozillaWiki

Perfherder alerts

General triage process

Perfherder is a performance monitoring tool that tracks many browser KPIs for each changeset. Although the KPIs measure very different things in very different ways, Perfherder's UI is uniform and intuitive, so one needn't know the specifics of a particular alert to start investigating it. By retriggering and backfilling, it's fairly straightforward to narrow down to the offending changeset. The color hints are also easy to read: green always indicates a performance improvement, while red indicates a regression.
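To illustrate how backfilling narrows the search, here is a minimal sketch in Python. It is not Perfherder code: the function name `pushes_to_backfill`, the 5% threshold, and the sample values are all hypothetical, and it assumes a lower-is-better metric with positive values. Given one value per push (oldest first, with `None` where the job was coalesced and never ran), it finds the last good push, the first bad push, and the pushes in between that still need data.

```python
from typing import List, Optional, Tuple


def pushes_to_backfill(
    values: List[Optional[float]], threshold: float = 0.05
) -> Tuple[int, int, List[int]]:
    """Return (last_good, first_bad, missing) as indices into `values`.

    `values` holds one entry per push, oldest first; None means the job
    was coalesced and produced no data. Assumes lower is better and that
    all values are positive (both are assumptions of this sketch).
    """
    # Use the oldest available data point as the baseline.
    baseline = next(v for v in values if v is not None)
    limit = baseline * (1 + threshold)

    last_good = None
    for i, v in enumerate(values):
        if v is None:
            continue  # no data for this push yet
        if v > limit:
            # Every push without data between the last good and the
            # first bad push is a candidate culprit and needs a backfill.
            missing = [j for j in range(last_good + 1, i) if values[j] is None]
            return last_good, i, missing
        last_good = i
    raise ValueError("no data point exceeds the regression threshold")


# Hypothetical series: pushes 1, 3 and 4 were coalesced.
series = [100.0, None, 101.0, None, None, 115.0, 116.0]
print(pushes_to_backfill(series))
```

Once the listed pushes have data, running the same check again shrinks the suspect range, until the last good and first bad pushes are adjacent.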

Types of alerts

Talos

  • short description: Monitors various time-based performance KPIs of the browser (more details)
  • frequency: daily, a little more than a dozen alerts
  • coalesced by SETA?: yes (often requires backfilling)
  • available on platforms:
    • Windows: 7 32bit, 10 64bit (OPT, PGO builds)
    • Linux: 64bit (OPT, PGO builds)
    • OS X: 10.10 (OPT builds only)
  • triaging specifics:

build_metrics

  • short description: Monitors build times on multiple platforms, installer sizes, and other compiler-specific insights.
  • frequency: every 1-2 days, around 5 alerts
  • contact: :froydnj, :ted.mielczarek, :gps
  • coalesced by SETA?: no (shouldn't require backfilling)
  • available on platforms:
    • Windows: 32/64bit (OPT, No-OPT, Mingw builds)
    • Linux: 32/64bit (OPT, No-OPT builds)
    • OS X: 10.10 (cross, no-cross builds)
    • Android: 4.0, 4.2, 5.0
  • triaging specifics:
    • often easy to investigate
    • most alerts aren't noisy
    • when investigating, look for build config changes <ask :gps to provide more data>
    • build times often spike upwards for just a short time, then return to previous levels thanks to the caching mechanisms in place. We mark these as invalid alerts.
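The short-lived build time spikes described above can be told apart from lasting regressions with a check along these lines. This is only a sketch under stated assumptions: the function name `is_transient_spike`, the 5% tolerance, the three-push settling window, and the sample build times are hypothetical and are not taken from Perfherder.

```python
from statistics import median
from typing import Sequence


def is_transient_spike(
    before: Sequence[float],
    after: Sequence[float],
    tolerance: float = 0.05,
    settle: int = 3,
) -> bool:
    """True if build times rose after the alert but have since settled.

    `before` and `after` are build times around the alerting push,
    oldest first. The series counts as settled when the last `settle`
    points are back within `tolerance` of the pre-alert median.
    """
    limit = median(before) * (1 + tolerance)
    # Did any of the earlier post-alert points exceed the limit?
    spiked = any(v > limit for v in after[:-settle])
    # Are the most recent points back at the previous level?
    settled = len(after) > settle and all(v <= limit for v in after[-settle:])
    return spiked and settled


# Hypothetical build times in seconds.
before = [600, 605, 598, 602]
print(is_transient_spike(before, [900, 880, 640, 603, 599, 601]))  # settles again
print(is_transient_spike(before, [900, 890, 905, 898, 902, 899]))  # stays high
```

A series that settles again matches the pattern this page marks as an invalid alert; one that stays high deserves a closer look at build config changes.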

Autophone

  • short description: Monitors the browser's performance on mobile devices (more details)
  • frequency: every week or so, around 4 alerts
  • contact: :bc:
  • coalesced by SETA?: no
  • available on platforms:
    • Android: 4.2, 4.4, 6.0, 7.1
  • triaging specifics:
    • when investigating, look for Android-related changes
    • many of these tests are pretty noisy, and the alerts often turn out to be invalid (one reason is that devices overheat, which affects test results)
    • consider setting needinfo? on :bc: to check the status of suspect phone devices
    • tricky to investigate; consider also using Phonedash, as it's a more precise investigation tool for mobile regressions
    • retriggers are almost always needed, but their results only show up after a day or so

AWSY

  • short description: Are We Slim Yet monitors memory consumption by the browser (more details)
  • frequency: every 2-3 days, around half a dozen alerts
  • contact: :erahm
  • coalesced by SETA?: yes
  • available on platforms:
    • Windows: 32/64bit (OPT, PGO builds)
    • Linux: 64bit (OPT builds)
    • OS X: 10.10 (OPT builds)
    • Android: 4.2, 4.3 (OPT builds)
  • triaging specifics:
    • retriggering/backfilling takes some time (>1h per test), so don't overuse it when collecting missing graph data

platform_microbench

  • short description: N/A
  • frequency: daily, around 1-2 dozen alerts
  • contact:
  • coalesced by SETA?: yes
  • available on platforms:
    • Linux: 32bit (OPT builds), 64bit (OPT, PGO, ASAN builds)
    • Windows: 7 32bit, 10 64bit (OPT builds)
    • OS X: 10.10 (OPT builds)
  • triaging specifics:
    • alerts fire very often; unless triaged, they quickly pile up
    • very noisy; many of the alerts turn out to be invalid
    • cheap to retrigger, as each test takes <20min to finish; still, one should not abuse this
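For noisy alerts like these, retriggered samples from before and after the suspect push can be compared to see whether the shift stands out from the noise. The sketch below computes Welch's t statistic using only the Python standard library. The function names, the cutoff of 5.0, and the sample data are assumptions made for illustration; they are not Perfherder's actual settings or code.

```python
from math import inf, sqrt
from statistics import mean, variance
from typing import Sequence


def welch_t(before: Sequence[float], after: Sequence[float]) -> float:
    """Welch's t statistic for two samples (each needs at least 2 points)."""
    diff = mean(after) - mean(before)
    # Standard error of the difference, allowing unequal variances.
    se = sqrt(variance(before) / len(before) + variance(after) / len(after))
    if se == 0:
        # No spread at all: any difference is unambiguous.
        return 0.0 if diff == 0 else (inf if diff > 0 else -inf)
    return diff / se


def looks_real(before: Sequence[float], after: Sequence[float], min_t: float = 5.0) -> bool:
    """True if the shift is large relative to the noise (cutoff is an assumption)."""
    return abs(welch_t(before, after)) >= min_t


# Hypothetical retrigger results: a clear shift versus a change lost in noise.
print(looks_real([10.0, 10.2, 9.9, 10.1], [12.0, 12.1, 11.9, 12.2]))
print(looks_real([10.0, 14.0, 8.0, 12.0], [11.0, 15.0, 9.0, 13.0]))
```

A handful of retriggers per push is usually enough input for a check like this, which keeps machine time modest even for suites where each run is slow.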