Taskcluster/Monitoring/Services


Service Tier Definitions

  • Tier 1: Required for the TaskCluster Platform to function
  • Tier 2: Provides insight into the operation of the TaskCluster Platform
  • Tier 3: External infrastructure whose outages cause task failures

Services in each tier

Tier 1

  • AWS (us-east-1, us-west-1/2)
  • Heroku
  • Tutum
  • Azure
  • DockerHub (moving to AWS S3 for primary automation)
  • Pulse / CloudAMQP
  • Mozilla LDAP (not yet, soon)

Tier 2

  • Papertrail
  • Influx

Tier 3

  • hg.mozilla.org
  • git.mozilla.org
  • github.com
  • VPN -> Balrog
  • AWS regions not listed in Tier 1
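
As a rough illustration, the tier lists above could be kept as a small lookup table. This is only a sketch: the short service identifiers (such as "aws:us-east-1"), the table name SERVICE_TIERS, and the function tier_of are hypothetical names chosen for the example, not part of any TaskCluster API. Mozilla LDAP is left out because the list marks it as not yet applicable.

```python
# Hypothetical identifiers for the services listed above (illustrative only).
SERVICE_TIERS = {
    1: {"aws:us-east-1", "aws:us-west-1", "aws:us-west-2",
        "heroku", "tutum", "azure", "dockerhub", "pulse"},
    2: {"papertrail", "influx"},
    3: {"hg.mozilla.org", "git.mozilla.org", "github.com", "vpn-balrog"},
}


def tier_of(service):
    """Return the tier number for a service identifier, or None if unknown."""
    for tier, services in SERVICE_TIERS.items():
        if service in services:
            return tier
    # Assumption: any AWS identifier not named in Tier 1 falls under Tier 3.
    if service.startswith("aws:"):
        return 3
    return None
```

For instance, tier_of("heroku") gives 1, while an AWS region outside the Tier 1 list, such as "aws:eu-west-1", falls through to Tier 3.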

Project summary

  • Make an API for events related to infrastructure status
  • Emit pulse messages for events
  • Have an out-of-band status page
  • Consider whether to pause, stop accepting tasks, or stop scheduling tasks when a Tier 1 service fails
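
The items above might fit together roughly as follows. This is a minimal sketch under stated assumptions, not the project's design: the StatusEvent fields, the routing key layout "infra-status.tier-N.service.state", and the function names are all hypothetical, and no real Pulse client is used (the message is only built, not published).

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class StatusEvent:
    """Hypothetical infrastructure status event."""
    service: str  # e.g. "heroku"
    tier: int     # 1, 2 or 3, as defined on this page
    state: str    # "up" or "down"


def to_pulse_message(event):
    """Build a (routing_key, JSON body) pair; the key layout is an assumption."""
    routing_key = f"infra-status.tier-{event.tier}.{event.service}.{event.state}"
    return routing_key, json.dumps(asdict(event), sort_keys=True)


def should_pause_scheduling(events):
    """Return True if the most recent event for any Tier 1 service is 'down'."""
    latest = {}
    for event in events:  # later events overwrite earlier ones per service
        latest[event.service] = event
    return any(e.tier == 1 and e.state == "down" for e in latest.values())
```

With this shape, a Tier 2 or Tier 3 outage never pauses scheduling, and a Tier 1 service that has recovered (a later "up" event) no longer counts as failed.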