Devops/monitoring-alerting

From MozillaWiki

Mozilla Foundation Monitoring & Alerting

TLDR

MONITORING TOOLS, SYSTEMS, AND LINKS

  • Opsview, a Nagios clone with a much friendlier interface.
* Monitors and alerts when servers in load balancers are unhealthy
* Monitors and alerts on uptime/downtime of overall endpoints, such as https://webmaker.org
* Monitors and alerts on database utilization and downtime
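
The endpoint uptime check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Opsview performs its checks: the function name `check_endpoint`, the 10-second timeout, and the rule that any 2xx/3xx status counts as "up" are all assumptions made for the example.

```python
import urllib.error
import urllib.request


def check_endpoint(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return (is_up, detail) for a single HTTP(S) endpoint.

    `opener` is injectable (hypothetical convenience) so the check can be
    exercised without network access.
    """
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, timeout, TLS error, and similar.
        return False, f"unreachable: {exc}"
    # Assumption: any 2xx or 3xx final status means the endpoint is up.
    return 200 <= status < 400, f"HTTP {status}"
```

Calling `check_endpoint("https://webmaker.org")` returns a pair such as `(True, "HTTP 200")` when the site responds normally. A real monitor would run this on a schedule and alert only after several consecutive failures.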
Important Opsview Links
Public Status Page
Current Unhandled Alerts (Login required)
Recent Alerts in Opsview
 TODO: Add the guide for notifications and contact settings
  • New Relic monitoring (Login Required)
* Watching application response time in the browser and on the server side
* Watching database and web server utilization, transactions, timings, and throughput
* Watching load balancer (ELB) metrics
* Performing server-side and client-side tracing of long-running transactions
* Overall endpoint monitoring, such as https://webmaker.org
* Watching cache server utilization and metrics
* Watching Elasticsearch server utilization and metrics
* Watching Mongo server utilization and metrics
* Marking and comparing new and old deployed versions of software
 TODO: Add the guide for notifications and contact settings
Important New Relic Links
New Relic Dashboards
Recent New Relic Alerts
New Relic Applications Overview
Recent Deployments
Browser / Front-end Performance Overview
  • AWS Infrastructure and Autoscaling Monitoring/Alerting
* An email group is notified of every autoscaling activity (scaling up or down). Contact jp at mozillafoundation.org to be added to this list.
* CloudWatch in the AWS console can monitor many utilization metrics, such as CPU or network usage for a group, database, server, or ELB. Few alarms are triggered from CloudWatch other than the ones that trigger scaling.
Most AWS infrastructure is monitored via New Relic. See the side menu options in New Relic for RDS, ELB, EC2, ElastiCache, etc.
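
For reference, a CloudWatch alarm of the kind that triggers scaling could be defined roughly as follows. This is a sketch under stated assumptions, not the configuration actually in use: the group name, scaling policy ARN, 70% CPU threshold, and evaluation windows are placeholders, and the helper names `scale_up_alarm_params` and `create_alarm` are hypothetical. The only AWS call used is boto3's `put_metric_alarm`.

```python
def scale_up_alarm_params(group_name, policy_arn, cpu_threshold=70.0):
    """Build put_metric_alarm arguments for a CPU-based scale-up alarm.

    All values here are illustrative assumptions, not production settings.
    """
    return {
        "AlarmName": f"{group_name}-cpu-high",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        # Aggregate the metric across every instance in the autoscaling group.
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": group_name}],
        "Statistic": "Average",
        "Period": 300,           # seconds per datapoint
        "EvaluationPeriods": 2,  # consecutive breaching datapoints required
        "Threshold": cpu_threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        # The scaling policy to invoke when the alarm fires.
        "AlarmActions": [policy_arn],
    }


def create_alarm(params, region="us-east-1"):
    """Create or update the alarm; needs boto3 and AWS credentials."""
    import boto3  # third-party; imported lazily so the helper above has no dependencies

    boto3.client("cloudwatch", region_name=region).put_metric_alarm(**params)
```

Keeping the parameters in a plain dictionary makes them easy to review before anything is sent to AWS. `create_alarm(scale_up_alarm_params("example-group", "<scaling policy ARN>"))` would then register the alarm.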