Community Ops/Monitoring: Difference between revisions

Revision as of 18:01, 12 September 2015

	THIS PAGE IS A WORKING DRAFT
	The page may be difficult to navigate, and some information on its subject might be incomplete and/or evolving rapidly. If you have any questions or ideas, please add them as a new topic on the discussion page.

Monitoring is decentralized. Incident response is distributed based on time zone, availability, and project knowledge.

We use a number of services to maintain effective monitoring:

Pingdom - Checks that our servers are up. Screams if they aren't.
VictorOps - Incident Response Management. Dispatches alerts to sysadmins and compiles a nice timeline for us to manage incidents.

TBD

TBD

Tool	Usage	Primary Contact	Secondary Contacts
Pingdom	Uptime and latency monitoring	mrz
VictorOps	Incident escalation and notifications	tanner	mrz, tad, logan, yousef, pancakes, dumitru
CloudWatch	Top-level monitoring of AWS	Same as AWS
StatusHub	Dashboard	mrz
New Relic	Application monitoring	jp	tad, tanner, logan, yousef, mrz

@@ Line 16: / Line 16: @@
 ===How to request monitoring===
 TBD
+== Monitoring ==
+{| class="wikitable"
+|-
+! Tool !! Usage !! Primary Contact !! Secondary Contacts
+|-
+| Pingdom || Uptime and latency monitoring || [https://mozillians.org/en-US/u/mrz mrz] ||
+|-
+| VictorOps || Incident escalation and notifications || tanner || [https://mozillians.org/en-US/u/mrz mrz], tad, logan, yousef, pancakes, dumitru
+|-
+| CloudWatch || Top-level monitoring of AWS || Same as AWS ||
+|-
+| StatusHub || Dashboard || [https://mozillians.org/en-US/u/mrz mrz] ||
+|-
+| New Relic || Application monitoring || jp || tad, tanner, logan, yousef, [https://mozillians.org/en-US/u/mrz mrz]
+|}