Monitoring is decentralized. Incident response is distributed based on time zone, availability, and project knowledge.
We use a number of services to maintain effective monitoring:
- Pingdom - Checks that our servers are up. Screams if they aren't.
- VictorOps - Incident response management. Dispatches alerts to sysadmins and compiles a timeline that we use to manage incidents.
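To make "checks that our servers are up" concrete, here is a minimal sketch of what a single uptime and latency probe does. It is not Pingdom's API or our actual configuration: the names `check_uptime` and `CheckResult`, the 5-second timeout, and the rule that any 2xx/3xx response counts as "up" are all assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    """Outcome of one probe (hypothetical structure, for illustration only)."""
    url: str
    up: bool
    status: Optional[int]   # HTTP status code, or None if no response arrived
    latency_ms: float       # wall-clock time spent on the attempt


def check_uptime(url: str, timeout: float = 5.0,
                 opener: Callable = urllib.request.urlopen) -> CheckResult:
    """Fetch `url` once and report whether it answered, and how quickly.

    Assumption: a 2xx or 3xx status means "up"; anything else, or no
    response at all, means "down". `opener` is injectable so the check
    can be exercised without touching the network.
    """
    start = time.monotonic()
    status: Optional[int] = None
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx/5xx).
        status = err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        status = None
    latency_ms = (time.monotonic() - start) * 1000.0
    up = status is not None and 200 <= status < 400
    return CheckResult(url=url, up=up, status=status, latency_ms=latency_ms)
```

Calling `check_uptime("https://example.org/")` (a placeholder URL) returns a `CheckResult`. In this sketch, a result with `up` set to `False` is the kind of event that would be handed to an incident response tool for escalation.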
How to use it
How to request monitoring
|Tool||Usage||Primary Contact||Secondary Contacts|
|Pingdom||Uptime and latency monitoring||mrz|| |
|VictorOps||Incident Escalation and notifications||tanner||mrz, logan, yousef|
|CloudWatch||Top-level monitoring of AWS||Same as AWS|| |