Releases/Firefox 3.0.6/Post Mortem: Difference between revisions

Line 12:
== IT ==
# Bouncer slave database was disabled, causing updates to fail when the master was under high load ({{bug|476753}}).  Had to pull/push updates a few times while debugging.
## Root Cause: <code>tm-bouncer01-slave02</code> was disabled in the load balancer during a previous maintenance window and never re-enabled.
## Actions:  
### Need to monitor backend service status on the load balancer ({{bug|476764}}) in the same way we monitor origin web servers.  Nagios would have alerted after the maintenance window that the second slave was missing.
### Bouncer needs three databases to withstand a failure of one during release ({{bug|477183}}).
# Had to throttle bits after release ({{bug|476875}}) because mirrors couldn't handle the load.
## Mirror weighting needs to change automatically when mirrors come and go.  See {{bug|454023}} ("Sentry should automatically decrease weights on mirrors that get dropped and re-added") for more details.
## The process for adding new mirrors was gated on one person; IT has added more subscribers to the list.
# Throttling remained on long after it was necessary.
## Need some monitoring to alert/remind when throttling is still configured ({{bug|477184}}).
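
A minimal sketch of the load balancer check proposed in {{bug|476764}}, assuming the list of currently enabled backends can be fetched from the load balancer by some other means (not shown here). The function name <code>check_pool</code> and any host name other than <code>tm-bouncer01-slave02</code> are hypothetical; the return codes follow the standard Nagios plugin convention (0 = OK, 2 = CRITICAL).

```python
# Nagios plugin exit codes (standard convention).
OK = 0
CRITICAL = 2


def check_pool(expected, enabled):
    """Compare the backends we expect in the pool with those enabled.

    expected: iterable of backend names that should be serving traffic.
    enabled:  iterable of backend names the load balancer reports as
              enabled (how this is obtained is an assumption, not shown).
    Returns a (exit_code, message) tuple in Nagios plugin style.
    """
    missing = sorted(set(expected) - set(enabled))
    if missing:
        return CRITICAL, "CRITICAL: backends not enabled: " + ", ".join(missing)
    return OK, "OK: all %d backends enabled" % len(set(expected))
```

A wrapper script would print the message and exit with the returned code, so that Nagios alerts if a backend is left disabled after a maintenance window.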

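The automatic weight change described in {{bug|454023}} could look roughly like the sketch below. This is not how Sentry behaves today; the function names, the 50% reduction, and the ramp step are all assumptions chosen for illustration.

```python
def weight_on_readd(configured_weight, reduction=0.5, minimum=1):
    """Weight to assign a mirror that was dropped and has just been re-added.

    The mirror restarts at a fraction of its configured weight so it is
    not hit with full load immediately. Values are hypothetical.
    """
    return max(minimum, int(configured_weight * reduction))


def ramp_up(current_weight, configured_weight, step):
    """Raise a healthy mirror's weight by one step, capped at its configured weight."""
    return min(configured_weight, current_weight + step)
```

For example, a mirror configured at weight 100 would restart at 50 and climb back in steps of 20 on each successful check: 70, 90, then 100.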

== Websites ==
# [issue here]