Latest revision as of 19:24, 12 September 2012

Upgrade Steps

This upgrade requires downtime. The downtime will likely last only half an hour, but we should schedule a one-hour window just in case. Both the processors and the web UI must be down during the upgrade.

Database changes must be deployed first, then UI/middleware changes.

FULL STEPS FOR UPGRADE

1PM: begin archival backup of database (mpressman). This must be complete before upgrade.

5PM: begin upgrade

disable alerts (mpressman)
stop monitor and processors (mpressman)
stop cron jobs (mpressman)
break replication between master01 and master02 and to ReplayDB (jberkus)
fail over zeus from master01 to master02 (netops)
upgrade database (jberkus)
verify database (jberkus, mpressman)
code push (ops)
fail over zeus from master02 to master01 (netops)
verify web application (QA)
decide to pass or not - team (see ROLLBACK if no pass)
deploy cron job changes (rhelmer?)
restart monitor and processors (mpressman)
restart cron jobs (mpressman)
backfill missing hours (jberkus)
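
The step list above branches at the team decision: on a pass the remaining steps run, and on a no-pass the ROLLBACK PROCEDURE takes over. A minimal sketch of that ordering follows, with step names and owners copied from this page. The `plan` function and the list names are hypothetical, introduced only for illustration; the page does not say what follows a rollback, so the sketch stops there.

```python
from typing import List, Tuple

# (description, owner) pairs copied from the step list on this page.
UPGRADE_STEPS: List[Tuple[str, str]] = [
    ("disable alerts", "mpressman"),
    ("stop monitor and processors", "mpressman"),
    ("stop cron jobs", "mpressman"),
    ("break replication between master01 and master02 and to ReplayDB", "jberkus"),
    ("fail over zeus from master01 to master02", "netops"),
    ("upgrade database", "jberkus"),
    ("verify database", "jberkus, mpressman"),
    ("code push", "ops"),
    ("fail over zeus from master02 to master01", "netops"),
    ("verify web application", "QA"),
    ("decide to pass or not", "team"),
    ("deploy cron job changes", "rhelmer?"),
    ("restart monitor and processors", "mpressman"),
    ("restart cron jobs", "mpressman"),
    ("backfill missing hours", "jberkus"),
]

# Copied from the ROLLBACK PROCEDURE section.
ROLLBACK_STEPS: List[str] = [
    "fail over zeus to master02",
    "roll back code push",
    "roll back cron job changes",
]


def plan(qa_passed: bool) -> List[str]:
    """Return the ordered step descriptions for a pass or no-pass decision."""
    # Locate the decision gate by its description rather than a hard-coded index.
    gate = next(i for i, (desc, _) in enumerate(UPGRADE_STEPS)
                if desc.startswith("decide"))
    up_to_gate = [desc for desc, _ in UPGRADE_STEPS[: gate + 1]]
    if qa_passed:
        return up_to_gate + [desc for desc, _ in UPGRADE_STEPS[gate + 1:]]
    # Assumption: nothing after the gate runs on a no-pass; only rollback does.
    return up_to_gate + ROLLBACK_STEPS


if __name__ == "__main__":
    for number, step in enumerate(plan(qa_passed=False), start=1):
        print(number, step)
```

Keeping the gate as a named step rather than an index means the plan stays correct if steps are added before the decision.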

ROLLBACK PROCEDURE

fail over zeus to master02
roll back code push
roll back cron job changes.

Post-upgrade cleanup

resync master02 from master01 and restart replication (mpressman or jberkus)
restore nagios/ganglia checks (mpressman)

Database Upgrade

We will need to coordinate with NetOps about the failover (bug 790711) and with IT about the code push (bug 790707).

IMPORTANT: Several hours before the upgrade, we need to do a full archival backup of the pre-Mobeta database. bug 790705

This database upgrade involves a number of irreversible changes. As such, two additional steps are required before deploying it:

Archival final "pre-mobeta" offsite backup. See bug: https://bugzilla.mozilla.org/show_bug.cgi?id=762305
Disable replication to master02 before running the upgrade, and do not restore it until after QA verification. This may then require a full resync of master02.

This database upgrade takes around 1/2 hour to run, assuming a 2-week backfill of the new matviews. Should we decide to do more than 2 weeks of backfill, this can be adjusted by editing upgrade.sh.
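
To make the backfill window concrete, here is a small sketch that computes the date range a backfill of a given number of weeks would cover. The function name `backfill_window` and the example date are hypothetical; how upgrade.sh actually encodes the window is not shown on this page, so this illustrates only the arithmetic, not the script.

```python
from datetime import date, timedelta
from typing import Tuple


def backfill_window(upgrade_day: date, weeks: int = 2) -> Tuple[date, date]:
    """Return (start, end) dates for a backfill ending on the upgrade day."""
    if weeks < 1:
        raise ValueError("weeks must be at least 1")
    return upgrade_day - timedelta(weeks=weeks), upgrade_day


if __name__ == "__main__":
    # Hypothetical upgrade day, chosen only for illustration.
    start, end = backfill_window(date(2012, 9, 12), weeks=2)
    print(start.isoformat(), end.isoformat())
```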

This database upgrade is lock-sensitive. As such, it requires the downtime described above, which must cover the processors, monitor, web application, and cron jobs.

Procedure for minimal downtime upgrade:

stop monitor, processors, cron jobs
break replication between master01 and master02
fail over to master02
upgrade master01
fail back to master01
QA master01
if QA passes, resync master02 from master01.
- otherwise, fail over to master02, which has not been upgraded.
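
The procedure above works because replication is broken before the upgrade, so master02 stays on the old database and remains available as a fallback. The following sketch traces that state through each step. The state keys (`active`, `replication`, `schema`) and the function name are hypothetical modelling choices, not part of any Socorro tooling.

```python
from typing import Any, Dict


def minimal_downtime_upgrade(qa_passes: bool) -> Dict[str, Any]:
    """Trace which master is active and which schema each master holds."""
    state: Dict[str, Any] = {
        "services_running": True,
        "replication": True,
        "active": "master01",
        "schema": {"master01": "old", "master02": "old"},
    }
    state["services_running"] = False       # stop monitor, processors, cron jobs
    state["replication"] = False            # break replication master01 -> master02
    state["active"] = "master02"            # fail over to master02
    state["schema"]["master01"] = "new"     # upgrade master01
    state["active"] = "master01"            # fail back to master01
    if qa_passes:
        # Resync copies the upgraded database onto master02.
        state["schema"]["master02"] = "new"
        state["replication"] = True
    else:
        # master02 was never upgraded, so it still serves the old database.
        state["active"] = "master02"
    return state


if __name__ == "__main__":
    print(minimal_downtime_upgrade(qa_passes=True))
    print(minimal_downtime_upgrade(qa_passes=False))
```

In the no-pass branch replication stays off, which matches the note above that restoring it may require a full resync of master02.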

Cron job adjustments

remove oldtcbs cron (cron_aggregates.sh) for bug 778255
- dev - Sep 5 09:04
- stage - TODO
- prod - TODO

Config changes

bug 789410 changes the daily.php-dist file. The updated file needs to be checked into puppet, where it will be stored at /etc/socorro/web/, and then copied out to the Socorro install by the deploy scripts.
- dev - Sep 12 09:29
- stage - ~~bug 790646~~
- prod - TODO

@@ Line 5: / Line 5: @@
 Database changes need to be deployed first, then UI/mware changes.
-== Database Upgrade ==
+== FULL STEPS FOR UPGRADE ==
+PM: begin archival backup of database (mpressman).  This must be complete before upgrade.
+PM: begin upgrade
+# disable alerts (mpressman)
+# stop monitor and processors (mpressman)
+# stop cron jobs (mpressman)
+# break replication between master01 and master02 and to ReplayDB (jberkus)
+# fail over zeus from master01 to master02 (netops)
+# upgrade database (jberkus)
+# verify database (jberkus, mpressman)
+# code push (ops)
+# fail over zeus from master02 to master01 (netops)
+# verify web application (QA)
+# decide to pass or not - team (see ROLLBACK if no pass)
+# deploy cron job changes (rhelmer?)
+# restart monitor and processors (mpressman)
+# restart cron jobs (mpressman)
+# backfill missing hours (jberkus)
+=== ROLLBACK PROCEDURE ===
+# fail over zeus to master02
+# roll back code push
+# roll back cron job changes.
+=== Post-upgrade cleanup ===
-As usual, run "18.0/upgrade.sh breakpad".
+# resync master02 from master01 and restart replication (mpressman or jberkus)
+# restore nagios/ganglia checks (mpressman)
-This database upgrade takes around 1/2 hour to run, assuming a 2-week backfill of the new matviews.  Should we decide to do more than 2 weeks of backfill, this can be adjusted by editing upgrade.sh.
+== Database Upgrade ==
+We will need to coordinate with NetOps about failover: {{bug|790711}} and coordinate with IT for code push {{bug|790707}}
-This database upgrade is lock-sensitive.  As such, it requires the downtime per above.
+IMPORTANT: Several hours before the upgrade, we need to do a full archival backup of the pre-Mobeta database.  {{bug|790705}}
 This database upgrade involves a number of irreversable changes.   As such, two additional steps are required before deploying it:
@@ Line 17: / Line 48: @@
 # Archival final "pre-mobeta" offsite backup. See bug: https://bugzilla.mozilla.org/show_bug.cgi?id=762305
 # Disable replication to master02 before running the upgrade, and do not restore it until after QA verification.  This may then require a full resync of master02.
+This database upgrade takes around 1/2 hour to run, assuming a 2-week backfill of the new matviews.  Should we decide to do more than 2 weeks of backfill, this can be adjusted by editing upgrade.sh.
+This database upgrade is lock-sensitive.  As such, it requires the downtime per above.  This downtime needs to include processors, monitor, web application, and cron jobs.
+Procedure for minimal downtime upgrade:
+# stop monitor, processors, cron jobs
+# break replication between Master01 and Master02.
+# fail over to master02
+# upgrade master01
+# fail back to master01
+# QA master01
+# if it passes, resync master02.
+#* otherwise, fail over to master02.
+== Cron job adjustments ==
+* remove oldtcbs cron (cron_aggregates.sh) for {{bug|778255}}
+** dev - Sep  5 09:04
+** stage - TODO
+** prod - TODO
+== Config changes ==
+* {{bug|789410}} makes changes to the [https://raw.github.com/mozilla/socorro/stage/webapp-php/application/config/daily.php-dist daily.php-dist] file - this needs to be checked into puppet, where it will be stored at /etc/socorro/web/ and then copied out to the socorro install using deploy scripts
+** dev - Sep 12 09:29
+** stage - <strike>{{bug|790646}}</strike>
+** prod - TODO

Socorro:Releases/18: Difference between revisions