Socorro:Releases/18
Upgrade Steps
This upgrade requires downtime. It will probably last only half an hour, but we should schedule a one-hour window to be safe. Both the processors and the web UI must be down during the upgrade.
Database changes must be deployed first, followed by the UI/middleware changes.
FULL STEPS FOR UPGRADE
1PM: begin archival backup of database (mpressman). This must be complete before the upgrade begins.
5PM: begin upgrade
- disable alerts (mpressman)
- stop monitor and processors (mpressman)
- stop cron jobs (mpressman)
- break replication between master01 and master02 and to ReplayDB (jberkus)
- fail over zeus from master01 to master02 (netops)
- upgrade database (jberkus)
- verify database (jberkus, mpressman)
- code push (ops)
- fail over zeus from master02 to master01 (netops)
- verify web application (QA)
- go/no-go decision (team); if QA does not pass, follow the ROLLBACK PROCEDURE
- deploy cron job changes (rhelmer?)
- restart monitor and processors (mpressman)
- restart cron jobs (mpressman)
- backfill missing hours (jberkus)
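The ordering constraints in this plan (services stopped and replication broken before the database upgrade, database changes before the code push, QA before the go/no-go call) can be checked mechanically if the runbook is kept as data. The following is a minimal Python sketch under that assumption; `Step`, `UPGRADE_STEPS`, `ORDERING_RULES` and `check_order` are hypothetical names, not part of Socorro or any existing deploy tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str
    owner: str


# The 5PM upgrade steps, transcribed in order (hypothetical data structure).
UPGRADE_STEPS = [
    Step("disable alerts", "mpressman"),
    Step("stop monitor and processors", "mpressman"),
    Step("stop cron jobs", "mpressman"),
    Step("break replication", "jberkus"),
    Step("fail over zeus to master02", "netops"),
    Step("upgrade database", "jberkus"),
    Step("verify database", "jberkus, mpressman"),
    Step("code push", "ops"),
    Step("fail over zeus to master01", "netops"),
    Step("verify web application", "QA"),
    Step("go/no-go decision", "team"),
    Step("deploy cron job changes", "rhelmer?"),
    Step("restart monitor and processors", "mpressman"),
    Step("restart cron jobs", "mpressman"),
    Step("backfill missing hours", "jberkus"),
]

# (must happen first, must happen later), taken from the constraints on this page.
ORDERING_RULES = [
    ("stop monitor and processors", "upgrade database"),
    ("stop cron jobs", "upgrade database"),
    ("break replication", "upgrade database"),
    ("upgrade database", "code push"),
    ("verify web application", "go/no-go decision"),
    ("go/no-go decision", "restart monitor and processors"),
]


def check_order(steps, rules):
    """Return every rule whose 'first' step does not precede its 'later' step."""
    position = {s.action: i for i, s in enumerate(steps)}
    return [(first, later) for first, later in rules
            if position[first] >= position[later]]


print(check_order(UPGRADE_STEPS, ORDERING_RULES))  # → []
```

An empty list means the transcribed order satisfies every rule; moving the code push ahead of the database upgrade would report `("upgrade database", "code push")`.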
ROLLBACK PROCEDURE
- fail over zeus to master02
- roll back code push
- roll back cron job changes
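The branch at the go/no-go decision can be sketched as a small function that picks which list of steps comes next. This is a minimal illustration assuming the steps are tracked as plain strings; `after_decision` and the two lists are hypothetical and only transcribe the lists above. Nothing here performs a failover or rollback; that work stays with NetOps and ops.

```python
# Rollback path, transcribed from the list above.
ROLLBACK_STEPS = [
    "fail over zeus to master02",
    "roll back code push",
    "roll back cron job changes",
]

# Steps that follow a "pass" decision in the upgrade list.
FOLLOW_UP_STEPS = [
    "deploy cron job changes",
    "restart monitor and processors",
    "restart cron jobs",
    "backfill missing hours",
]


def after_decision(qa_passed: bool) -> list[str]:
    """Return the steps that follow the go/no-go decision.

    A failed QA pass sends us down the rollback path; master02 still holds
    the pre-upgrade data because replication was broken before the upgrade.
    """
    return list(FOLLOW_UP_STEPS if qa_passed else ROLLBACK_STEPS)


for step in after_decision(qa_passed=False):
    print("-", step)
```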
Post-upgrade cleanup
- resync master02 from master01 and restart replication (mpressman or jberkus)
- restore nagios/ganglia checks (mpressman)
Database Upgrade
We need to coordinate with NetOps on the failover (bug 790711) and with IT on the code push (bug 790707).
IMPORTANT: Several hours before the upgrade, we need to take a full archival backup of the pre-Mobeta database (bug 790705).
This database upgrade makes several irreversible changes, so two additional steps are required before deploying it:
- Take a final archival "pre-mobeta" offsite backup. See bug: https://bugzilla.mozilla.org/show_bug.cgi?id=762305
- Disable replication to master02 before running the upgrade, and do not restore it until after QA verification. This may then require a full resync of master02.
This database upgrade takes about half an hour to run, assuming a 2-week backfill of the new matviews. If we decide to backfill more than 2 weeks, the backfill length can be changed by editing upgrade.sh.
This database upgrade is lock-sensitive, so it requires the downtime described above. The downtime must cover the processors, monitor, web application, and cron jobs.
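As a rough planning aid, the runtime for a longer backfill can be estimated from the half-hour / 2-week figure. This sketch ASSUMES runtime scales linearly with the number of backfilled days, which has not been measured; part of the half hour may well be fixed overhead. The constants and function names are hypothetical.

```python
BASELINE_MINUTES = 30  # "about half an hour" for the default backfill
BASELINE_DAYS = 14     # default 2-week backfill of the new matviews
WINDOW_MINUTES = 60    # scheduled downtime window


def estimated_minutes(backfill_days: int) -> float:
    """Estimate upgrade runtime, ASSUMING linear scaling with backfill length."""
    return BASELINE_MINUTES * backfill_days / BASELINE_DAYS


def fits_window(backfill_days: int, window: int = WINDOW_MINUTES) -> bool:
    """True if the estimated upgrade alone fits inside the downtime window."""
    return estimated_minutes(backfill_days) <= window


for days in (14, 21, 28, 35):
    print(days, estimated_minutes(days), fits_window(days))
```

The one-hour window also has to cover the failovers, code push, and QA, so the real headroom for a longer backfill is smaller than this check suggests.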
Procedure for minimal downtime upgrade:
- stop monitor, processors, cron jobs
- break replication between master01 and master02.
- fail over to master02
- upgrade master01
- fail back to master01
- QA master01
- if it passes, resync master02.
- otherwise, fail over to master02.
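The failover sequence above can be sketched as an in-memory model that tracks which node zeus points at, whether replication is running, and which nodes hold the upgraded schema. `Cluster` and `minimal_downtime_upgrade` are hypothetical; the sketch does not touch real databases or zeus and models only the steps listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """In-memory model of the two masters; nothing here talks to real hosts."""
    active: str = "master01"        # node that zeus currently points at
    replicating: bool = True        # master01 -> master02 replication running
    services_stopped: bool = False  # monitor, processors, cron jobs
    upgraded: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def fail_over(self, node: str) -> None:
        self.active = node
        self.log.append(f"zeus -> {node}")


def minimal_downtime_upgrade(cluster: Cluster, qa_passes: bool) -> Cluster:
    cluster.services_stopped = True
    # Break replication first so master02 keeps the pre-upgrade database.
    cluster.replicating = False
    cluster.fail_over("master02")
    # Only upgrade master01 once zeus no longer points at it.
    assert cluster.active != "master01"
    cluster.upgraded.add("master01")
    cluster.fail_over("master01")
    if qa_passes:
        # Resync master02 from the upgraded master01, then resume replication.
        cluster.upgraded.add("master02")
        cluster.replicating = True
    else:
        # master02 still holds the un-upgraded database, so serve from it.
        cluster.fail_over("master02")
    return cluster


result = minimal_downtime_upgrade(Cluster(), qa_passes=False)
print(result.active, sorted(result.upgraded))  # → master02 ['master01']
```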
Cron job adjustments
- remove oldtcbs cron (cron_aggregates.sh) for bug 778255
- dev - Sep 5 09:04
- stage - TODO
- prod - TODO
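The dev/stage/prod status lines can be treated as a small rollout tracker. This is a minimal sketch assuming changes are promoted in the order listed (dev, then stage, then prod); the dictionary simply transcribes the list above, with `None` standing for TODO, and `next_environment` / `out_of_order` are hypothetical helpers.

```python
from typing import Optional

# Promotion order, as listed on this page.
ENVIRONMENTS = ("dev", "stage", "prod")

# Status of "remove oldtcbs cron (cron_aggregates.sh)"; None means TODO.
cron_rollout = {"dev": "Sep 5 09:04", "stage": None, "prod": None}


def next_environment(rollout: dict) -> Optional[str]:
    """Return the first environment still marked TODO, or None when done."""
    for env in ENVIRONMENTS:
        if rollout.get(env) is None:
            return env
    return None


def out_of_order(rollout: dict) -> list:
    """Return environments deployed while an earlier one is still TODO."""
    gap_seen = False
    skipped = []
    for env in ENVIRONMENTS:
        if rollout.get(env) is None:
            gap_seen = True
        elif gap_seen:
            skipped.append(env)
    return skipped


print(next_environment(cron_rollout))  # → stage
```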
Config changes
- bug 789410 changes the daily.php-dist file. The file needs to be checked into puppet, which stores it at /etc/socorro/web/; the deploy scripts then copy it into the Socorro install.
- dev - Sep 12 09:29
- stage - bug 790646
- prod - TODO
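The copy from the puppet-managed location into the install can be sketched as below. This is an illustration only, not the Socorro deploy scripts: `install_config`, its `install_dir` and `dest_name` parameters are hypothetical, since the real target path (and whether the `-dist` suffix is dropped) is set by the deploy scripts, which are not described here.

```python
import shutil
from pathlib import Path
from typing import Optional

# Location where puppet stores the file, per this page.
PUPPET_CONFIG_DIR = Path("/etc/socorro/web")


def install_config(name: str, install_dir: Path,
                   source_dir: Path = PUPPET_CONFIG_DIR,
                   dest_name: Optional[str] = None) -> Path:
    """Copy a puppet-managed config file into a web install directory.

    install_dir and dest_name are HYPOTHETICAL parameters: the real target
    path, and whether the file is renamed, are decided by the deploy scripts.
    """
    src = source_dir / name
    if not src.is_file():
        raise FileNotFoundError(f"{src} not found; has puppet run on this host?")
    install_dir.mkdir(parents=True, exist_ok=True)
    dest = install_dir / (dest_name or name)
    shutil.copy2(src, dest)  # copies contents plus permission bits and mtime
    return dest


# Hypothetical usage (the install path is a placeholder):
# install_config("daily.php-dist", Path("/path/to/socorro/install"))
```

Failing loudly when the source file is missing catches the case where the code push runs before puppet has delivered the new file.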