Socorro:MigrationPostmortem

From MozillaWiki
Revision as of 22:10, 27 January 2011 by Jberkus (talk | contribs) (→‎Suggested Improvements: coordination)

Schedule / Location / Call Information

  • Thursday, 2011-01-27 @ 1:30 pm PST
  • Location: The Bridge
  • 650-903-0800 x92 Conf# 362 (US/INTL)
  • 1-800-707-2533 (pin 369) Conf# 362 (US)
  • Join #socorro on irc.mozilla.org for the back channel

Overview

Things that went right

  • Teamwork between Dev/IT was the best Laura has seen (anywhere)
  • Smoke/load testing gave us a good level of confidence
  • Getting configs into Puppet
  • Instrumentation via Nagios/Ganglia helped us find problems during tests
  • Actual release day went great - total anticlimax
  • Having a unified task list with dates leading up to the migration
  • Having a checklist and rollback plan on the day
  • QA tests on WebUI were passing for days beforehand

Things that went wrong

  • HBase data sync not complete and verified until just a few days before the migration
  • Various network configuration issues through the last week: Zeus slowness, bonding setup, etc.
  • Loading the backlog was on track to take a long time. We corrected our approach on Sunday night (and finished processing the backlog before Monday a.m.), but we should have acted on this earlier.
  • Difficulties getting correct ADUs due to an unrelated problem with Vertica; the timing was poor (SJC was broken in the same way as PHX)
  • Missed a cron job despite multiple audits
  • Upgrade to RHEL6 coincided with hardware/software architecture change.

Suggested Improvements

  • Better communication with Netops: assemble a unified, ordered list of requested changes before submitting any change request
  • Better communication to groups outside release-drivers. Action items are tracked in bug 628318:
    • Get Socorro on status.mozilla.org
    • Set up a Socorro-specific blog for maintenance/downtime information
  • Need better coordination of all staff with the timeline
    • Need better tools than Bugzilla plus Google spreadsheets; a professional project management tool is called for
    • Need to have a manager in charge for every day of the timeline, even when people are on vacation
    • If we ever have a project this big again, consider hiring a professional project management consultant.