Marketplace Push Duty
So, you're going to update marketplace.firefox.com, eh? You've come to the right place. As release manager your responsibilities are:
- Tagging releases when the milestone closes and updating stage
- Evaluating any potential impact of the push on system performance
- Evaluating cherry-pick requests for the tag after it closes, and cherry-picking the ones you accept
- Ensuring the waffle flags on stage are set appropriately (if it's going out in the next push, it's enabled, otherwise it is equivalent to production)
- Working with Ops during push to make sure the release is smooth
- Working with QA to make sure any concerns are addressed
- Following up with Ops and QA to do repeat pushes to address any critical issues
- Noting any new major features going out on the etherpad
- Telling the person after you that they are on for the next week
You can subscribe to this for the week you are on push duty, then turn off when you are off it.
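The waffle flag rule in the list above (enabled on Stage if it ships in the next push, otherwise the same as production) can be sketched as a small check. This is only an illustration working on plain dictionaries: the flag names are hypothetical, and how you read the real flag states from Stage and production is not covered here.

```python
def expected_stage_flags(production_flags, going_out_next_push):
    """Return the flag states Stage should have before the next push."""
    # Start from production: anything not shipping should mirror it.
    expected = dict(production_flags)
    # Anything shipping in the next push is switched on for Stage.
    for name in going_out_next_push:
        expected[name] = True
    return expected


def stage_mismatches(stage_flags, expected):
    """List flag names whose Stage state differs from the expected state."""
    return sorted(name for name, state in expected.items()
                  if stage_flags.get(name) != state)


# Hypothetical flag names and states, for illustration only.
production = {"new-search": False, "payments": True, "old-banner": False}
expected = expected_stage_flags(production, {"new-search"})
stage = {"new-search": False, "payments": True, "old-banner": True}
print(stage_mismatches(stage, expected))
```

Any name printed here is a flag to fix on Stage before QA starts testing.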
A walkthrough of a push
Tagging and Pushing to Stage
Tag on the Friday before you expect to push, at 11am PST. Remind folks on IRC that you're tagging and make sure they don't have half-finished patches.
- Tag the repositories and push the tags to Stage -- this can be done automatically or manually.
- Actively steward the push to Stage -- if there's an error during the push, or if the push will have adverse effects on production performance, work with Ops and commit authors to either redo or adjust the push (more on that below).
- Update the etherpad with the compare URLs for each repo -- add the GitHub compare URLs to the etherpad so that, when the push comes, people can easily see what is about to go out.
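As a rough sketch, the compare URLs for the etherpad can be generated rather than typed by hand. This assumes the standard GitHub compare URL form (`/compare/<base>...<head>`) and that the previous push's tag is known; the two tag values below are made-up examples, and the repository list is taken from the tagz.py command in the manual tagging section.

```python
# Repositories as listed in the tagz.py command in this document.
REPOS = [
    "mozilla/commbadge",
    "mozilla/fireplace",
    "mozilla/marketplace-operator-dashboard",
    "mozilla/marketplace-stats",
    "mozilla/monolith-aggregator",
    "mozilla/transonic",
    "mozilla/zamboni",
    "mozilla/marketplace-content-tools",
]


def compare_urls(repos, previous_tag, new_tag):
    """Map each repo to the GitHub URL comparing the previous tag to the new one."""
    return {
        repo: f"https://github.com/{repo}/compare/{previous_tag}...{new_tag}"
        for repo in repos
    }


# Example tags only; substitute the real previous and new tags.
for repo, url in compare_urls(REPOS, "2015.03.03", "2015.03.10").items():
    print(f"{repo}: {url}")
```

The printed lines can be pasted straight into the etherpad.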
Manual tagging and pushing to Stage
Name your tag with the date of the push in the format YYYY.MM.DD.
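A minimal sketch of deriving the tag name, assuming the push happens on the first Tuesday after the day you tag (the function names are illustrative, not part of any existing tooling):

```python
from datetime import date, timedelta

TUESDAY = 1  # date.weekday() counts Monday as 0


def next_push_date(tagging_day):
    """Return the first Tuesday strictly after tagging_day."""
    days_ahead = (TUESDAY - tagging_day.weekday() - 1) % 7 + 1
    return tagging_day + timedelta(days=days_ahead)


def push_tag(push_day):
    """Format a push date as a YYYY.MM.DD tag name."""
    return push_day.strftime("%Y.%m.%d")


friday = date(2015, 3, 6)  # an example tagging Friday
print(push_tag(next_push_date(friday)))  # 2015.03.10
```

If the push is moved off its usual day, pass the actual push date to `push_tag` directly.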
The following repositories need tagging:
There is a script which can do all that for you. Try:
python tagz.py -r mozilla/commbadge,mozilla/fireplace,mozilla/marketplace-operator-dashboard,mozilla/marketplace-stats,mozilla/monolith-aggregator,mozilla/transonic,mozilla/zamboni,mozilla/marketplace-content-tools -c create -t YYYY.MM.DD
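If you prefer not to edit that long command line by hand, a small helper can assemble it and catch a badly formatted tag before anything is created. This is a sketch only: it uses just the `-r`, `-c create`, and `-t` options shown above, and the helper name and the format check are assumptions, not part of tagz.py.

```python
import re
import shlex

# Tags are named after the push date: YYYY.MM.DD.
TAG_FORMAT = re.compile(r"\d{4}\.\d{2}\.\d{2}")


def build_tagz_command(repos, tag):
    """Build the argv for the tagz.py invocation shown above."""
    if not TAG_FORMAT.fullmatch(tag):
        raise ValueError(f"tag {tag!r} is not in YYYY.MM.DD format")
    return ["python", "tagz.py", "-r", ",".join(repos), "-c", "create", "-t", tag]


# Two repos and an example tag, for brevity.
command = build_tagz_command(["mozilla/zamboni", "mozilla/fireplace"], "2015.03.10")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the directory that contains tagz.py.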
Next you'll need to update the staging servers:
- Go to Jenkins (restricted; you'll need VPN+LDAP login to get here)
- Push items by choosing "Build with Parameters" (on the left, above "Build History" -- if you don't see that option, you need to ask Ops to change your permissions).
- Enter the tag to be deployed where it says "DeployRef" -- note that the tag must be the same for all repos.
What it means to steward the push
While the ideal is for pushes to be uneventful, that's not always the case. The push hero isn't expected to single-handedly resolve any issues, but they are expected to work with Ops to identify issues and get the proper help (most likely the relevant commit author). It's important that this happens as part of the push to Stage, rather than on Tuesday as part of the push to Production. That's part of the point of having a Staging site.
Important note about data migrations: in our system, as with any system that isn't under immediate control (due to load-balancing or caching), we have to ensure that a push doesn't incur unreasonable system downtime. Data migrations are a known risk in this regard. If a migration on Stage shows that an unacceptable lag in performance will occur, the relevant commit should be refactored so that the to-be-pushed code does not rely on to-be-pushed data changes -- and Ops will need to know that updating the database servers must be handled differently.
Task: rename a column in a 4-million row table. This can take minutes, and can render the system unresponsive during that time. To do this without noticeable downtime:
1. Add a new column with the new column name
2. Copy data from old column to new column via SQL script
3. Push the code that uses the new column
4. Update any rows that may have been added during previous steps
5. Remove the old column
... with Ops performing steps 1, 2, and 5 on each database server individually (by taking it out of rotation, running updates, and then putting it back into rotation to catch up via replication). We don't want to surprise Ops with this on Tuesday; we want to identify it during tagging if possible, or at the latest right after the push to Stage.
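The five steps of the column rename can be walked through end to end on a throwaway database. The sketch below uses Python's built-in sqlite3 module with an in-memory database purely for illustration; the table and column names are hypothetical, the real database, replication, and per-server rotation handled by Ops are not modelled, and the code deploy in step 3 is represented by a callback.

```python
import sqlite3


def rename_column_in_phases(conn, table, old, new, col_type, deploy_new_code):
    """Rename a column in phases instead of one long blocking rename.

    Identifiers are interpolated into SQL, so pass trusted names only.
    """
    # Step 1: add the new column alongside the old one.
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {new} {col_type}")
    # Step 2: bulk-copy existing data into the new column.
    conn.execute(f"UPDATE {table} SET {new} = {old}")
    # Step 3: push the code that uses the new column (stand-in callback).
    deploy_new_code(conn)
    # Step 4: catch up rows that old code wrote during steps 1-3.
    conn.execute(
        f"UPDATE {table} SET {new} = {old} "
        f"WHERE {new} IS NULL AND {old} IS NOT NULL"
    )
    # Step 5: remove the old column (SQLite supports this from 3.35 on).
    if sqlite3.sqlite_version_info >= (3, 35, 0):
        conn.execute(f"ALTER TABLE {table} DROP COLUMN {old}")
    conn.commit()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE apps (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT INTO apps (title) VALUES (?)", [("a",), ("b",)])

    def deploy_new_code(c):
        # Simulate a write from old code landing just before the cutover.
        c.execute("INSERT INTO apps (title) VALUES ('late')")

    rename_column_in_phases(conn, "apps", "title", "name", "TEXT", deploy_new_code)
    return [row[0] for row in conn.execute("SELECT name FROM apps ORDER BY id")]


print(demo())
```

The row inserted during the simulated deploy is the reason step 4 exists: without the catch-up update its new column would stay empty.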
Pushes happen Tuesday at 11am. There is an etherpad made each week named mkt-YYYY-MM-DD. The push will mostly follow this etherpad, and any special notes should be in that pad.
You might want to add a calendar entry for the push time so that people don't try to schedule you for meetings.
1. The release manager (you), QA (krupa), and Ops (jason or jlaz) should be in contact on IRC and in the Marketplace Vidyo room.
2. Once everyone gives the thumbs up, Ops will push the actual code using Jenkins. Ops will push the projects in order (the same order you used for Stage). Talk on Vidyo if there are any questions.
3. The IRC bots will say when the pushes are done.
4. Once the push is done, QA will verify changes. Work with them to flip any waffle switches or make any other adjustments.
5. Whilst QA is reviewing...
6. If QA or Ops finds something that needs fixing immediately:
   - Write a patch (or find someone who can);
   - Cherry-pick the patch onto the previous tag (Example);
   - Go back to step 2 until QA is happy. OK, until QA is satisfied, then.
7. Once QA, Ops, and you all sign off, the push is over. Record the time it took at the bottom of the etherpad.
After the Push
- Create a new etherpad for the next week using the push template.
- Edit the topic in the secret channel to point to the new etherpad.
- Remind next week's release manager they are on the hook! :)
- Send an email to the public mailing list (email@example.com) saying how the push went. If there was reason for multiple pushes, or anything that could be improved or fixed (e.g. dodgy migration), let the team know using this handy template.
Release manager rotation
(There will be exceptions to the rotation. No problem, we just need to be aware of them and plan for them.)