The Firefox Nightly respin process
|Purpose||Update our users' broken nightlies|
|Why||A patch on mozilla-central broke Nightly badly (crash, broken UI…)|
|Results||Nightly population retention|
|People involved||Relman, releng, sheriffs|
This page details the process to back out a faulty patch and, if needed, get nightlies respun.
- File a bug with as much detail as possible about the regression / crash if a bug hasn't been filed yet
- Ask releng to stop automatic nightly updates because of the bug filed in step 1
- Warn our users about the regression via our Twitter account and #nightly IRC channel, give the bug number.
- Investigate to find the faulty patch via mozregression or stack traces for crashes
- Ask sheriffs for the back out of the patch and nightly respun, give the bug number as reference
- Mark the bug as blocking the bug referenced for the faulty patch
- Ask the patch author to investigate the regression (NeedInfo in Bugzilla)
- When updates are back announce that the fix is served on Twitter and IRC
When should we back out a patch?
We want to back out a patch when a significant regression is identified. This is usually either a functional regression (browser unusable, content rendering broken) or a sudden spike of crashes on the Nightly channel.
We want the nightly channel to be as stable and usable by our community as possible. If the browser is crashing or barely usable, our users leave and we need to make we have a sizable nightly community. Deciding on backing out a patch, blocking automatic updates and rebuilding nightlies after the back out is a balance between the fact that bugs are expected on this channel and the fact that we want this channel to be of the highest quality possible to have a user base. The Release Management team are in charge of finding this balance.
How to find the patch to back out
If it is a functional regression (reproducible case), then we should use mozregression. If it is a spike in crashes not necessarily reproducible (random crashes while surfing), then our crash analysis experts in the Release Management team should be contacted. The analysis of the stack trace combined with hg logs on mozilla-central often allow finding the bug number that introduced the instability.
- If the bug was already filed by a community member, then use it to track the regression and qualify it. Add the nightly-community keyword if missing.
- If it is a crash, get a Crash ID from the people that reported it and file a bug via Socorro.
- If it is a functional regression and no bug was filed yet, file it.
Have the status-firefoxN tracking flag set as affected, the tracking-firefoxN set as blocking and the target milestone set to mozillaN where N is the version number for Nightly.
Once the back out is done, mark the bug as FIXED and change the status-firefoxN tracking flag from affected to fixed.
The bug number will be used to track the work to fix the regression. Communicate to our community that a bug exists so as to avoid having many duplicate bugs filed.
Stopping automatic background updates
If you think that a lot of people are going to be impacted by a regression, ask releng or relman to stop automatic update.
Blocking automatic updates will not prevent new users to install Firefox Nightly from mozilla.org but it will mitigate greatly the impact on our existing user base.
We can ask releng for automatic updates to be stopped for a specific OS and potentially set up a fallback update mechanism to the last good known builds.
Most of the time, it is Relman that stops update via Balrog, it stops updates for all OSes.
Most of the major regressions are reported immediately via our @FirefoxNightly Twitter account followers, usually when more than 2 people report a similar regression there is a high chance that it will be serious and stopping automatic updates should be done rapidly.
Asking for a back out of the patch and new nightlies
You can contact sheriffs in the #sheriffs IRC channel to back out the patch that caused the regression when you have identified it. The back out commit will reference the bug number.
If the next nightly is about to be built and the impact is moderate, it is not necessarily needed to ask for nightly builds to be respun after the back out.
Note: Some members of the release management team have the technical knowledge and permissions to back out patches.
Communicating about the issue
We should not hesitate to communicate the issue with a reference to the bug number to our community so as to minimize the number of duplicate bugs. If the issue needs steps to reproduce which are not obvious or a specific hardware/OS combination, having all communications centralized in a single bug helps.
We should also remember communicating about the resolving of the issue and urge people to upgrade to the updated nightly (so as to reduce automatic crash reports and unneeded bugs filed).
Communicating about major regressions in Nightly is also part of the informal social contract we have with our alpha testers, making sure they are informed of major technical issues impacting them helps keeping them engaged.
The main communication channels to communicate a regression are our @FirefoxNightly Twitter account, our @FirefoxNightly@mozilla.social Mastodon account and our #Nightly chatroom on Matrix/Element.
When updates are stopped, this will be automatically indicated on https://whattrainisitnow.com/release/?version=nightly with the reason message (usually linking to a bug) entered by release managers or sheriffs in Balrog when they stopped updates.