Firefox 32 Post-mortem
Add your feedback (good and bad) on Firefox 32 below. Please qualify with your nick.
- (lmandel) We had a number of critical bugs to resolve in the last week of beta. We need to get better at fixing these earlier.
- (lmandel) Avast caused us pain. How can we work better with companies like them to avoid unknown issues?
- (lmandel) The beta gtb came very late in the day. This impacts QE/softvision qualification. Can we set cutoff times during the day?
- (lmandel) mobile betas released on Wed in 32. I'm told they released on Tue in prior releases. Kevin tells me that twice in the cycle QE does a more thorough manual check that requires 2 days. Should we stay with Wed to be consistent?
- (lmandel) We have been permitting more aggressive uplift of feature work during the last few cycles. We have also had quality issues recently - notably stability. We are working to find the balance that allows us to move quickly while retaining the quality criteria that we want in our releases.
- (lmandel) I found that many people did not know the beta schedule. A small number of people also didn't know what is appropriate for uplift. I think relman (that's me) needs to do a better job of helping our teams understand our process. I started with a talk with the desktop team when they were in Toronto two weeks ago.
- (lmandel) tbpl leaves a lot to be desired from a release manager's perspective. It doesn't give a good indication of when the build will be complete, jobs continue to populate as they develop rather than up front, and there are no overall test counts. I'm thrilled that a lot of these issues will be fixed in treeherder (example: https://bugzilla.mozilla.org/show_bug.cgi?id=1052397). (Calling this out as feedback from my first full release.)
- (lmandel) Builds take a long time. It is 10+ hours from checkin to ready to test for beta builds. Are there any perf fixes that are already known that we can focus on to reduce build times? Can we review and see if there are other ways to bring the time down so that we can be more nimble when it comes to turning around betas, RCs, and point releases?
- (sylvestre) maybe "fast-track" beta build jobs or allocate faster AWS machines?
- (catlee) we recently trimmed ~2h from post-build release process (pushing to mirrors)
- (catlee) partials-on-demand service will speed up builds & repacks a bit
- (lmandel) I would like to see if we can create better defined criteria for crashes that we consider release blockers. The top 10 list gives us something to go on but there is always a top 10 list. Can we have a list of startup crashes and a list of crashes that are known to impact over X% of the population (or something like this) so that we have lists to drive to zero for the release?
- (dbaron) Type of crash needs to be an input into prioritization; the ranking in the topcrash list is comparing apples to oranges. A crash that strands users because Firefox doesn't start up is very different from a low-frequency intermittent crash, which is in turn very different from a crash that happens when you take a particular avoidable action. We should try to measure these by real numbers: N users stranded, X users crashing every Y hours, etc.
- We do not have any kind of per-user data, but we may be able to estimate some things roughly based on install time (which our "installations" number derives from). That said, good ideas on algorithms and specific ways to get to good numbers are appreciated.
- (dbaron) We should have tools that automatically detect particular types of crashes (reliable startup crash; intermittent crash; crash on action); they have significantly different appearances.
- +1 (lizzard) We could ask the socorro developers to add a new view for startup crashes. Unfortunately, that means they would not work on anything else we want for about a quarter. That said, any help in actually finding useful algorithms and/or definitions of what exactly to look for is appreciated.
- Well... how about just making a column on the front page for "startup" or not, then we could sort.
- (KaiRo) We usually prioritize anything that makes the crash rate worse than the release before, plus anything that is new, accounts for over 0.5-1% of all crashes, and is a startup crash or otherwise grave enough to stop usage.
- (lmandel) jmaher has been providing perf updates during channel meetings and filing and tracking bugs. There were a handful of bugs to address in 32. I found it great to know that the performance of the release was being tracked and find this valuable even if there are not a large number of issues that are being identified.
- (lmandel) We were surprised in 32 with tap offsets on Android. We also discovered a bug in our mixed content/lock icon that shipped in 32.0.1. We were surprised in prior releases with Web regressions. Are there ways that we can improve our knowledge of the quality of a release from QE's perspective? Would that be new/changed test plans, more automation, more hands for more coverage?
- [kbrosnan] The tap offsets issue was known but its impact wasn't; it turned out to be big in release.
- [mschifer] How did 933733 manage to land without any changes after being backed out previously?
- [juanb] QE could have used an earlier heads-up on the Social API marketing promo for 32.0. Found several non-recent regressions late in beta.
- (lmandel) Part of this is a failure on my part, as I knew about this from the end of Aurora.
- (scaraveo) I don't know all of the stakeholders with whom I should be working. Need to learn more. Get QA help earlier and get more people aware of what's happening.
- (lmandel) We didn't know that cachev2 tests weren't enabled. How can we avoid this in the future?
- (lmandel) Several social-related bugs had to be fixed at the last minute, resulting in a gtb late in the day.
- (scaraveo) Two of the three are related to changes around australis and loop, which are two areas where I am light on automated tests, so they were not caught in earlier releases. As well, some tests are too granular and do not catch real situations. The solution is to find time to spend on more tests. The third could have waited (in hindsight) but was relatively trivial; in the future I should make a better decision on severity/size before uplifting.
- (lmandel) Why didn't we catch Android tap offsets during beta?
- (lmandel) We break the Web sometimes. I don't think this in and of itself is necessarily a problem (sometimes we want to break the Web) but I do think that we should know that we're doing it. How can we better identify bugs that have the risk of breaking sites and work with the Web compat team to understand the impact before we release?
- (lmandel) Should we review the latest nightly, aurora, and beta cycles every 6 weeks? Will this be better than reviewing one release at a time?
- (lmandel) We haven't typically included engineering in this review. Should the post-mortem include all of the people who work on a release?
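
As a starting point for the crash-blocker discussion above, here is a minimal sketch of the rule of thumb KaiRo describes, combined with dbaron's point that crash type should be an input. Everything in it is an assumption for illustration: the `CrashSignature` fields, the use of a per-signature share of all crashes as a stand-in for "makes the crash rate worse", and the choice of the lower bound (0.5%) of the quoted 0.5-1% range. It is not how socorro or relman actually compute anything.

```python
from dataclasses import dataclass
from typing import List

# Assumption: lower bound of the "0.5-1% of all crashes" range from the notes.
SHARE_THRESHOLD = 0.005


@dataclass
class CrashSignature:
    """Hypothetical per-signature summary; not a real socorro record."""
    signature: str
    share: float            # fraction of all crashes in this release (0.0-1.0)
    previous_share: float   # same signature's share in the previous release
    is_new: bool            # not seen in the previous release
    is_startup: bool        # crash happens during startup
    stops_usage: bool = False  # otherwise grave enough to stop usage


def is_blocker_candidate(c: CrashSignature,
                         threshold: float = SHARE_THRESHOLD) -> bool:
    """Apply the rule of thumb to one signature."""
    # Existing signature that got worse than in the release before.
    regressed = (not c.is_new) and c.share > c.previous_share
    # New signature that is frequent enough and grave enough.
    grave = c.is_startup or c.stops_usage
    new_and_significant = c.is_new and c.share >= threshold and grave
    return regressed or new_and_significant


def blocker_candidates(crashes: List[CrashSignature]) -> List[CrashSignature]:
    """Return candidates, startup crashes first, then by descending share."""
    picked = [c for c in crashes if is_blocker_candidate(c)]
    return sorted(picked, key=lambda c: (not c.is_startup, -c.share))
```

Sorting startup crashes ahead of everything else reflects the point that a crash stranding users at startup is not comparable to an intermittent one by rank alone. The threshold and the sort order are the two knobs most worth arguing about.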