Firefox 50 Post-mortem
50.1 post mortem
(meeting notes from Jan 10th)
- Planning and coordination were a bit of an effort for releng, relman, and sec team.
- Lack of Beta pushes adds risk to every uplift
- Sec team had to scramble to write advisories unexpectedly during all-hands week
- Sec team feedback: we usually don't have so many sec issues land at the last minute
- We met with sec team and agreed upon a new strategy which requires closer coordination during last 2 betas+RC week on all sec uplifts
- repeat of https://bugzilla.mozilla.org/show_bug.cgi?id=1321400 / https://bugzilla.mozilla.org/show_bug.cgi?id=1321411 - www team was not aware that we were doing a 50.1/45.6.0esr release
- what happened or didn't happen because they weren't aware?
- Ritu to follow up on details, share with Erin/channel meeting next week.
- We had patches that should have been uplifted to esr in release, and vice versa (part of the last minute fixes).
- In hindsight: we should have asked for a special bugzilla tracking flag - it was too confusing to keep using the flag for 50.
- As always, some patches from sec-critical/high bugs landed on m-c without sec approval (more or less forcing us to take work we might not have otherwise).
- Versioning it as 50.0.3 would have been better than 50.1.0. The former signals a smaller-scope release than a 50.1.x version does.
- This didn't land for 50.1: 1311687 (combination of: confusion about 50.1 flags, wontfixed, uplift during workweek)
50 Post mortem
Add-on sdk startup perf issues, last minute in beta.
- What process improvements do we need to catch these performance issues in the future?
- (Ritu) Start an email discussion on what we can change. Early sign-offs?
- lost some crash data (may reprocess it) from all channels. not sure how this may have affected our judgement of beta crash rates (10-10 to 10-21)
- We should always recover/post-process crash data that we missed. Marco, Ritu to follow up on why this cannot be a standard policy.
- according to https://crash-analysis.mozilla.com/release-mgmt/crash-report-tools/longtermgraph/?fxbeta it looks like only "plugin hangs" are missing during that period - no sudden drops in other categories
- Serious IME issues just after release for 50 and 49. example, https://bugzilla.mozilla.org/show_bug.cgi?id=1317906
- Are we doing too many uplifts or late uplifts? More testing? Automated test coverage? More manual testing? How can we avoid this? DavidB wants to be a part of this discussion.
- Ritu to follow up with DBolter and Masayuki team. Also check with Florin.
- Marcia will help coordinate efforts on getting community folks/l10n on testing this early on.
- Staged rollout of Fennec is a good idea and we should keep doing it.
- Fennec 50 top crasher - bug 1317785
- How can we catch this before we go to release next time?
- Crash reports combine MozOnline builds and other Firefox builds into a single crash rate. Does it make sense to separate them? And can it be done?
- how can we separate this out in crash-stats without using buildid (which would need a constantly updated index of buildids, which... might be nice)
- would need a new application id on crashstats, then something built into mozilla online data
- Top crasher didn't get much attention during the beta cycle (https://bugzilla.mozilla.org/show_bug.cgi?id=1308863), was fixed by luck
- Beta top crashers don't get investigated or fixed as quickly as needed
- Email devs, engg boss, stability, uptime ML
- Crash Kill was useful
- Erin might look into getting us an EPM
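The buildid-index option raised above for separating MozOnline crashes from other builds could look roughly like the sketch below. This is only an illustration of the idea, not how crash-stats works: the function name, the "mozonline"/"other" labels, and the sample buildids are all made up for the example, and it assumes each crash report carries a buildid string.

```python
# Hypothetical sketch: split crash counts using an index of MozOnline buildids.
# The index itself is the part that would need to be kept constantly updated.

def split_crash_counts(crash_buildids, mozonline_buildids):
    """Count crashes separately for MozOnline builds and all other builds.

    crash_buildids: iterable of buildid strings, one per crash report.
    mozonline_buildids: set of buildids known to be MozOnline builds
                        (assumed to be maintained somewhere else).
    """
    counts = {"mozonline": 0, "other": 0}
    for buildid in crash_buildids:
        key = "mozonline" if buildid in mozonline_buildids else "other"
        counts[key] += 1
    return counts


# Example with invented buildids.
example_index = {"20161104000001"}
example_crashes = ["20161104000001", "20161104000002", "20161104000001"]
example_counts = split_crash_counts(example_crashes, example_index)
```

The alternative discussed (a new application id) would avoid maintaining the index at all, at the cost of changes on both the crash-stats and MozOnline side.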
49.x (dot releases and system add-ons) post mortem
async rendering / plugin drawing issues.
- We didn't know about this as an upcoming big change/feature (not in product meeting, or tracked by relman)
- No QE in aurora/beta 49 that I'm aware of
- The pref didn't get flipped as Engineering had planned for the 49 release
- This caused regressions with Flash text input for IME, and other functional regressions as well as perf/scrolling issues
- We shipped a system add-on to flip the pref after some testing by SV Las Vegas
- we expected there to be regressions for some sites, but overall an improvement with sandboxing/perf/crashes
- Instead there were many regressions, including breakage of the most popular Flash games (Farmville, Bejeweled, etc. on and off FB)
- We shipped a second system add-on to turn off the prefs again for win32. Win64 users still have the regression for the next 2 weeks (because we would have to ship a dot release for win64 to fix)
- Also this was turned off on beta 50 and won't ship as part of 50 (for both win32 and 64)
- Good triage meeting run by jimm, adobe engineers helping, to go thru the regressions
- system addon process for release hotfixes/pref flips still unclear. We need to build in getting stats on uptake for each system add-on, and reporting them regularly
- Engineering ownership still unclear. Jet's team? Milan's? Benjamin? Jimm? dvander did some work to help out but part of the failure here may be that no one "owned" this well.
- Preffed off for now - most issues remaining are Adobe's. Can we turn this on for 51? Not sure (for January)
- https://bugzilla.mozilla.org/showdependencytree.cgi?id=1229961&hide_resolved=1 - Main bug list that Jimm was triaging
- stefan is the new QE/LV lead for plugins
47.0.2 dot release post mortem
- needed new infrastructure that we more or less had to have for 50.0.1 anyway
- Thanks releng for setting that up! (rail, bhearsum)
- We threw out build1 and did a build2 at the last minute before release (rstrong's issue)
- (liz) I could have noticed this earlier and taken the SSE detecting patch if I'd read email late Friday or on the weekend. But I didn't! It is entirely OK to escalate harder (find me on irc or even call me) if it might save unnecessary testing by QE (so boring for them and more work)
- Update rules and update testing are a bit tricky, and Florin was going on PTO