Notes for 63 post mortem
10:00am PDT Tuesday November 6 (after the Channel Meeting)
Vidyo channel: ReleaseCoordination
IRC: #release-drivers
- 10 weeks of Nightly63 and 7 weeks of Beta63
- 290 uplifts to beta63 + 14 uplifts to RC = 304 uplifts. [Thomas] How does this compare to previous releases? A: The number of uplifts is lower than normal.
- uplifts during beta: https://mzl.la/2RneBEW
- uplifts during RC week: https://mzl.la/2yMSjFR
- [tania] We started Fx63 with 31 features; 11 of them were moved to future releases or the backlog
- [tania] We received 8 PI Requests after the deadline (June 20th); 3 of them were submitted in August (post mid-nightly sign-off)
- [tania] 9 features landed late, 5 of them after the mid-nightly sign-off. [Thomas] Do we have a summary of the impact on the release and/or the QA team from these late PI requests or features that missed mid-nightly sign-off?
- [tania] 202 bugs were created in Nightly 63; 48 have been fixed and 88 were in New/Unassigned/Unconfirmed (as of Release day)
More data (including other releases): https://docs.google.com/spreadsheets/d/1r6aMX28R-quK3EAjX1sHPRKpEPGYEM5Gb0dg56oyplI/edit#gid=0
Fx63 key dates and deadlines missed: https://docs.google.com/spreadsheets/d/1Ac-0x2HFos0Gdb4dHIVJng_tWp1AckBUthpxWOn3SR0/edit#gid=0
What went well?
- No problem in general with the release schedule, what's new page, go live, release notes…
- Stable Desktop release post-launch, no major issue from end-user feedback
- FastBlock did not cause regressions (there were many uplifts to beta as the team iterated on what they wanted product-wise, and the feature became Content Blocking)
- Fewer uplifts in total than in previous releases, which might indicate higher quality and more polished features at merge time
- Longer nightly cycle than beta cycle, and overall quality post launch was great!
- Long cycle (17 weeks, like 62): 10 weeks in Nightly, 7 weeks in Beta/RC
- 3 major OS updates (Redstone, Mojave, + compulsory Oreo support) during the cycle
- For most of the Nightly cycle, Fennec was very unstable because of the API migration to support API level 26
- Scarce development resources, especially on the Gecko front
- Telemetry was broken for a month and this was only noticed after shipping -> we stopped the rollout and a dot release is needed this week with that issue as the driver. [thomas] Can we incorporate a fixed QA check-in against telemetry to ensure this doesn't happen in the future?
- Large ANR issues were reported on Google Play after the release, but we didn't get notifications about them during beta/RC. Maybe the volume of ANRs was too low with a 5% rollout on RC to trigger notifications. Should we push RC to more people?
- ship-it v1 was not working for RC1; because we used ship-it v2 for betas, the failure on v1 went unnoticed
- Betas didn't get updated translations after Sept 24 (l10n sign off being on Oct 10) because the script updating the mozilla-beta repository with l10n changesets was broken. This was discovered in RC1
- Does RelEng have a follow-up planned to improve monitoring of this job?
Snippets issue on desktop (63.0.1 driver)
- Fixed during nightly 64 cycle, but bug wasn't flagged for uplift by AS team before launch
- What monitoring/QA are we missing to have caught this on Beta? +1
More details from QA:
https://bugzilla.mozilla.org/show_bug.cgi?id=1503047 - Snippets are not loaded due to missing element
Snippets are banners that appear on the about:home or about:newtab pages. However, there are two Snippet systems: an old one, which we have no idea who maintains, and a new one developed by the Activity Stream team. QA has been verifying bugs fixed by the AS team regarding snippets in Nightly only, because the new system is not yet ready to be released. A fix that the AS team made has somehow stopped the old snippets system from working in 63. This could not have been caught because, to the best of our knowledge, no one is explicitly testing the old snippet system in Beta/Release.
- https://bugzilla.mozilla.org/show_bug.cgi?id=1496246 Activity Stream mistakenly landed a new Nightly string in beta, exposing it on the l10n dashboard for translation.
- The Blocking Autoplay Shield study had wrong telemetry parameters, and an uplift was requested on the last day of RCs https://bugzilla.mozilla.org/show_bug.cgi?id=1499803
- A WebGL regression in games https://bugzilla.mozilla.org/show_bug.cgi?id=1502748 caused by a missed webaudio backout patch could have been avoided: the backout for uplift was requested and approved in the same bug as the implementation, which means it didn't show up in sheriffs' queries for uplifts
- Because FastBlock was cancelled, everybody forgot to add a note for Content Blocking; I realized it was missing the weekend before the release
- Beta 5 dev edition was not shipped because of a manual error (pascal: it seems I had lost VPN connection as I approved it in a meeting room after walking from my desk and didn't notice it); this went unnoticed.
- To remediate this, Release QA now also does DevEdition update sanity checking for every release
- Also added an extra balrog verification step to release checklists
- A first run issue came up, but turned out to not be new in 63. https://bugzilla.mozilla.org/show_bug.cgi?id=1500114
- [marcia] We were burned a few times in the 10.14 cycle, and there was a note in one bug that we need to audit our core | widget | cocoa code - https://bugzilla.mozilla.org/show_bug.cgi?id=1489785#c77
- I pinged Markus in the bug, https://bugzilla.mozilla.org/show_bug.cgi?id=1489785#c91. https://bugzilla.mozilla.org/show_bug.cgi?id=1503982 covers this
- [marcia] We should also make sure that we test on different types of hardware. There were a few 10.14 bugs that only reproduced on certain hardware
- 48 bugs (build bustage or perma failures of preffed off features) caught by beta simulations before Gecko 63 hit beta
- Having bouncer and balrog communicate so as to prepare the sign off for release data on the morning/eve and activate updates to mozilla.org download pages, bouncer links and software updates with the relman sign off.
- Separate Fennec and Firefox go live items as they are not sequential
- [Thomas E] Reach out to someone about the lack of focus or attention Fennec Android is receiving. It appears that nobody was looking at telemetry data in beta 63. This caused us to ship a regression to Fennec release that broke telemetry.
- [jcristau] Check with RelEng on adding better monitoring for l10n bumper script
- [pascalc] Follow-up with Activity Stream team about snippets QA on Beta
- How uplifts compare from release to release [thomas]
- Correlation between mid nightly misses and how this impacts other things down the line