Releases:Release Post Mortem:2016-02-10


Meeting Details



Release Duty

  • FF 45 cycle: mtabara

Misc

Shipped

Firefox/Fennec 45.0b4 (mtabara/nick/rail)

  • flawless victory!
  • building despite bug 1246854, should not impact the release
  • Firefox intermittent errors:
    • failed at repack_4/10 on linux: failed because no space was left on device while downloading update, automatic retry
    • failed at repack_3/10 on win32: lower Sorbian locale download, retriggered
    • regular GTK3 known-issue errors for linux/linux64 update_verify_beta steps
  • Fennec - shipped a day later, all good! TODO: awaiting push to Google Play Store and post-release step

Firefox 44.0.1 (jlund/rail/mtabara/nick)

  • This dot release is desktop only and has fixes for bug 1244505, bug 1242176, bug 1222171 and bug 1244069.
  • once shipped, update rates were set to 100% on RelMan's instructions. However, updates to 44.0.1 were later reduced to 0% again because of a new security issue, bug 1245724. A chemspill/dot release is expected soon: 44.0.2 is underway for both desktop and mobile
  • build1
    • some intermittent errors:
      • failed at repack_4/10 on win32, retriggered - there seems to have been some trouble handling/processing firefox-42.0-44.0.1.partial.mar while repacking the Scottish Gaelic locale
      • failed at firefox_antivirus, retriggered - seems to have been a network download issue with the German locale for one of the mac partial MARs.
      • failed at firefox_antivirus, retriggered - seems to have been a network download issue with the Songhay locale for one of the mac partial MARs.
      • failed at update_verify_release_1/6 on win32 - timeout, retriggered
    • abandoned - the Media Playback team received reports of A/V sync problems (multiple seconds) with some YouTube content (bug 1245696). This is a regression in FF 44 from bug 1229605. Turning off "media.mediasource.webm.audio.enabled" reverts from Opus audio to AAC audio, which is a well-tested code path. Opus has slightly better sound quality and/or lower bandwidth requirements than AAC. Building build2 with that pref turned off
  • build2
    • some intermittent errors:
      • failed at repack_10/10 on win64: similar to "Thunderbird 45.0b1 build1: failed at repack_2/10 on win32", xh locale failed while running make_incremental_update.sh, possibly from a balrog submission step? At any rate, this seems intermittent.
    • we were a bit impatient and didn't wait for the 'ready for release' email before pushing updates. It turns out the bouncer check was stalled:
2016-02-08 14:25:43-0800 [HTTPPageGetter,client] TriggerBouncerCheck: uptake is 0

which was from the request

https://bounceradmin.mozilla.com/api/uptake/?product=Firefox-44.0.1-Partial-41.0.2&os=win64

not having any uptake. Win64 started at 42.0, so a partial from 41.0.2 makes no sense. Would not be surprised if this is another case of 'ship-it makes bad suggestions for partial updates' (bug 1146863)
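
The mismatch described above could in principle be caught before any uptake request is made. The following is a minimal Python sketch, not the ship-it or buildbot implementation: the `FIRST_RELEASE` table and all function names are hypothetical, and only the win64 starting version (42.0), the product naming pattern and the uptake URL come from these notes. It only handles plain release versions such as 41.0.2, not beta version strings.

```python
from urllib.parse import urlencode

# Hypothetical table of the first release shipped per platform.
# Only the win64 entry is taken from the notes; other platforms are unrestricted here.
FIRST_RELEASE = {"win64": (42, 0)}

UPTAKE_API = "https://bounceradmin.mozilla.com/api/uptake/"


def parse_version(version):
    """Turn '41.0.2' into (41, 0, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def partial_is_possible(from_version, platform):
    """A partial only makes sense if from_version was ever shipped on the platform."""
    first = FIRST_RELEASE.get(platform)
    return first is None or parse_version(from_version) >= first


def uptake_url(product, version, from_version, platform):
    """Build the uptake query for one partial, following the URL seen in the log."""
    query = urlencode({
        "product": f"{product}-{version}-Partial-{from_version}",
        "os": platform,
    })
    return f"{UPTAKE_API}?{query}"


def uptake_checks(product, version, partials, platforms):
    """List the uptake URLs worth polling, skipping impossible partials."""
    return [
        uptake_url(product, version, from_version, platform)
        for platform in platforms
        for from_version in partials
        if partial_is_possible(from_version, platform)
    ]


if __name__ == "__main__":
    for url in uptake_checks("Firefox", "44.0.1", ["41.0.2", "42.0"], ["win32", "win64"]):
        print(url)
```

With the partials mentioned in these notes (41.0.2 and 42.0), the sketch keeps both partials for win32 but drops the 41.0.2 partial for win64, so the request that stalled the bouncer check would never be issued.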

  • to fix this up nthomas dropped a dummy text file at pub/firefox/releases/44.0.1/update/win64/zh-TW/firefox-41.0.2-44.0.1.partial.mar, but buildbot failed to continue once it had uptake for all requests:
2016-02-08 14:30:43-0800 [HTTPPageGetter,client] TriggerBouncerCheck: uptake is 2000100
2016-02-08 14:30:43-0800 [HTTPPageGetter,client] TriggerBouncerCheck: Stopping uptake monitoring: Reached required uptake: 2000100
2016-02-08 14:30:43-0800 [HTTPPageGetter,client] TriggerBouncerCheck failed:
        Traceback (most recent call last):
          File "/builds/buildbot/build1/lib/python2.7/site-packages/twisted/internet/defer.py", line 441, in _runCallbacks
            self.result = callback(self.result, *args, **kw)
          File "/builds/buildbot/build1/lib/python2.7/site-packages/twisted/internet/defer.py", line 664, in _cbDeferred
            self.callback(self.resultList)
          File "/builds/buildbot/build1/lib/python2.7/site-packages/twisted/internet/defer.py", line 318, in callback
            self._startRunCallbacks(result)
          File "/builds/buildbot/build1/lib/python2.7/site-packages/twisted/internet/defer.py", line 424, in _startRunCallbacks
            self._runCallbacks()
        --- <exception caught here> ---
          File "/builds/buildbot/build1/lib/python2.7/site-packages/twisted/internet/defer.py", line 441, in _runCallbacks
            self.result = callback(self.result, *args, **kw)
          File "/builds/buildbot/build1/lib/python2.7/site-packages/buildbotcustom/scheduler.py", line 327, in checkUptake
            Triggerable.trigger(self, self.ss, self.set_props)
          File "/builds/buildbot/build1/lib/python2.7/site-packages/buildbot-0.8.2_hg_41f0fbee10f4_production_0.8-py2.7.egg/buildbot/schedulers/triggerable.py", line 66, in trigger
            d = self.parent.db.runInteraction(self._trigger, ss, props)
        exceptions.AttributeError: 'NoneType' object has no attribute 'db'
  • there was a reconfig at 11:00 PST (i.e. after TriggerBouncerCheck had initially started successfully), which is known to cause failures.
  • to work around this the release-mozilla-release-firefox_release_start_uptake_monitoring builder was forced (setting script_repo_revision and release_config properties), which fired off jobs and emails as expected, culminating in the 'ready to release' email.
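
The traceback above ends in `'NoneType' object has no attribute 'db'`, which is consistent with the scheduler having lost its parent by the time the uptake callback fired. The toy model below is not buildbot or buildbotcustom code; every class and method name is made up for illustration. It only shows how a long-running check that was started before a detach can fail in this way, and how an explicit guard would turn the crash into a clear message.

```python
class FakeDB:
    """Stand-in for a master's database connector (hypothetical)."""

    def run_interaction(self, func, *args):
        return func(*args)


class FakeMaster:
    """Stand-in for the object a scheduler is attached to (hypothetical)."""

    def __init__(self):
        self.db = FakeDB()


class UptakeScheduler:
    """Toy scheduler that triggers downstream work once uptake is reached."""

    def __init__(self):
        self.parent = None

    def attach(self, master):
        self.parent = master

    def detach(self):
        # Models what a reconfig is assumed to do to an already-running scheduler.
        self.parent = None

    def trigger_unguarded(self):
        # Fails with AttributeError if the scheduler was detached in the meantime.
        return self.parent.db.run_interaction(lambda: "triggered")

    def trigger_guarded(self):
        # Same call, but reports the detached state instead of crashing.
        if self.parent is None:
            return "detached: force the uptake monitoring builder manually"
        return self.parent.db.run_interaction(lambda: "triggered")


if __name__ == "__main__":
    scheduler = UptakeScheduler()
    scheduler.attach(FakeMaster())
    scheduler.detach()  # reconfig happens while the uptake check is still polling
    print(scheduler.trigger_guarded())
```

In this model the guarded path simply points at the manual workaround used here (forcing the uptake monitoring builder); whether a similar guard is appropriate in the real scheduler is a separate question.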

Firefox/Fennec_45.0b3 (jlund/rail/callek/nick/mtabara)

  • yet-another victory!
  • GTK3 status update on this release - excerpt from bug 1245476:
As discussed in 1227024, bug 1205199 is critical enough to disable gtk 3 in 45:
* 45 is an ESR release, it would be safer to introduce gtk 3 in 46
* We only had two beta with gtk3, this is not enough for such important changes
* We need time to make sure gtk2 is still in a good shape
  • Some intermittent errors:
    • tl;dr retriggered - build step failed on win32: known intermittent - bug 1224886 - Intermittent Win PGO build LINK : fatal error LNK1000: Internal error during IMAGE::BuildImage after workerprivate.cpp(3075) : fatal error C1001: An internal error has occurred in the compiler
    • from the IRC channel
02:37:28 <nthomas> this sort of thing can lead to compiler upgrades to pick up fixes
02:38:09 <nthomas> acksully, https://bugzilla.mozilla.org/show_bug.cgi?id=1224886
02:39:05 <nthomas> they do fix these kind of bugs, eg https://connect.microsoft.com/VisualStudio/feedback/details/819439/fatal-error-c1001-an-internal-error-has-occurred-in-the-compiler
    • failed at repack_7/10 on linux64 - random network issue while setting up for the job, reran it
    • failed at firefox_antivirus - network download issue with the Marathi locale for one of the win32 partial MARs, retriggered.
    • regular GTK3 errors for linux/linux64


Ongoing

Firefox 44.0.2 (mtabara/rail/nthomas)

  • new security issue, bug 1245724. 44.0.2 is underway for both desktop and mobile
  • build1:
    • stopped in order to add one more critical fennec issue and start a build 2
  • build2:
    • intermittent errors for Firefox:
      • failed at firefox_antivirus, retriggered - intermittent download error for locale/partial

Fennec 44.0.2 (mtabara/rail/nthomas)

  • new security issue, bug 1245724. 44.0.2 is underway for both desktop and mobile
  • build1:
    • stopped in order to add one more critical fennec issue and start a build 2
  • build2:
    • abandoned here for yet-another build to follow with a hotfix
    • intermittent errors:
      • Fennec 44.0.2 build2: build step failed on android-api-11 - failure to clone build/tools when the fingerprint didn't match. gps suspects AWS are rolling out new certs
  • build3:
    • abandoned here as buildbot-master73 froze our builds and was really slow today - given that there was too much room for human error to interfere, we'll follow up with a fourth build.
  • build4:
    • intermittent errors:
      • [release-runner] WARNING: Reconfig exceeded 900m then 1800 seconds - looks like buildbot-master73 is naughty today and really slow hence it delayed the whole reconfig step
      • at least three builders have been grabbed by the same bm73 yet-again. We might end up in the same scenario as build3.

Thunderbird_45.0b1 (jlund/rail/callek/nick/mtabara)

  • TODO: awaiting a decision, since Thunderbird lacks an equivalent of the Firefox beta GTK3 watershed rule in Balrog; please see the email on the tb-drivers mailing list
  • build1
    • we had two win32 repacks failing
      • failed at repack_6/10 on win32 - retriggered, intermittent timeout
      • failed at repack_2/10 on win32 - retriggered
        • retriggered after the 'da' locale failed while submitting to balrog, specifically around the make_incremental_update.sh script
        • retriggered after losing the slave instance
        • retriggered upon timeout
    • from tb-drivers mailing list: "We'll likely abandon build1 and go for build2 after getting some fixes"

  • build2:
    • "Same changesets as before, but buildbot changes merged to production."
    • gave up on build2 because of a build error
  • build3:
    • intermittent errors:
      • failed at repack_7/10 on win32, automatic retry
      • failed at repack_3/10 on win32, automatic retry
      • failed at update_verify_beta_2/6 on linux64 - GTK3 known issue error
      • failed at update_verify_beta_2/6 on linux - GTK3 known issue error

Roundtable

Action items