Releases:Release Post Mortem:2016-02-17

Meeting Details

Release Duty

  • FF 45 cycle: mtabara



Firefox 45.0b6 (nick/mtabara/rail)

  • shipped during live channel mtg :)
  • all update verify steps are good and AV came in; awaiting only the QE sign-off to push it live
  • "Starting the build 6 without the last Hello changes. They broke m-b"
  • intermittent errors:
    • regular linux/linux64 GTK3 known-issue thingie for 3-4 update verify steps
    • bouncer_submitter failure, which is server-side and apparently not new, but we retry around it; nthomas filed bug 1248490. Retried the job and it succeeded, with several auto-retries in the log.
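
The "retry around it" approach mentioned for the bouncer_submitter failure can be sketched roughly as below. This is only an illustration of retrying a flaky server-side call with exponential backoff; the function name `retry_with_backoff` and its parameters are hypothetical and are not taken from the release automation itself.

```python
import time


def retry_with_backoff(action, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call action() until it succeeds or the attempts run out.

    Hypothetical helper: the delay doubles after every failed attempt
    (base_delay, 2 * base_delay, 4 * base_delay, ...). The last failure
    is re-raised so the job still goes red when the server never recovers.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_submit():
        # Stand-in for a server-side call that fails twice, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise RuntimeError("simulated server-side failure")
        return "submitted"

    print(retry_with_backoff(flaky_submit, base_delay=0.01))
```

Re-raising on the final attempt keeps a persistent server-side problem visible instead of hiding it behind silent retries.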

Thunderbird 38.6.0 (nthomas/rail/mtabara)

Firefox 45.0b5 (mtabara/nick/rail)

  • victory
    • initially we had a bunch of the update verify steps failing; because of an infra bug we ended up with high load on the servers, with Balrog being somewhat busted for a short window - see bug 1247869 for more details.
    • failed at update_verify_beta_4/6 on macosx64, failed mercurial cloning - automatic retry
    • failed at update_verify_beta_6/6 on win64 - intermittent error while downloading complete mar
    • several others with GTK3 known issue errors
    • several others with downloading issues on complete mars or Balrog.
    • antivirus failed in several attempts due to IncompleteRead - nthomas filed bug 1248299 to track this

Fennec 44.0.2 (mtabara/rail/nthomas)

  • we deeeed it!
  • new security issue, bug 1245724. 44.0.2 is underway for both desktop and mobile
  • build1:
    • stopped in order to add one more critical fennec issue and start a build 2
  • build2:
    • abandoned here for yet-another build to follow with a hotfix
    • intermittent errors:
      • Fennec 44.0.2 build2: build step failed on android-api-11 - failure to clone build/tools when the fingerprint didn't match. gps suspects AWS are rolling out new certs
  • build3:
    • abandoned here as buildbot-master73 froze our builds and was really slow today - given that there was too much room for human error to interfere, we'll follow up with a fourth build.
  • build4:
    • intermittent errors:
      • [release-runner] WARNING: Reconfig exceeded 900m then 1800 seconds - looks like buildbot-master73 is naughty today and really slow hence it delayed the whole reconfig step
      • at least three builders have been grabbed by the same bm73 yet-again. We might end up in the same scenario as build3.

Firefox 38.6.1esr (mtabara/rail/nthomas)

  • victory eventually!
  • for the font related issue mentioned in another thread, bug 1246093, we are building and testing a dot release for ESR, 38.6.1.
  • building from a relbranch with just the one sec fix
  • build1:
    • some intermittent errors:
      • antivirus check failed for a downloading issue when scanning, retriggered
    • main issues:
      • we rushed into pushing it to the esr-release channel without the QE sign-off. Update tests were failing because of a WNP (What's New Page) error. mtabara first changed the throttling to 0, rail fixed the WNP, and then the rates were set back to 100%.

Firefox 44.0.2 (mtabara/rail/nthomas)

  • build2:
    • intermittent errors for Firefox:
      • failed at firefox_antivirus, retriggered - intermittent download error for locale/partial
    • abandoned as a follow-up build3 is underway
  • build3:
    • this build is needed to address a critical windows startup issue (backed out bug 1218473)
    • intermittent errors for Firefox:
      • a few update verify steps failed because of download issues

Thunderbird 45.0b1 (jlund/rail/callek/nick/mtabara)

  • victory!
  • instead of disabling updates I pointed Linux users to 44.0b1 and others to 45.0b1
  • TODO: awaiting a decision, as there is no TB equivalent of the Firefox beta GTK3 watershed rule in Balrog; please see the email on the tb-drivers mailing list
  • build1
    • we had two win32 repacks failing
      • failed at repack_6/10 on win32 - retriggered, intermittent timeout
      • failed at repack_2/10 on win32 - retriggered
        • retriggered after the 'da' locale failed while submitting to Balrog, specifically around the script
        • retriggered after losing the slave instance
        • retriggered after a timeout
    • from tb-drivers mailing list: "We'll likely abandon build1 and go for build2 after getting some fixes"

  • build2:
    • "Same changesets as before, but buildbot changes merged to production."
    • gave up on build 2 because of a build error
  • build3:
    • intermittent errors:
      • failed at repack_7/10 on win32, automatic retry
      • failed at repack_3/10 on win32, automatic retry
      • failed at update_verify_beta_2/6 on linux64 - GTK3 known issue error
      • failed at update_verify_beta_2/6 on linux - GTK3 known issue error


Fennec 45.0b6 (nick/mtabara/rail)

  • awaiting the Google Play email to run the post-release and move this to the Shipped section
  • "Starting the build 6 without the last Hello changes. They broke m-b"


  • bhearsum: should we really be blocking shipping a chemspill release on what's new page configuration? I don't have opinion on this particular what's new page, but holding back in-the-wild fixes because of a what's new page seems bad


20:12:19 <bhearsum> a question for the postmortem, maybe: should we really be blocking shipping a chemspill release on what's new page configuration?
20:12:55 <rail> we should stop showing it 
20:12:59 — ~mtabara agrees
20:13:19 <rail> we are about to ship esr45 :)
20:13:27 <bhearsum> i don't have opinion on this particular what's new page
20:13:55 <bhearsum> but holding back in-the-wild fixes because of a what's new page seems bad
20:14:59 <lizzard> we’r only holding it back for a short time
20:15:03 <lizzard> but good question....
20:15:32 <bhearsum> yeah, and it's only for esr in this case
20:15:41 <bhearsum> i doubt it made a practical difference for that userbase
20:15:51 <lizzard> For esr, i can’t imagine enterprise folks can deploy this so quickly as to mind an hour’s diference
20:15:52 <bhearsum> but if were the firefox release channel it might be a different story
20:17:26 <bhearsum> i guess it's also an important point that screwing up the WNP has more effect on the release channel
  • mtabara: while deploying TB 38.6.0 with nthomas we had to change the Balrog update rates to 50%. While attempting this we realized they had already been changed, and that there was a UI-only issue in Balrog, as rate changes were not shown in the rule history - bug 1248475. nthomas did a DB query to find the answers we were looking for:
mysql> select change_id, changed_by, from_unixtime(substr(timestamp, 1, 10)) as timestamp, backgroundRate from rules_history where rule_id=170 order by change_id desc limit 10;
| change_id | changed_by          | timestamp                  | backgroundRate |
|      4423 | tbirdbld            | 2016-02-15 20:41:30.000000 |             50 |
|      3890 | tbirdbld            | 2016-01-07 22:21:11.000000 |             50 |
|      3889 |                     | 2016-01-07 22:20:21.000000 |             50 |
|      3810 |                     | 2015-12-30 16:03:00.000000 |            100 |
|      3740 |                     | 2015-12-23 19:01:11.000000 |             30 |
|      3734 | tbirdbld            | 2015-12-23 15:18:47.000000 |             30 |
|      3733 |                     | 2015-12-23 15:16:31.000000 |             30 |
|      3425 |                     | 2015-12-02 18:54:06.000000 |            100 |
|      3378 |                     | 2015-11-27 17:07:23.000000 |              0 |
|      3334 | tbirdbld            | 2015-11-25 18:58:35.000000 |             30 |

Question for bhearsum: any chance I can get *read-only* access on that DB as well for future scenarios?
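
For anyone reading the query above: `from_unixtime(substr(timestamp, 1, 10))` keeps only the first 10 digits of the stored value, which suggests the column holds an epoch timestamp longer than 10 digits (presumably milliseconds; that is an assumption, not something stated in these notes). A small sketch of the same conversion follows. The function name and the sample value are hypothetical, and the sample was chosen to line up with the first row only if the MySQL session was reporting UTC.

```python
from datetime import datetime, timezone


def history_timestamp_to_utc(raw_timestamp):
    """Mirror from_unixtime(substr(timestamp, 1, 10)) from the query.

    Keeps the first 10 digits of the raw value as whole epoch seconds and
    returns a timezone-aware UTC datetime. Assumes the raw value is an
    epoch timestamp with at least 10 digits (e.g. milliseconds).
    """
    seconds = int(str(raw_timestamp)[:10])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    # Hypothetical millisecond value, not read from the real database.
    print(history_timestamp_to_utc(1455568890000).isoformat())
    # prints 2016-02-15T20:41:30+00:00
```

Note that MySQL's `from_unixtime` reports in the session time zone, so the values in the table above may not be UTC.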

  • mtabara: bug 1241263 on FAQ, feel free to add/amend input should you like
  • mtabara: partner_repack-related improvement proposal

Action items